[PLUG] Copying text from pdf
Rich Shepard
rshepard at appl-ecosys.com
Fri Oct 17 14:48:01 UTC 2003
On Fri, 17 Oct 2003, Bill Spears wrote:
> Is there any way to copy a portion of the text of a pdf file?
Yes. Two ways; three, actually.
1. If the PDF was generated by a Linux tool or Adobe's Distiller then you
can simply select (block) the text in xpdf or acroread and paste it into
either a virtual terminal or a gen-u-whine X application.
2. Same source as above: run the PDF file through 'ps2ascii input.pdf
output.txt' (both names are placeholders for your own files; the angle
brackets are not shell redirection), then clean up the run-together words
with a text editor.
3. If the file was scanned, run it through 'pdftopbm' to convert it to a
bit-mapped format that gocr can recognize. Then read the man page for gocr
and generate one .txt file for each page in the original PDF document. The
output takes some cleaning -- depending on the quality of the scan and the
fonts used -- but it is faster than rekeying everything.
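Methods 2 and 3 can be sketched as a small shell script. Treat this as a
rough outline rather than a tested recipe: the function names, file names
and the 300 dpi resolution are my own assumptions, and it calls
'pdftoppm -mono' (the PBM-producing tool in current xpdf and poppler
packages) as a stand-in for 'pdftopbm', which may not be installed on newer
systems. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: helper names, file names and the 300 dpi value are assumptions.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Method 2: PDF with real text in it. $1 = input PDF, $2 = output text file.
pdf_to_text() {
    run ps2ascii "$1" "$2"
}

# Method 3: scanned PDF. $1 = input PDF, $2 = root name for the page files.
ocr_pdf() {
    pdf=$1
    root=$2
    # One monochrome bitmap per page, named <root>-<page number>.pbm
    run pdftoppm -mono -r 300 "$pdf" "$root"
    # One .txt file per page bitmap
    for page in "$root"-*.pbm; do
        [ -e "$page" ] || continue   # no pages were produced
        run gocr -i "$page" -o "${page%.pbm}.txt"
    done
}
```

Typical use would be 'pdf_to_text report.pdf report.txt' or
'ocr_pdf scan.pdf page' after sourcing the file. Either way the resulting
text still needs the manual cleanup described above.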
Rich
Dr. Richard B. Shepard, President
Applied Ecosystem Services, Inc. (TM)
2404 SW 22nd Street | Troutdale, OR 97060-1247 | U.S.A.
+ 1 503-667-4517 (voice) | + 1 503-667-8863 (fax) | rshepard@appl-ecosys.com
http://www.appl-ecosys.com/