[PLUG] Copying text from pdf

Rich Shepard rshepard at appl-ecosys.com
Fri Oct 17 14:48:01 UTC 2003


On Fri, 17 Oct 2003, Bill Spears wrote:

> Is there any way to copy a portion of the text of a pdf file?

  Yes. Two ways; three, actually.

1. If the PDF was generated by a Linux tool or Adobe's Distiller then you
can simply select the text in xpdf or acroread and paste it into either a
virtual terminal or a gen-u-whine X application.

2. Same source as above: run the PDF file through 'ps2ascii input.pdf
output.txt', then clean up the run-together words with a text editor.

3. If the file was scanned, run it through 'pdftopbm' to translate to a
bit-mapped format that gocr can recognize. Then read the man page for gocr
and generate one .txt file for each page in the original pdf document. Takes
some cleaning -- depending on the quality of the scan and the fonts used --
but it is faster than rekeying everything.
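
  For what it's worth, methods 2 and 3 can be roughed out as a couple of
shell functions. This is only a sketch, not a tested recipe: it assumes
ps2ascii (from Ghostscript) and gocr are on your PATH, the function names
txt_name, pdf_to_text and ocr_pages are made up for illustration, and the
PBM page files for method 3 are assumed to exist already.

```shell
#!/bin/sh
# Sketch only. Assumes ps2ascii (Ghostscript) and gocr are installed.
# All function names here are hypothetical helpers, not part of either tool.

# Derive the output name from the input name: report.pdf -> report.txt
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Method 2: text-based PDF. Given only an input file, ps2ascii writes
# the extracted text to standard output, so redirect it to a file.
pdf_to_text() {
    src=$1
    dst=$(txt_name "$src")
    ps2ascii "$src" > "$dst" && printf 'wrote %s\n' "$dst"
}

# Method 3: scanned PDF. Takes the PBM page images as arguments
# (already produced from the PDF) and writes one .txt file per page.
ocr_pages() {
    for page in "$@"; do
        gocr -i "$page" -o "${page%.pbm}.txt"
    done
}
```

  Something like 'pdf_to_text report.pdf' would leave report.txt next to the
original, and 'ocr_pages scan-*.pbm' would leave one text file per page.
Either way the output still needs the hand cleanup described above.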

Rich

Dr. Richard B. Shepard, President

                       Applied Ecosystem Services, Inc. (TM)
            2404 SW 22nd Street | Troutdale, OR 97060-1247 | U.S.A.
 + 1 503-667-4517 (voice) | + 1 503-667-8863 (fax) | rshepard@appl-ecosys.com
                         http://www.appl-ecosys.com/



