[PLUG] pdf, postscript and text

Rich Shepard rshepard at appl-ecosys.com
Mon Sep 29 15:16:02 UTC 2003


On Mon, 29 Sep 2003, Aaron Burt wrote:

> Totally different PS generators, thus different PS code.  Differences in
> graphics rendering and compression become significant here.

  Aha! That makes sense; didn't think of it.

> From the sound of it (beeeg PS files, no text) these are PDFs containing
> bitmaps of printed or scanned output.  In other words, you don't have
> text, just pictures of text.  Does the text select tool in acroread work?

  No! And that threw me for a loop. I thought that I could cut and paste if
I couldn't disassemble.

> Sounds like you're stuck with extracting bitmaps (GS can print to TIFF and
> other formats) and feeding 'em through an OCR program.

  Oy, vey! Does anyone have gocr or jocr running well? What I'll need to do
is convert _all_ these multi-hundred-page documents into something I can use.
Groan!
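
  For the archives, here is a rough sketch of the pipeline Aaron describes:
Ghostscript renders each page to a bitmap, then gocr reads each bitmap. It
is only a sketch under assumptions, not a tested recipe. The helper names
(gs_command, gocr_command, ocr_pdf), the 300 dpi resolution, the page-%04d
file naming, and the choice of the pbmraw device (picked because gocr reads
PNM files; Aaron mentioned TIFF) are all illustrative assumptions, and both
gs and gocr are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def gs_command(pdf, out_pattern, dpi=300, device="pbmraw"):
    """Build a Ghostscript command that renders every page to a bitmap.

    out_pattern should contain a %d-style field (e.g. page-%04d.pbm) so
    that Ghostscript writes one file per page. The device and dpi values
    are assumptions; adjust them for the scans at hand.
    """
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        f"-sDEVICE={device}",
        f"-r{dpi}",
        f"-sOutputFile={out_pattern}",
        str(pdf),
    ]


def gocr_command(image, text_out):
    """Build a gocr command that reads one bitmap and writes plain text."""
    return ["gocr", "-i", str(image), "-o", str(text_out)]


def ocr_pdf(pdf, workdir):
    """Render a PDF to per-page bitmaps, then OCR each page (hypothetical helper)."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # Step 1: one bitmap per page, written into workdir.
    subprocess.run(gs_command(pdf, str(workdir / "page-%04d.pbm")), check=True)
    # Step 2: OCR each page image into a matching .txt file.
    for image in sorted(workdir.glob("page-*.pbm")):
        subprocess.run(gocr_command(image, image.with_suffix(".txt")), check=True)
```

  Calling ocr_pdf("report.pdf", "out") would leave one .txt file per page in
out/. OCR quality on scanned pages varies a lot, so expect to proofread the
results.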

Thanks, Aaron,

Rich

Dr. Richard B. Shepard, President

                       Applied Ecosystem Services, Inc. (TM)
            2404 SW 22nd Street | Troutdale, OR 97060-1247 | U.S.A.
 + 1 503-667-4517 (voice) | + 1 503-667-8863 (fax) | rshepard@appl-ecosys.com
                         http://www.appl-ecosys.com/