[PLUG] PDF-1.5 docs not searchable

Sun Jul 25 14:09:53 UTC 2021

Rich, I verified last night that the free version of Master PDF Editor
does include OCR. I installed it via the AUR. It works very well against the
text, but it also tries to convert the pictures to text, with some
humorous results.

Jason

- Sent from my pocket computing telecommunications device.  All typos and
poor communications will be blamed on the autocarrot function of said
device.

On Sun, Jul 25, 2021, 5:53 AM Rich Shepard <rshepard at appl-ecosys.com> wrote:

> On Sun, 25 Jul 2021, Jason Barnett wrote:
>
> > I believe you mentioned Master PDF editor. I believe it has OCR built-in,
> > or allows it as a plugin. If needed, a good OCR tool is Tesseract and is
> > likely in your distro's repository.
> > https://en.wikipedia.org/wiki/Tesseract_(software)
>
> Jason,
>
> Thank you. The most recent doc I viewed (that prompted my post) has multiple
> images per page; it's not all text. While I don't remember the few others
> that I could not search it's likely that they, too, had many images embedded
> within the text. So I assume they were all scanned (or produced by an
> equivalent process).
>
> I used to get scanned documents (such as permit copies) from clients and had
> no reason to run them through an OCR, but I'll keep that in mind for the
> future.
>
> Germane to MasterPDFEditor, I expect that its OCR capabilities are in the
> paid version, not the free one. And, yes, Tesseract is in the SBo repo.
>
> Stay well,
>
> Rich
>