[PLUG-TALK] High speed document scanners (2)

Gregg Berkholtz gregg at berkholtz.net
Thu Sep 1 20:11:15 PDT 2011

On Sep 1, 2011, at 1:17 PM, Keith Lofstrom wrote:

> On Thu, Sep 01, 2011 at 10:22:57AM -0700, Gregg Berkholtz wrote:
>> As for Linux...the direct-connected units work great with Windows
>> and MacOS systems, the more expensive network-based scanners can
>> push PDFs via email, SMB or FTP...so Linux would fit nicely there.
> I was unclear - we do not have a windows machine, for security
> reasons we avoid having them around.  One review on Amazon
> suggests the N1800 sends a .NET applet to a windows machine as
> part of the setup procedure (not a true web interface), which
> means it cannot be configured from Linux.   So, the question is
> actually, will these machines work in a windows-free environment?
I totally understand the purist desire. But I've found that being totally Windows-free cuts out way too many otherwise superior product options, and therefore opportunities to actually help people, so I keep a WinXP VM around for silly things like this. I can tell you that setup was painless, but that may just mean I fired up the VM to complete it. It's been a while since the last N1800 setup, so I honestly don't remember. I'd have to dig through their documentation & ticket notes, and I'm just not in the mood tonight.

Although, since I'm that client's primary contact, I guess this reflects how often the N1800 needs babysitting :-)

>>> Assuming a 100Mbps link and server, and that all manufacturers
>>> exaggerate, how fast does the N1800 actually scan text pages
>>> at B&W 600dpi, single sided?  Duplex? 
>> With local caching on the device...well, let's just say I've yet
>> to see anyone move papers fast enough.  Honestly, I haven't tested
>> this possibility. Although everyone's set up for duplex scanning;
>> there's two physical scanners in these things, so it's a
>> single-pass duplex scan.
> That's understood.  However, if we dump a batch of 40 pages into 
> the input hopper, and set it to scanning at high resolution, it
> will take some time to process and move the bits, especially if
> it is USB2 rather than 100MB ethernet.  The OCE slows noticeably
> if it is pushing that many bits, and would slow more if we ran
> it duplex.  Which is the next experiment.  The scanning speed
> question is about workflow and labor hours.  We will be processing
> the equivalent of 50 file boxes of papers.   Mostly single sided
> paper.  
Real-time OCR, yeah, it'll be a major drag. You could always run your OCR in the middle of the night; the only decent OCR I've ever seen is software-based anyway.

As for scanning speeds...manufacturer fluff or not, I haven't actually witnessed people consistently keeping up with the machines.

> So, how fast do these puppies move paper with lots of pixels?
Umm...fast? Seriously, between yanking staples, stacking papers, and just keeping things organized within a decent and mature workflow, I've yet to see anyone keep up with the higher-end units.

Maybe if we re-frame your question to:
 How large is each scanned page?

Ah, I can answer that one:
 This full-color eight page document (two of which are business cards) was scanned on my ages-old unit a few weeks back:

This is a B&W only 14-page scan with a mixture of text and graphics:

So, totally roughing things out, we come up with ~242KB/page. With a "perfect" 40PPS feedrate, you're looking at a maximum bandwidth need of ~77Mbps (not counting protocol overhead, etc.). Either way, 100Mbps should be sufficient for shuffling around the generated PDFs.
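For anyone who wants to plug in their own page sizes, here is a rough sketch of that bandwidth arithmetic in Python. The ~242KB/page figure and the 40PPS rate are taken as written above; the function name and the decimal-units convention (1 KB = 1000 bytes) are my own assumptions. Since scanner feed rates are usually quoted per minute, the sketch also shows what the same page size would need at 40 pages per minute, which is an assumption and not a figure from this thread.

```python
def bandwidth_mbps(kb_per_page: float, pages_per_second: float) -> float:
    """Sustained bandwidth in Mbps, ignoring protocol overhead.

    Assumes decimal units: 1 KB = 1000 bytes, 1 Mbps = 1,000,000 bits/s.
    """
    bits_per_second = kb_per_page * 1000 * 8 * pages_per_second
    return bits_per_second / 1_000_000


KB_PER_PAGE = 242  # rough per-page average quoted in the post

# Feed rate exactly as written in the post: 40 pages per second.
as_written = bandwidth_mbps(KB_PER_PAGE, 40)

# Assumption (not from the post): the same 40 pages, but per minute.
per_minute = bandwidth_mbps(KB_PER_PAGE, 40 / 60)

print(f"{as_written:.1f} Mbps at 40 pages/second")
print(f"{per_minute:.2f} Mbps at 40 pages/minute")
```

Under either reading, the result stays below a 100Mbps link before overhead, which matches the conclusion above.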

What's the Real World workflow? Averaged over one-hour blocks, and after watching the average office worker use equipment, my gut says a mature workflow could best-case shuffle ~1,500 pages/hour. Or, maybe one box every ~90 minutes. Which, when you think about it, pretty much explains why bulk scanning services can get away with anywhere from $0.10-$0.25/page...
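To put those numbers against the 50 file boxes mentioned earlier in the thread, here is a back-of-the-envelope sketch in Python. The inputs (~1,500 pages/hour, ~90 minutes per box, 50 boxes, $0.10-$0.25/page) all come from this thread; the pages-per-box value is only implied by the first two, and the function and field names are hypothetical.

```python
def estimate_job(boxes: int, pages_per_hour: float, minutes_per_box: float,
                 cost_low: float, cost_high: float) -> dict:
    """Rough labor and outsourcing estimate for a bulk scanning job."""
    # Pages per box implied by the throughput and per-box time.
    pages_per_box = pages_per_hour * minutes_per_box / 60
    total_pages = pages_per_box * boxes
    return {
        "pages_per_box": pages_per_box,
        "total_pages": total_pages,
        "labor_hours": boxes * minutes_per_box / 60,
        "outsourced_low": total_pages * cost_low,
        "outsourced_high": total_pages * cost_high,
    }


# Figures quoted in the thread; best-case gut estimates, not measurements.
job = estimate_job(boxes=50, pages_per_hour=1500, minutes_per_box=90,
                   cost_low=0.10, cost_high=0.25)

for name, value in job.items():
    print(f"{name}: {value:,.0f}")
```

With these best-case inputs the job works out to roughly 75 hours of scanning labor, so any slowdown in the real workflow scales that figure up directly.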

I sure hope you have a DMS in mind too; there's all kinds of ACL, auditing/tracking, and general management fun to be had here.

