[PLUG] Data exchange/transfer application?

Wil Cooley wcooley at nakedape.cc
Thu Aug 3 18:24:56 UTC 2006


On Thu, 2006-08-03 at 10:00 -0700, Aaron Burt wrote:
> On Thu, Aug 03, 2006 at 09:17:57AM -0700, Wil Cooley wrote:
> > On Wed, 2006-08-02 at 17:08 -0700, Aaron Burt wrote:
> > > Exactly.  I still don't understand what about your problem wouldn't be
> > > trivial to handle with common shell tools, and a simple tabular file of
> > > file-transfer specs.
> > 
> > It would be like implementing an MTA with netcat and shell.
> 
> Yes, you've given that impression.  File transfers are easy.  Scheduling
> is easy.  What is it about your problem that makes it *not* easy?
> 
> I'm totally missing something here, and I'd rather find out here than by
> making a fool of myself at work.

Individual file transfers are easy.  Gluing it all together into a
cohesive system that handles the myriad combinations of transfer
protocols, login information, encryption applications, notifications,
and auditing, and gracefully handles all the ways each component can
fail -- and doing it well -- is hard.  But it's not hard in any
interesting sense -- not like, say, the Red Army / Blue Army problem or
solving a Rubik's cube is hard.  It's hard in the sense that doing it
right would require 20-80 man-hours that I don't necessarily have to
spare.

What I've got now is several dozen scripts that do all of this stuff,
each of which is "easy" on its own, but piled on top of one another
they add up to a huge fucking mess.  Almost all of them are descended
from three or four proto-scripts, copied and slightly tweaked to do
something different.  I *can* try to refactor them, eliminate the
duplication, etc., but these scripts are in production and tinkering
with them is not a good idea.  Many of them are ksh, which is harder to
unit-test than Perl/Python/Ruby.  Network transactions are also a lot
harder to design unit tests for.
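
To illustrate, here's a minimal Python sketch of what I mean: hand the
upload routine a fake FTP object and you can test the logic without a
live server (the names here are invented for illustration, not from my
actual scripts):

    # Hypothetical sketch: inject a fake FTP client so the upload logic
    # can be unit-tested without touching the network.

    import io
    import unittest

    def upload(ftp, remote_name, data):
        # 'ftp' is anything with a storbinary() method, e.g. ftplib.FTP
        return ftp.storbinary("STOR " + remote_name, io.BytesIO(data))

    class FakeFTP:
        """Stand-in for ftplib.FTP; records commands, no networking."""
        def __init__(self):
            self.commands = []
        def storbinary(self, cmd, fp):
            self.commands.append(cmd)
            return "226 Transfer complete."

    class UploadTest(unittest.TestCase):
        def test_put_issues_stor(self):
            ftp = FakeFTP()
            resp = upload(ftp, "report.csv", b"a,b\n1,2\n")
            self.assertEqual(ftp.commands, ["STOR report.csv"])
            self.assertTrue(resp.startswith("226"))

    if __name__ == "__main__":
        unittest.main()

Try doing that in ksh.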

Let's consider the implementation details for a single FTP PUT (there's
a rough sketch of one slice of this after the list).  You've got to:

  o Check that the local file exists and produce an appropriate message 
    if it doesn't.
  o Check if the file is zero-length.  For legacy reasons, some files
    will always exist but aren't really the desired file until they
    actually have data.  In other cases, you have to upload an empty
    file even when there is no source file, or copy a CSV template with
    just headers (a CSV file with no records, in other words).
  o Figure out the name the resulting file should have.
  o Figure out the transfer mode: binary or text?
  o Connect and login to the FTP server, which can result in:
    o Success
    o Login failed
    o Too many users
      o Sleep, retry until ...
    o Timeout
      o Sleep, retry until ...
  o Actually upload the file, which can result in:
    o Success
    o Permission denied error
    o Out-of-space error
      o Sleep, retry until ... ?
    o Server too busy 
      o Sleep, retry until ...
    o Timeout
      o Sleep, retry until ...
    o Wrong transfer mode (active/passive)
      o Reconfigure and try again
  o Configured to verify upload?
    o No
    o Yes
      o Check file existence and size (that's about all you get w/FTP)
        o File is there and correct size
        o File isn't there
        o File size is wrong
          o How do you handle file size changes in text mode?
        o Listing is denied
        o Timeout
  o Archive local file afterwards?
    o No
    o Yes
    o ... (All the stuff that can go wrong with just mv'ing a file)
  o Delete local file afterwards?
    o No
    o Yes
    o ... (All the stuff that can go wrong with rm'ing a file)
  o Produce an execution report
  o Produce an error report, if any
  o Produce a trace report, if enabled
  o Route reports to correct people
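
To make that concrete, here's a rough Python sketch of just the
connect-login-upload-retry slice of that tree, using the stock ftplib
module.  The retry budget, the delays, and the error policy are
placeholder assumptions, and the real thing needs every branch above:

    # Rough sketch of one slice of the decision tree above.  ftplib is
    # Python's stock FTP client; MAX_TRIES and RETRY_DELAY are invented.

    import ftplib
    import os
    import time

    MAX_TRIES = 5
    RETRY_DELAY = 60  # seconds

    def ftp_put(host, user, password, local_path, remote_name, binary=True):
        if not os.path.exists(local_path):
            raise RuntimeError("local file missing: %s" % local_path)

        for attempt in range(MAX_TRIES):
            try:
                ftp = ftplib.FTP(host, timeout=30)
                ftp.login(user, password)
                with open(local_path, "rb") as fp:
                    if binary:
                        ftp.storbinary("STOR " + remote_name, fp)
                    else:
                        ftp.storlines("STOR " + remote_name, fp)
                # Verify: a size check is about all FTP gives you, and
                # it's meaningless in text mode (line-ending rewrites).
                if binary and ftp.size(remote_name) != os.path.getsize(local_path):
                    raise RuntimeError("size mismatch after upload")
                ftp.quit()
                return  # success
            except (ftplib.error_temp, OSError):
                time.sleep(RETRY_DELAY)  # busy/timeout: sleep, retry
            except ftplib.error_perm:
                raise  # login/permission errors won't fix themselves
        raise RuntimeError("gave up after %d tries" % MAX_TRIES)

And that still ignores the zero-length, archive, delete, and reporting
branches.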

It took me twenty minutes or so just to write this summary; it'd take
me about two to four hours to code it, more to test it, and more still
to integrate it into a larger framework.  And then there's designing
the framework, implementing FTP GET, SFTP GET, SFTP PUT, ...

And why do you have to be anal about handling every error as gracefully
as possible, retrying and so on, instead of just aborting and throwing
up a message?  Because you have hundreds of such jobs, and putting out
that many fires gets tiresome really fast.
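
The pattern is always the same, so you end up wanting something like
this hypothetical wrapper around every step that touches the network
(the names and defaults are invented):

    # Assumed helper: retry transient failures with a growing delay
    # instead of waking a human for each of several hundred jobs.

    import time

    def with_retries(fn, tries=5, delay=30, backoff=2, transient=(OSError,)):
        """Call fn(); on a 'transient' exception, sleep and retry,
        stretching the delay each time.  Re-raise when out of tries."""
        for attempt in range(tries):
            try:
                return fn()
            except transient:
                if attempt == tries - 1:
                    raise
                time.sleep(delay)
                delay *= backoff

    # e.g.: with_retries(lambda: ftp_put(host, user, pw, src, dst))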

And then there are other things that are maybe outside the core module,
like looking at audit logs and producing reports, or verifying that a
particular job ran within the allowed time period (it's not unusual to
have files that come in or go out at certain times of the month,
quarter, or year; they may arrive anywhere within some particular
window, but you need to verify that they did arrive within it).
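
A crude sketch of that kind of window check (the window and the dates
are made up for illustration):

    # Assumed example: verify a monthly feed landed inside its window.

    from datetime import datetime, time as dtime

    def arrived_in_window(arrival, earliest=dtime(2, 0), latest=dtime(6, 0)):
        """True if the arrival timestamp falls in the allowed window."""
        return earliest <= arrival.time() <= latest

    # A feed that must land between 02:00 and 06:00 on the 1st:
    arrival = datetime(2006, 8, 1, 4, 30)
    assert arrived_in_window(arrival)
    assert arrival.day == 1  # and on the right day of the month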

You see now why _managing_ lots of transfers for _unattended_ operation
is non-trivial?

I don't know how many systems you've managed at one time, but you can
start out crafting each installation for the particular need, being
absolutely parsimonious with disk space used, using scp in "for" loops
in the shell to keep critical files in sync (maybe you even write a
wrapper script or shell function), but at some point you realize that
you could spend more time reading Slashdot and ranting on the PLUG list
if you invested some time up front in automating the OS installation and
learning and using a tool like cfengine to keep everything in sync.
This is
analogous to that.

Wil
-- 
Wil Cooley <wcooley at nakedape.cc>
Naked Ape Consulting, Ltd