[PLUG] OT: Databases

Steve Bonds 1s7k8uhcd001 at sneakemail.com
Thu Sep 19 08:07:57 UTC 2002


On Wed, 18 Sep 2002, Jeremy Bowen ocarules at webnationinc.com wrote:

> This is probably more of a database question than a Linux question.
> 
> I need someone who can show me how to do the following (consultant or just
> someone good with databases and programs):
> 
> List A (45,000 records) has: Name, Address, City, State, Zip
> 
> List B (100,000) has: Name, Address, City, State, Zip, Additional Data,
> Additional Data.......
> 
> I need to compare the two lists and MARK the records in List A that exist in
> list B (They need to be marked in List A). Using Linux would be great but I
> don't have a functioning Linux box right now and I need the data on the
> Windows box because the software that will ultimately use the list has no
> Linux alternative.
> 
> Has anyone done anything like this before? I would be more than happy to pay
> someone to teach me how to accomplish this task.

This would be a trivial Perl script.  The only hard part is for you to
define how closely the records need to match.  For example:
  + Exact name match, but address differs
  + Exact name match, address match, ZIP differs
  + Is the name match case-sensitive or not?
  + Name doesn't match, but address does
  + Other combinations
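
To make the matching question concrete, here is a minimal sketch of a
"match key" function.  It is written in Python rather than Perl purely for
illustration.  The field names, the choice to ignore case, and the
whitespace cleanup are all assumptions, not requirements from the original
question; change them to fit whatever you decide counts as a match.

```python
import re

def match_key(record, fields=("name", "address", "zip")):
    """Build a comparison key from the chosen fields of one record (a dict).

    Hypothetical helper: which fields to include and how to normalize
    them are exactly the decisions listed above.
    """
    parts = []
    for field in fields:
        value = record.get(field, "")
        # Trim, lowercase, and collapse runs of whitespace so that
        # "Main  St" and "main st" compare equal.
        parts.append(re.sub(r"\s+", " ", value.strip().lower()))
    return tuple(parts)

# Made-up sample records that differ only in case and spacing.
a = {"name": "Jane Doe", "address": "12 Main St", "zip": "97201"}
b = {"name": "JANE  DOE ", "address": "12 main st", "zip": "97201"}

print(match_key(a) == match_key(b))
print(match_key(a, fields=("name",)) == match_key(b, fields=("name",)))
```

Two records then match when their keys are equal, so loosening or
tightening the match only means changing the fields or the normalization.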

Suggested program structure (optimized for speed and simplicity at the
cost of extra memory; with as few records as you have, that shouldn't be
a problem on any modern PC):

  + pull list A into a hash, indexed by the item(s) you want to match
    exactly
  + pull list B into a hash, indexed the same way
  + for each hash index in B, pull the matching record (if any) from A
    and mark it
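
The structure above can be sketched roughly as follows.  This is Python
rather than Perl (a Python dict plays the role of the hash), and it assumes
both lists have been exported as CSV with a header row.  The column names,
the "InB" marker column, and the case-insensitive key are assumptions made
for the example, and the sample data is made up.

```python
import csv
import io

KEY_FIELDS = ("Name", "Address", "Zip")  # assumed choice of fields to match on

def key_of(row):
    # Case-insensitive, whitespace-trimmed key (an assumption; adjust as needed).
    return tuple(row[f].strip().lower() for f in KEY_FIELDS)

def mark_matches(rows_a, rows_b):
    """Add an "InB" column to every row of A: "Y" if its key occurs in B."""
    index_a = {}
    for row in rows_a:
        row["InB"] = "N"
        # Keep a list per key so duplicate records in A are all marked.
        index_a.setdefault(key_of(row), []).append(row)
    index_b = {key_of(row): row for row in rows_b}
    # For each key in B, pull the matching record(s) from A and mark them.
    for key in index_b:
        for row in index_a.get(key, []):
            row["InB"] = "Y"
    return rows_a

# Tiny made-up stand-ins for the real exported files.
LIST_A = """Name,Address,City,State,Zip
Jane Doe,12 Main St,Portland,OR,97201
John Roe,9 Oak Ave,Salem,OR,97301
"""
LIST_B = """Name,Address,City,State,Zip,Extra
JANE DOE,12 main st,Portland,OR,97201,x
Ann Poe,4 Elm Rd,Eugene,OR,97401,y
"""

rows_a = list(csv.DictReader(io.StringIO(LIST_A)))
rows_b = list(csv.DictReader(io.StringIO(LIST_B)))
marked = mark_matches(rows_a, rows_b)

# Write list A back out with the extra marker column.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(marked[0].keys()))
writer.writeheader()
writer.writerows(marked)
print(out.getvalue())
```

For real data, the io.StringIO objects would be replaced with files opened
via open(path, newline=""), and the marked CSV could then be loaded back
into the Windows program.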

  -- Steve





