[PLUG] removing dups

Colin Kuskie ckuskie at dalsemi.com
Thu Apr 8 10:02:02 UTC 2004


On Thu, Apr 08, 2004 at 09:50:13AM -0700, Matt Alexander wrote:
> I have a tab-delimited text file that I'm inserting into a database.  The
> first three fields need to be unique.  Sometimes the text file has
> duplication of the first three fields, but the rest is different, so I
> can't simply use "sort -u" to remove the dups.  The first occurrence of the
> dupe is always the one that I want removed.
> 
> Any suggestions on a way to find the dups and remove the first occurrence?

Build a Perl script that uses a three-level hash keyed on the first three
fields, with the rest of the line as the value.  Each line read will either
add a new entry to the hash or overwrite an existing one.

At the end, you'll have a hash with one entry per unique key, where each
value is the last occurrence seen.
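
A minimal sketch of that idea, assuming the input arrives on STDIN and the
key is the first three tab-separated columns (here joined into a single hash
key for simplicity, which behaves the same as a literal three-level hash):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %seen;   # key: first three fields joined with tabs; value: whole line
    while (my $line = <STDIN>) {
        chomp $line;
        my @fields = split /\t/, $line;
        my $key    = join "\t", @fields[0..2];
        $seen{$key} = $line;   # a later duplicate overwrites the earlier one
    }

    print "$_\n" for values %seen;

Note that iterating values %seen prints the surviving lines in arbitrary
order.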

If the order of the file is important, use something like a supplementary
array to record the original order, or use a tied hash such as Tie::IxHash
(see the sketch below).
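
For example, a hedged variant of the loop above that remembers the order in
which each key was first seen, using a plain supplementary array:

    my (%seen, @order);
    while (my $line = <STDIN>) {
        chomp $line;
        my $key = join "\t", (split /\t/, $line)[0..2];
        push @order, $key unless exists $seen{$key};  # record first-seen order
        $seen{$key} = $line;                          # last duplicate wins
    }
    print "$seen{$_}\n" for @order;

Tie::IxHash would give you the same order-preserving behavior without the
extra array, at the cost of tying the hash.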

Colin



