[PLUG] removing dups
Colin Kuskie
ckuskie at dalsemi.com
Thu Apr 8 10:02:02 UTC 2004
On Thu, Apr 08, 2004 at 09:50:13AM -0700, Matt Alexander wrote:
> I have a tab-delimited text file that I'm inserting into a database. The
> first three fields need to be unique. Sometimes the text file has
> duplication of the first three fields, but the rest is different, so I
> can't simply use "sort -u" to remove the dups. The first occurrence of the
> dupe is always the one that I want removed.
>
> Any suggestions on a way to find the dups and remove the first occurrence?
Build a Perl script that uses a hash keyed on the first three fields (either
a three-level hash, or a single key built by joining them), with the value
holding the rest of the line. Each input line adds or overwrites an entry in
the hash, so at the end you'll have one entry per unique key, and each value
is the last occurrence seen.
If the order of the file is important, use something like a supplementary
array to record the original key order, or a tied hash such as Tie::IxHash.
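A minimal sketch of that approach (names are illustrative; it assumes
tab-delimited input, that the first three fields form the key, and that
output should follow the order in which each key first appeared):

```perl
use strict;
use warnings;

# dedup: given tab-delimited lines, keep only the last occurrence
# of each (field1, field2, field3) key. A supplementary array
# records the order in which keys were first seen.
sub dedup {
    my @lines = @_;
    my (%seen, @order);
    for my $line (@lines) {
        my $key = join "\t", (split /\t/, $line)[0 .. 2];
        push @order, $key unless exists $seen{$key};
        $seen{$key} = $line;    # a later duplicate overwrites the earlier one
    }
    return map { $seen{$_} } @order;
}

# Filter mode: dedup files named on the command line.
if (@ARGV) {
    chomp(my @lines = <>);
    print "$_\n" for dedup(@lines);
}
```

Using Tie::IxHash instead, the hash itself would remember insertion order
and the @order array would be unnecessary.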
Colin