[PLUG] Finding partial duplicate rows with uniq
Eric Wilhelm
enobacon at gmail.com
Wed Oct 31 01:06:48 UTC 2012
# from Rich Shepard on Tuesday 30 October 2012:
> I have a large data file that contains duplicate rows. 'uniq' finds
>those rows that match character-by-character, but not those that match
>only on the first three fields (separated by '|').
Hi Rich,
perl -e 'while(<>) {
  my $k = join "|", (split /\|/, $_, 4)[0..2];
  print unless $seen{$k}++
}'
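(Untested.) That should print the first line seen for each $k, where
$k is the first three fields joined back together with '|'. Later lines
with the same $k are skipped, and unlike uniq, the duplicates don't
need to be adjacent.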
--Eric
--
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------