[PLUG] Finding partial duplicate rows with uniq

Eric Wilhelm enobacon at gmail.com
Wed Oct 31 01:06:48 UTC 2012


# from Rich Shepard on Tuesday 30 October 2012:
>   I have a large data file that contains duplicate rows. 'uniq' finds
>those rows that match character-by-character, but not those that match
>only on the first three fields (separated by '|').

Hi Rich,

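  perl -e 'while(<>) {
    # key = the first three "|"-separated fields, joined back together
    my $k = join "|", (split /\|/, $_, 4)[0..2];
    # print a row only the first time its key is seen
    print unless $seen{$k}++;
  }'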

(untested)  That should print only the first row for each distinct $k,
where $k is the first three fields joined with '|'. Later rows with the
same $k are skipped, and the original row order is kept.

--Eric
-- 
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------


