[PLUG] removing dups

Colin Kuskie ckuskie at dalsemi.com
Thu Apr 8 13:11:02 UTC 2004


On Thu, Apr 08, 2004 at 01:49:44PM -0500, Zot O'Connor wrote:
> A really bad way of doing it:
> 
> # Assuming ',' is the field separator
> FS=, cat filetoamkerandallmad.csv | awk 'sub("\n"); {print "$0|$i $1$2$3\n"; $i++}' | sort -k 1 | uniq -F 1 | sort -t '|' -k 1| awk '{sub("|.*$");  print $0}'
> 
> 
> This assumes that the field separator is ',' and is not escaped, that
> there are no spaces or pipes in the fields, and that the sub command
> affects $0 (otherwise you need to use a variable there).

It also assumes that concatenating $1$2$3 produces a unique key,
which may or may not be true depending on the data: "a,bc,d" and
"ab,c,d" both concatenate to "abcd".  It would be better to break each
line into 4 fields, $1, $2, $3, and $4 (the rest of the line), and
build the key from the first three with some unlikely character
between them.
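A minimal sketch of that idea in awk alone, assuming no field contains an embedded or escaped comma. Indexing an array as seen[$1, $2, $3] makes awk join the three fields with SUBSEP (the control character \034 by default), so "a,bc,d" and "ab,c,d" get different keys. Note that this swaps the sort | uniq pipeline for a table of keys already seen; the function name dedup_first3 is just illustrative.

```shell
#!/bin/sh
# Hypothetical helper: drop lines whose first three comma-separated
# fields have already been seen, keeping the first occurrence.
# Assumes ',' never appears inside a field (no quoting or escaping).
# In seen[$1, $2, $3] awk joins the subscripts with SUBSEP ("\034"),
# so differently split fields cannot collide the way $1$2$3 can.
# The whole line ($0) is printed, so a separate $4 is not needed.
dedup_first3() {
    awk -F, '!seen[$1, $2, $3]++' "$@"
}

# Usage: dedup_first3 data.csv > data-unique.csv
```

Since nothing is sorted, the surviving lines stay in their original order. The cost is memory for one array entry per distinct key.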

Colin
