[PLUG] removing dups
Colin Kuskie
ckuskie at dalsemi.com
Thu Apr 8 13:11:02 UTC 2004
On Thu, Apr 08, 2004 at 01:49:44PM -0500, Zot O'Connor wrote:
> A really bad way of doing it:
>
> # Assuming ',' is the field separator
> awk -F, '{print $0 "|" NR " " $1 $2 $3}' filetoamkerandallmad.csv | sort -k 2 | uniq -f 1 | sort -t '|' -k 2,2n | awk '{sub(/\|.*$/, ""); print $0}'
>
>
> This assumes the field separator is ',' and is never escaped, that there
> are no spaces or pipes in the fields, and that the sub command affects $0
> (otherwise you need to use a variable there).
It also assumes that concatenating $1$2$3 produces a unique key,
which might be true depending on the data. It would be better to
break each line into 4 fields, $1, $2, $3, $4 (the rest of the line)
and create a key separated by some unlikely character.
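
A minimal sketch of that idea, assuming comma-separated input with no
quoted or escaped commas. The function name dedup_by_key is hypothetical;
the "unlikely character" here is awk's built-in SUBSEP (\034 by default),
which is only one possible choice. It keeps the first line seen for each
($1, $2, $3) combination and preserves the input order, so no sort is needed.

```shell
# dedup_by_key: hypothetical helper, reads CSV on stdin and writes to stdout.
# Assumes ',' is the field separator and never appears inside a field.
dedup_by_key() {
    awk -F, '
    {
        # Join the key fields with SUBSEP so that ("ab","c") and ("a","bc")
        # produce different keys; plain concatenation would give "abc" for both.
        key = $1 SUBSEP $2 SUBSEP $3

        # Print a line only the first time its key is seen.
        if (!(key in seen)) {
            seen[key] = 1
            print
        }
    }'
}
```

It could be used as: dedup_by_key < input.csv > output.csv, where both
file names are placeholders.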
Colin