[PLUG] Removing Duplicate Rows from SQL Dump

Rich Shepard rshepard at appl-ecosys.com
Tue Aug 16 12:39:54 UTC 2011


On Tue, 16 Aug 2011, Rich Shepard wrote:

>   This will work for all completely duplicated lines. I'll need to see how
> many remain that vary in one or more columns ('fields') such as the
> parameter, lab_id number, or qa_qc.

   I had manually cleaned up a bunch of lines, so the source file had 12,119
lines. After running uniq, the output file has 8,605 lines, about 29% fewer.

   The need remains to remove near-duplicates and near-triplicates: lines
that have the same values in three columns, regardless of the rest of their
contents.
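
   A minimal sketch of that kind of key-based de-duplication, in Python. It
assumes each line is a plain comma-separated record and that the three
columns of interest sit at positions 0, 1, and 2; the function name, the
delimiter, and the column positions are all hypothetical and would need to
match the real dump. It keeps the first line seen for each key and does not
handle commas inside quoted values.

```python
def dedupe_by_columns(lines, key_columns=(0, 1, 2), delimiter=","):
    """Keep only the first line seen for each combination of key-column values.

    key_columns and delimiter are assumptions about the dump's layout.
    """
    seen = set()
    kept = []
    needed = max(key_columns) + 1
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < needed:
            # Too few fields to build a key (e.g. a blank or header line):
            # pass it through unchanged rather than guess.
            kept.append(line)
            continue
        key = tuple(fields[i].strip() for i in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up sample rows, only to show the behavior.
    rows = [
        "site1,2011-01-01,pH,7.1,lab42",
        "site1,2011-01-01,pH,7.1,lab43",
        "site2,2011-01-01,pH,6.9,lab42",
    ]
    for row in dedupe_by_columns(rows):
        print(row)
```

   Unlike uniq, this does not need the input sorted, since it remembers
every key it has already seen rather than comparing only adjacent lines.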

Rich


