[PLUG] Removing Duplicate Rows from SQL Dump

Rich Shepard rshepard at appl-ecosys.com
Tue Aug 16 15:23:55 UTC 2011


On Tue, 16 Aug 2011, Hal Pomeranz wrote:

> I'm not certain from your email exactly which columns you want to
> de-duplicate on, but the solution is to use sort:
>
> 	sort -u -k1,4 inputfile >inputfile.de-duped
>
> The "-k" option should specify the range of columns you want to use
> for de-duplication.  The "-u" tells sort to only output the lines that
> are unique on those columns.  Of course your output will also end up
> being re-sorted on those columns, so you may have to re-order it again
> after you're finished de-duping.

Hal,

   I've known of and used sort before but never fully read the man page to
see that it could be used this way.

   Row order doesn't matter because postgres does not use row order for
anything (although too many folks want to use the internal RID number as a
primary key) and SQL does not return rows in any particular sequence without
an ORDER BY clause.
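
   For the archives, here is a minimal sketch of the sort approach on made-up
data. The sample rows and the temporary directory are hypothetical and not
taken from the actual dump; the sketch assumes whitespace-separated columns,
with the first four columns as the de-duplication key.

```shell
# Hypothetical sample data: four whitespace-separated columns, one row repeated.
tmpdir=$(mktemp -d)
cat > "$tmpdir/inputfile" <<'EOF'
site2 2011-08-01 pH 7.1
site1 2011-08-01 pH 7.4
site2 2011-08-01 pH 7.1
site1 2011-08-02 pH 7.3
EOF

# -k1,4 makes the sort key run from field 1 through field 4;
# -u keeps only one line for each distinct key.
# LC_ALL=C is an assumption here, chosen so the ordering does not vary by locale.
LC_ALL=C sort -u -k1,4 "$tmpdir/inputfile" > "$tmpdir/inputfile.de-duped"

cat "$tmpdir/inputfile.de-duped"
```

   The repeated site2 row should appear only once in the output, and the
remaining rows come out sorted on the key, which is harmless if the file is
only going to be loaded back into the database.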

Thanks,

Rich


