[PLUG] Removing Duplicate Rows from SQL Dump

Denis Heidtmann denis.heidtmann at gmail.com
Tue Aug 16 15:38:14 UTC 2011


On Tue, Aug 16, 2011 at 8:20 AM, Rich Shepard <rshepard at appl-ecosys.com> wrote:

> On Tue, 16 Aug 2011, Roderick A. Anderson wrote:
>
> > Can we see another snapshot of the data? And (did I miss it?) which three
> > columns?
>
> Rod,
>
>   Yep, and yep.
>
> Data:
>
> \N      CVS     1994-01-20      Conductance, Specific   460     uS/cm   t       \N      \N      \N
> \N      CVS     1994-01-20      Conductance, Specific   522     uS/cm   t       \N      \N      \N
>
>   (Fred: I think that pg_dump does use tabs as column separators, and there
> are spaces within a column as the above demonstrates. These data were
> extracted from Excel spreadsheets.)
>
>   The three columns are the second, third, and fourth, named loc_name,
> sample_date, and param.
>
>   The current client staff can't figure out, either, how there could be two
> different values for specific conductance at the same location on the same
> date when both were supposedly checked for quality (the 't' in the seventh
> column).
>
> Rich
>

I have no idea how to actually do it, but how is this as a strategy?
1. Add a unique column (record #).
2. Remove col. 2.
3. Remove duplicate entries in the result (ignoring the record # when comparing).
4. Note which records have been removed and remove them from the original.
5. Repeat for cols. 3 and 4.
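
For illustration only, here is a rough Python sketch of the idea. It is a simplification of the steps above rather than a literal translation: instead of removing columns one at a time, it numbers each line (the record #), compares only the three key columns from Rich's message (loc_name, sample_date, param, assumed to be tab-separated fields 2 to 4), keeps the first record seen for each key, and returns the record numbers of the later ones so they can be dropped from the original. The function names are hypothetical, and it assumes it is fed only the data lines of the COPY block, one record per line.

```python
"""Rough sketch of the record-number de-duplication idea.

Assumptions (hypothetical, not from the thread): `rows` holds only the data
lines of one pg_dump COPY block, one record per line, with tab-separated
columns. The key columns are the 2nd, 3rd and 4th (loc_name, sample_date,
param).
"""

# Zero-based field indexes of loc_name, sample_date and param.
KEY_COLS = (1, 2, 3)


def find_duplicate_records(rows, key_cols=KEY_COLS):
    """Return the record numbers of rows whose key columns repeat an earlier row.

    A row's record number is its position in `rows`; it plays the role of
    the added unique column and is never part of the comparison.
    """
    seen_keys = set()  # key tuples already encountered
    duplicates = []
    for recno, line in enumerate(rows):
        fields = line.rstrip("\n").split("\t")
        key = tuple(fields[i] for i in key_cols)
        if key in seen_keys:
            duplicates.append(recno)
        else:
            seen_keys.add(key)
    return duplicates


def remove_records(rows, recnos):
    """Return the original rows minus the given record numbers."""
    drop = set(recnos)
    return [line for recno, line in enumerate(rows) if recno not in drop]


if __name__ == "__main__":
    # The two records quoted above, with the COPY NULL marker \N.
    rows = [
        "\\N\tCVS\t1994-01-20\tConductance, Specific\t460\tuS/cm\tt\t\\N\t\\N\t\\N",
        "\\N\tCVS\t1994-01-20\tConductance, Specific\t522\tuS/cm\tt\t\\N\t\\N\t\\N",
    ]
    dups = find_duplicate_records(rows)
    print(dups)  # → [1]
    for line in remove_records(rows, dups):
        print(line)  # only the 460 uS/cm record is kept
```

Keeping the first occurrence is arbitrary: with the 460 and 522 uS/cm readings above, someone still has to decide which value is right, so it may be safer to print the duplicate records for review before deleting anything.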

-Denis


