[PLUG] 'sort' to find duplicate rows

John Sechrest sechrest at gmail.com
Fri Jul 13 23:23:09 UTC 2012


If you are using a straight text comparison, the -u option to sort gives
you a unique list (no duplicates).
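
As a minimal sketch of that whole-line case (the file name rows.txt and
its comma-separated contents are made up for illustration, not taken from
the original data):

```shell
# Hypothetical sample data; the third line repeats the first exactly.
printf '%s\n' 'b,2,x' 'a,1,y' 'b,2,x' > rows.txt

# -u keeps one copy of each line that compares equal as a whole line.
unique_rows=$(sort -u rows.txt)
printf '%s\n' "$unique_rows"
```

The two copies of 'b,2,x' collapse into one, so two lines are printed.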

However, if rows are semantically identical but syntactically different
(for example, they differ only in spacing or letter case), sort -u will
still treat them as distinct.
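
For the case in the question quoted below, where rows count as duplicates
when fields 1, 2, and 4 match, one possible approach is to give sort
several -k options, each limited to a single field, together with -u. This
is only a sketch: the comma delimiter, the file name rows.txt, and the
sample rows are assumptions, since the real file format is not shown. An
awk one-liner is included as an alternative that keeps the original row
order.

```shell
# Hypothetical sample: rows 1 and 3 match on fields 1, 2 and 4
# but differ in field 3.
printf '%s\n' 'a,1,foo,x' 'b,2,bar,y' 'a,1,baz,x' > rows.txt

# Each -k restricts one key to a single field; with -u, rows whose
# keys all compare equal are treated as duplicates and one is kept.
by_sort=$(sort -t ',' -u -k1,1 -k2,2 -k4,4 rows.txt)

# awk alternative: print a row only the first time its (1,2,4) key
# is seen. No sorting is needed and input order is preserved.
by_awk=$(awk -F ',' '!seen[$1 FS $2 FS $4]++' rows.txt)

printf '%s\n' "$by_sort"
printf '%s\n' "$by_awk"
```

Both commands leave two rows. Which of the matching rows sort keeps may
depend on the implementation, while the awk version always keeps the first
one it reads.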


On Fri, Jul 13, 2012 at 3:59 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:

>    The text file has > 120k rows; each row has 8 columns. There are
> duplicate rows that I want to eliminate. My reading of the sort man page and
> various Web pages with examples tells me that the sort --key option is
> limited to a sequential starting field and ending field. What I need is to
> sort on fields 1, 2, and 4.
>
>    If 'sort' won't do this, what tool will? I don't see how awk, sed, or grep
> can, yet a combination of these perhaps might.
>
> Rich



-- 
John Sechrest
sechrest at gmail.com
@sechrest <http://www.twitter.com/sechrest>
http://www.oomaat.com


