[PLUG] Sorting A Text File on Multiple Fields

Rich Shepard rshepard at appl-ecosys.com
Thu Sep 1 20:46:57 UTC 2011


On Thu, 1 Sep 2011, Joe Shisei Niski wrote:

> ...and yet the *ix shell environments are chock full of underrated and
> underutilized tools such as sort, cut, grep, sed, awk, etc., etc. that are
> much quicker to pick up and far more appropriate for working with flat files
> than a DBMS.

   For context, I'm running statistical analyses of water chemistry data. For
now, the 148 sampling sites are much too fine a resolution; the Q-Q plots,
x-y plots, and box-and-whisker plots are so jammed that only the broadest trends
can be seen.

   So, I'm combining sites by stream and now need to hand-modify the results
so I can average the measured values for multiple sites on the same stream
from the same day and for the same chemical. The sort utility reordered
the text file so the rows are grouped by stream, then by date, then by
chemical species. I can't think of a tool to find multiple rows with the same
stream + date + chemical, average the measured values, and delete the
redundant rows. Tedious work.
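
   The grouping and averaging described above could be scripted. What follows
is only a minimal sketch in Python, not a tested solution for the real data
file. The record layout (stream, date, chemical, value), the "|" delimiter, and
the sample stream names are all assumptions, since the actual file format is
not shown in this thread.

```python
import csv
import io
from collections import defaultdict


def average_duplicates(lines, delimiter="|"):
    """Average the value column over rows sharing stream, date, and chemical.

    Assumes each row has exactly four fields: stream, date, chemical, value.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        stream, date, chemical, value = row
        groups[(stream, date, chemical)].append(float(value))
    # One output row per (stream, date, chemical) key, in sorted key order.
    return [
        (stream, date, chemical, sum(values) / len(values))
        for (stream, date, chemical), values in sorted(groups.items())
    ]


# Hypothetical sample data; the names and values are invented for illustration.
sample = """\
Bear Creek|2005-06-01|Mg|4.0
Bear Creek|2005-06-01|Mg|6.0
Bear Creek|2005-06-01|Na|2.5
Elk Creek|2005-06-01|Mg|3.0
"""

for stream, date, chemical, mean in average_duplicates(io.StringIO(sample)):
    print(f"{stream}|{date}|{chemical}|{mean:g}")
```

   Because the rows are collected in a dictionary keyed on the three fields,
the input does not have to be sorted first, and the redundant rows disappear
because each key is written out only once.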

   But once I'm done with this, I'll use awk to tag the rows for each stream
with whichever of the two major drainage basins it belongs to.
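
   The basin step amounts to a lookup keyed on the stream name. Here is a
rough sketch of that idea, written in Python rather than awk. The stream names,
basin names, "|" delimiter, and the choice to put the basin in a new first
column are hypothetical placeholders, not details from the real data.

```python
# Hypothetical stream-to-basin lookup; the real names are not given in this thread.
BASIN_BY_STREAM = {
    "Bear Creek": "North Basin",
    "Elk Creek": "South Basin",
}


def add_basin(rows, basin_by_stream, delimiter="|"):
    """Prepend a basin field to each delimited row, keyed on the first field.

    Assumes the stream name is the first field of every row.
    """
    tagged = []
    for row in rows:
        row = row.rstrip("\n")
        if not row:  # skip blank lines
            continue
        stream = row.split(delimiter, 1)[0]
        # Streams missing from the lookup are flagged rather than dropped.
        basin = basin_by_stream.get(stream, "UNKNOWN")
        tagged.append(delimiter.join([basin, row]))
    return tagged


for line in add_basin(["Bear Creek|2005-06-01|Mg|5\n"], BASIN_BY_STREAM):
    print(line)
```

   Marking unmatched streams as "UNKNOWN" makes a misspelled stream name easy
to spot afterwards.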

   These tools are great, but they cannot do everything (including making
another pot of coffee).

Rich


