[PLUG] gawk: modify field contents
Rich Shepard
rshepard at appl-ecosys.com
Wed Jul 8 17:33:11 UTC 2015
On Wed, 8 Jul 2015, Pete Lancashire wrote:
> Floating point numbers without a leading '<' (e.g., ',0.01,,') are written
> to the output file and, if there is a blank field immediately following,
> insert a zero (0) in that following field.
True.
> What are the FP fields ?
Chemical concentrations, generally in mg/L.
> If an FP field is followed by an empty field, for example 123.4,<blank>,
> change this to 123.4,0
> Sample 10321000__1981-09-17
>
> The field NH4 is 0.13 and NO2 is empty. This should be translated to 0.13,0
No. I need to define the criteria more carefully.
> The hardest part is knowing which are the FP fields. If you restrict
> yourself to using regexes it could be done, but you would end up with
> something like
>
> <regex>{3}; <regex>{6}, <regex>{22}
I was thinking of /[[:digit:]]+\.[[:digit:]]+/ because there should always
be at least one digit to the left of the decimal point and at least one
digit to the right of the decimal point.
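
For comparison, a minimal Python sketch of the same pattern. Python's re
module does not support POSIX bracket classes such as [[:digit:]], so [0-9]
stands in for it. The helper name is_fp and the whole-field anchoring are
assumptions for illustration: unanchored, the pattern would also match the
numeric part of a field like '<0.01'.

```python
import re

# [0-9] replaces [[:digit:]], which Python's re module does not support.
FP_FIELD = re.compile(r"[0-9]+\.[0-9]+")

def is_fp(field):
    """Return True if the whole field is digits, a decimal point, digits."""
    # fullmatch() requires the entire field to match, so "<0.01" is rejected.
    return FP_FIELD.fullmatch(field) is not None
```

With this, is_fp("0.13") is true, while is_fp("<0.01"), is_fp("12") and
is_fp("") are all false.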
> If I were doing this and had a list of the fields, I'd do the RTL process
> in Perl (I've not used Python), where one would have an array marking
> which fields are FP, something like (0,0,0,1,0,0,1,1,0,....) where 1 is
> FP. Then read a line, split it into an array, and loop through each field;
> if the 'is FP' array at that index equals 1, use a switch/case (which
> makes it easy to add more logic) to do what you want.
You make an excellent point. There's so much variability in these data
sets -- including large chunks of missing data -- that tools like sed and
awk are stretched when trying to define complex patterns. OK. I'll go back
to modifying a Python script I used for a couple of simpler cases. Sigh.
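
A rough sketch of the mask approach Pete describes, written in Python rather
than Perl. Everything specific here is hypothetical: the mask, the row
layout, the names classify and process_row, and above all the action (strip
the leading '<'), which is only a placeholder because the real criteria are
not settled in this thread. A dictionary of handlers stands in for the
switch/case.

```python
# Hypothetical mask: 1 marks a column expected to hold an FP value.
FP_MASK = [0, 1, 1, 1]

def classify(value):
    """Label an FP-column value as 'blank', 'censored' (leading '<'), or 'value'."""
    if value == "":
        return "blank"
    if value.startswith("<"):
        return "censored"
    return "value"

def process_row(fields, mask, actions):
    """Apply the handler for each masked field's class; leave the rest alone."""
    out = list(fields)
    for i, flag in enumerate(mask[:len(fields)]):
        if flag == 1:
            handler = actions.get(classify(fields[i]))
            if handler is not None:
                out[i] = handler(fields[i])
    return out

# Placeholder action only: drop the '<' from censored values.
actions = {"censored": lambda v: v.lstrip("<")}

# Made-up row: sample ID, then three concentration columns.
row = "10321000__1981-09-17,0.13,,<0.01".split(",")
result = process_row(row, FP_MASK, actions)
print(",".join(result))  # prints 10321000__1981-09-17,0.13,,0.01
```

Adding a rule for blank fields would then mean adding a "blank" entry to
actions, without touching the loop.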
Thanks very much, everyone,
Rich