[PLUG] gawk: modify field contents

Rich Shepard rshepard at appl-ecosys.com
Wed Jul 8 17:33:11 UTC 2015


On Wed, 8 Jul 2015, Pete Lancashire wrote:

> Floating point numbers without a leading '<' (e.g., ',0.01,,') are written
> to the output file and, if there is a blank field immediately following,
> a zero (0) is inserted in that following field.

   True.
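A minimal sketch of the rule as quoted above, in Python rather than gawk. The
function name, the comma separator, and the definition of a "plain" floating
point field (digits, a dot, digits, nothing else) are assumptions made for
illustration. The criteria are revisited further down, so treat this as a
starting point only.

```python
import re

# Assumption: a plain FP field is digits, a dot, digits, and nothing else,
# so a value such as "<0.01" does not qualify.
FP_FIELD = re.compile(r"[0-9]+\.[0-9]+")


def zero_fill_after_fp(line, sep=","):
    """Put 0 in any blank field that directly follows a plain FP field."""
    fields = line.split(sep)
    for i in range(len(fields) - 1):
        current_is_fp = FP_FIELD.fullmatch(fields[i]) is not None
        next_is_blank = fields[i + 1].strip() == ""
        if current_is_fp and next_is_blank:
            fields[i + 1] = "0"
    return sep.join(fields)
```

With this sketch, a line like `a,0.01,,b` comes back as `a,0.01,0,b`, while
`<0.01,,` is left unchanged.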

> What are the FP fields ?

   Chemical concentrations, generally in mg/L.

> If a field is FP and the following field is empty, for example
> 123.4,<blank>, change this to 123.4,0

> Sample 10321000__1981-09-17
>
> The field NH4 is 0.13 and NO2 is empty. This should be translated to 0.13,0

   No. I need to more carefully define criteria.

> The hardest part is knowing which are the FP fields. If you restrict
> yourself to using regexes it could be done, but you would end up with
> something like
>
>   <regex>{3}; <regex>{6}, <regex>{22}

   I was thinking of /[[:digit:]]+\.[[:digit:]]+/ because there should always
be at least one digit to the left of the decimal point and at least one
digit to the right of the decimal point.
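One thing to watch with that pattern: awk's ~ operator matches anywhere in the
field, so the unanchored form also matches values such as '<0.01'. The sketch
below shows the difference in Python. Python's re module has no [[:digit:]]
class, so [0-9] stands in for it, and both helper names are made up for this
example.

```python
import re

# [0-9] stands in for the POSIX [[:digit:]] class, which Python's re lacks.
FP_PATTERN = re.compile(r"[0-9]+\.[0-9]+")


def contains_fp(field):
    """Unanchored test: true if an FP number appears anywhere in the field."""
    return FP_PATTERN.search(field) is not None


def is_plain_fp(field):
    """Anchored test: true only if the whole field is digits.digits."""
    return FP_PATTERN.fullmatch(field) is not None
```

Here `contains_fp("<0.01")` is true but `is_plain_fp("<0.01")` is false, which
is the distinction needed to skip values with a leading '<'. In awk the same
effect comes from anchoring the pattern with ^ and $.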

> If I were doing this and had a list of the fields, I'd do the RTL process
> in Perl (I've not used Python), where one would have an array of which
> fields are FP, something like (0,0,0,1,0,0,1,1,0,....) where 1 is FP. Then
> read a line, split it into an array, and loop through each field; if the
> matching entry of the 'is FP' array equals 1, use a switch/case (which
> makes it easy to add more logic) to do what you want.
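The flag-array idea above could look roughly like this in Python, with a
dictionary of handlers standing in for the switch/case. The mask values, the
handler names, and what each handler does are all placeholders, not the real
column layout or the real cleanup rules.

```python
# Hypothetical mask: 1 marks a floating point column, 0 anything else.
IS_FP = [0, 0, 1, 1, 0]


def handle_fp(value):
    """Placeholder FP rule: trim surrounding whitespace only."""
    return value.strip()


def handle_other(value):
    """Non-FP columns pass through untouched."""
    return value


# One entry per flag value; add more flags and handlers as rules grow.
HANDLERS = {0: handle_other, 1: handle_fp}


def process_line(line, mask=IS_FP, sep=","):
    """Split a line, apply the handler chosen by each column's flag, rejoin."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(mask):
        raise ValueError("line has %d fields, mask has %d"
                         % (len(fields), len(mask)))
    return sep.join(HANDLERS[flag](value)
                    for flag, value in zip(mask, fields))
```

Because the column types come from the mask and not from a pattern match, a
blank or oddly formatted value in an FP column is still routed to the FP
handler, where the per-column logic can decide what to do with it.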

   You make an excellent point. There's so much variability in these data
sets -- including large chunks of missing data -- that tools like sed and
awk are stretched when trying to define complex patterns. OK. I'll go back
to modifying a Python script I used for a couple of simpler cases. Sigh.

Thanks very much, everyone,

Rich



