[PLUG] gawk: modify field contents

Rich Shepard rshepard at appl-ecosys.com
Tue Jul 7 23:41:51 UTC 2015


On Tue, 7 Jul 2015, Larry Brigman wrote:

> Messed that one up.... Sample data needed/wanted to verify proper operation.

Larry, et al.:

   Here's what I want to accomplish. Don't care if the tool is written in
awk, sed, python, or a bash shell script; I want to save days and weeks of
tedious data cleaning exported from sloppy Excel spreadsheets.

   The following 20 lines of data are for testing while developing a data
cleaning program. The first line (the header) is written, un-molested, to
the output file.

   All strings and integer fields are written to the output file.

   Floating point numbers without a leading '<' (e.g., ',0.01,,') are written
to the output file and, if there is a blank field immediately following,
insert a zero (0) in that following field.

   Floating point numbers with a leading less-than symbol and space, '< ',
have that stripped off leaving the number in the field. The following blank
field has a one (1) inserted.

   To make the program generic, floating point numbers with less-than symbols
but no following space, and thosse with a leading minus sign (-) also have
those symbols stripped off and a 1 inserted in the following field.

   All fields with 'M' and a variable amount of white space have the noise
removed so it's a blank field written to the output file.

----------------------------------------------------
site.code,sampdat,tmp.h2o,tmp.air,disc.cfs,Turb,Cond,DO,pH,disc.cms,Alk,hard,TDS.mgl,TDS.tpd,TSS,ANC,HCO3,CO3,N.tot,N.org,NH4,NO2,NO3,o-PO3,o-P,org.C,CN,ceneq1,Ca,Mg,Na,K,Cl,SO4,F,ceneq1,Si,As,Ba,ceneq1,B,ceneq1,Ca,ceneq1,Cr,ceneq1,Co,ceneq1,Cu,ceneq1,F,ceneq1,Pb,ceneq1,Mn,ceneq1,Mo,ceneq1,Ni,ceneq1,Ag,ceneq1,Sr,ceneq1,V,ceneq1,Zn,ceneq1,Al,ceneq1,Li,ceneq1,Se,ceneq1
10321000,1965-10-01,,,98,,490,,8.4,2.80,,150,,85.60,,,238,4,,,,,,,,,,,40,12,46,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1965-11-01,,,141,,507,,8.3,4.00,,170,,122.00,,213,252,4,,,,,,,,,,,50,11,42,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1965-12-01,,,128,,564,,8.1,3.60,,190,359,129.00,,238,290,0,,,,,,,,,,,56,13,47,6.4,16,37,0.5,0,40,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1979-07-30,26.00,,80,10.0,458,7.50,8.8,2.30,,160,299,61.30,83,235,280,3,0.66,0.62,0.01,,,,,,,,44,11,35,7.1,11,24,0.4,0,25,8.00,90.00,0,,,< 2,,,,< 3,,M ,,< 10,,U ,,< 1,,,,,,U ,,,,,,< 3,,,,,,< 1,
10321000,1979-08-29,21.50,,29,,518,,,0.82,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1979-08-29,21.50,,19,13.0,509,8.50,8.4,0.54,,160,201,11.60,78,185,220,3,0.69,0.42,0.01,,,,,5.4,,,46,11,48,9.6,21,39,0.6,0,26,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1979-10-26,12.50,,42,6.8,610,11.00,8.9,1.20,,190,395,42.50,,259,230,42,0.81,0.49,0.09,,,,,,,,54,14,55,11,27,48,0.6,0,30,10.00,100.00,0,,,M ,,0.00,,< 3,,M ,,< 10,,0,,M ,,,,M ,,0,,,,,,< 3,,,,,,0,
.9,13,29,0.2,0,29,7.00,90.00,0,,,< 1,,0.00,,< 3,,10,,20,,0,,10,,,,0,,0,,,,,,M ,,,,,,0,
10321000,1980-06-23,20.50,,2070,23.0,362,6.70,,59.00,,140,,1370.00,,,,,,,0.04,,,,,8.9,,,42,9.2,24,5.2,7.9,11,0.3,0,32,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1980-07-29,24.00,,203,11.0,427,8.10,8.2,5.70,,160,257,147.00,,180,230,0,0.63,0.5,0.01,,,,,,,,46,9.9,31,5.9,10,20,0.5,0,25,7.00,90.00,0,,,M ,,0.00,,< 3,,M ,,40,,M ,,20,,,,0,,0,,,,,,10,,,,,,0,
10321000,1980-08-25,19.50,,49,11.0,545,9.20,7.7,1.40,,170,342,45.60,,216,260,0,1.4,1.1,0,,,,,10,,,50,12,47,8.2,20,40,0.8,0,26,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1980-09-30,17.00,,47,6.0,530,11.00,8.4,1.30,,170,336,40.00,,223,260,6,0.82,0.74,0,,,,,4.4,,,47,12,50,8.8,23,36,0.6,0,25,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1980-11-21,5.00,,108,9.5,518,12.20,8.5,3.10,,180,346,99.10,,190,220,8,0.6,0.46,0.02,,,,,,,,52,12,45,8.5,21,52,0.5,0,28,7.00,100.00,0,,,< 1,,0.00,,< 3,,M ,,10,,0,,10,,,,0,,0,,,,,,< 3,,,,,,0,
10321000,1981-01-28,3.50,,127,14.0,510,12.00,8.7,3.60,,170,355,113.00,,240,270,19,,,0.05,,,,,,,,50,12,46,6.5,19,38,0.5,0,30,8.00,100.00,0,,,< 1,,0.00,,< 3,,0,,30,,M ,,20,,,,0,,0,,,,,,10,,,,,,0,
10321000,1981-03-25,9.50,,110,24.0,589,11.80,8.8,3.10,,200,364,110.00,,240,250,17,0.55,0.51,0.05,,,,,4.9,,,56,14,53,6.8,20,47,0.5,0,28,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10321000,1981-05-20,13.00,,117,16.0,475,,8.6,3.30,,180,,91.60,,200,230,7,1,0.94,0.08,,,,,,,,55,10,36,7.7,14,,0.5,0,27,9.00,,,,,< 1,,0.00,,< 3,,M ,,20,,M ,,20,,,,M ,,0,,,,,,,,7,,,,0,
10321000,1981-07-21,25.50,,6,8.8,602,9.20,8.6,0.16,,200,382,5.77,,240,270,12,1,0.71,0.13,,,,,,,,54,15,55,10,25,43,0.5,0,34,14.00,120.00,0,,,< 1,,,,< 3,,M ,,10,,M ,,10,,,,M ,,0,,,,,,M ,,,,,,0,
10321000,1981-09-17,21.50,,10,3.0,487,10.20,8.1,0.27,,170,292,7.39,,200,250,0,1.2,0.97,0.13,,,,,,,,44,14,37,8,15,25,0.6,0,24,9.00,100.00,0,,,< 1,,0.00,,< 3,,M ,,10,,M ,,20,,,,M ,,0,,,,,,10,,,,,,1,
10321000,1981-12-04,3.00,,23,3.5,630,13.00,8.8,0.65,,210,406,24.30,,240,260,15,0.57,0.38,0.09,0.02,0.1,,,,,,62,13,58,9.5,33,56,0.6,0,30,9.00,100.00,0,,,< 1,,< 10,,< 3,,< 1,,30,,< 1,,20,,,,M ,,< 1,,,,,,< 3,,,,,,< 1,
----------------------------------------------------
TIA,

Rich



More information about the PLUG mailing list