[PLUG] Re-formatting Date in .csv Field

Sam Hart criswellious at gmail.com
Wed Jan 26 00:15:22 UTC 2011


On Tue, Jan 25, 2011 at 3:50 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
> On Tue, 25 Jan 2011, Sam Hart wrote:
>
>> awk '{split ($1, a, ","); split (a[4], b, "/"); printf
>> "%s,%s,%s,%s-%s-%s,%s,%s,%s,%s\n", a[1], a[2], a[3], b[3], b[1], b[2],
>> a[5], a[6], a[7], a[8]; }'
>
>> Give it a try, refine it (because it's fugly), your mileage may vary,
>> yadda yadda...
>
> Sam,
>
>   I'm stuck on the 'yadda yadda' part. I've looked in my copy of 'sed & awk'
> by Dougherty and Robbins without seeing the answer to two questions that
> have come up.
>
>   Here's another line where one field is a string with an embedded comma.
> Awk takes that as the field separator ignoring all double quotes in the
> input line:
>
> NULL,"96-A001787","BC-0.5",6/25/1996,"Alkalinity, Bicarbonate",212,"mg/L CaCO3",NULL
>
> The script output therefore is:
>
> NULL,"96-A001787","BC-0.5",1996-6-25,"Alkalinity,,,

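Yep, that's because my original assumption was that your CSV stood for
"Comma Separated Values" in the absolute most literal sense: the one-liner
splits on every comma and knows nothing about quoted fields. It also only
splits $1, and awk's default field separator is whitespace, so $1 stops at
the space inside "Alkalinity, Bicarbonate" and everything after it is lost.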
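To see the difference concretely, here is a small Python 3 sketch (not from the original thread) that takes the problem line quoted above and splits it two ways: a literal split on every comma, and a parse with the standard csv module, which honors double-quoted fields.

```python
import csv

# The problem line from the thread: one quoted field contains a comma.
line = 'NULL,"96-A001787","BC-0.5",6/25/1996,"Alkalinity, Bicarbonate",212,"mg/L CaCO3",NULL'

# Literal splitting: every comma is a separator, quotes are just characters,
# so the quoted field is torn in two.
naive = line.split(",")
print(len(naive))

# csv.reader accepts any iterable of lines; quoted fields stay whole
# and their surrounding double quotes are stripped.
parsed = next(csv.reader([line]))
print(len(parsed))
print(parsed[4])
```

The literal split yields nine pieces, while csv.reader yields the eight fields the data actually has.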

It's looking like I'd probably wind up using Python for this after all
(as Tim suggested), since Python's csv module is brainy enough to handle
quoted fields like this one. For example:

>>> import csv
>>> crud = csv.reader(open('data', 'rb'), delimiter=',')
>>> for row in crud:
...    print row
...
['27132', '96-A001256', 'BC-0.5', '5/21/1996', 'pH', '8.19', '', 'True']
['NULL', '96-A001787', 'BC-0.5', '6/25/1996', 'Alkalinity, Bicarbonate', '212', 'mg/L CaCO3', 'NULL']

Seems to have handled the quirky line just fine.
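Going one step further to the actual goal of the thread (rewriting the date field), a minimal Python 3 sketch might look like the following. It assumes, from the sample rows, that the date is always the fourth field (index 3) in M/D/YYYY form; `reformat_dates` and `DATE_COL` are hypothetical names, and anything in that column that doesn't parse as a date (NULL, a header) is left untouched. Note the transcript above is Python 2 (`print row`, `'rb'`), while this sketch is Python 3.

```python
import csv
import io
from datetime import datetime

DATE_COL = 3  # assumption: the date is the 4th field, as in the sample rows


def reformat_dates(src, dst, date_col=DATE_COL):
    """Copy CSV rows from src to dst, rewriting M/D/YYYY dates as YYYY-MM-DD."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if len(row) > date_col:
            try:
                d = datetime.strptime(row[date_col], "%m/%d/%Y")
                row[date_col] = d.strftime("%Y-%m-%d")
            except ValueError:
                pass  # not an M/D/YYYY date (e.g. NULL or a header); keep as-is
        writer.writerow(row)


# Demo on the problem line from the thread, using in-memory "files".
sample = 'NULL,"96-A001787","BC-0.5",6/25/1996,"Alkalinity, Bicarbonate",212,"mg/L CaCO3",NULL\n'
out = io.StringIO()
reformat_dates(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Two differences from the awk version: strftime zero-pads, so the date comes out as 1996-06-25 rather than 1996-6-25, and csv.writer's default quoting (QUOTE_MINIMAL) only re-quotes fields that need it, so "Alkalinity, Bicarbonate" keeps its quotes but "96-A001787" comes out bare. For real files, the csv docs recommend opening both input and output with `newline=''`.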

                                      ---Sam
