[PLUG] Correcting duplicate strings in files

Carl Karsten carl at personnelware.com
Wed Jun 20 00:21:43 UTC 2018


It could be done with transistors if you spend enough time ;)

I would add some code that verifies the assumptions, like:
are the dates always the same?
is it only the 17:00 lines that are labelled 16:00?
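
Something like this could do that checking. It is only a rough sketch: the name unexpected_rows is made up, and it assumes every line is date,time,value with exactly one line per hour from a known start. It reports any line whose timestamp is off in some way other than a 17:00 line labelled 16:00.

```python
import csv
from datetime import datetime, timedelta

def unexpected_rows(rows, start):
    """Return (line_number, label, expected) for each row whose timestamp
    is wrong in any way other than a 17:00 row labelled 16:00."""
    surprises = []
    for i, row in enumerate(rows):
        # what the timestamp should be if there is one line per hour
        expected = start + timedelta(hours=i)
        label = datetime.strptime(
            "{},{}".format(row[0], row[1]), "%Y-%m-%d,%H:%M")
        if label == expected:
            continue
        # the one flaw we already know about: 17:00 written as 16:00
        known_flaw = (expected.hour == 17
                      and label == expected - timedelta(hours=1))
        if not known_flaw:
            surprises.append((i + 1, label, expected))
    return surprises

# the six lines of test.dat, inlined so the sketch runs on its own
sample = [
    "2012-10-01,14:00,90.7999",
    "2012-10-01,15:00,90.8121",
    "2012-10-01,16:00,90.8121",
    "2012-10-01,16:00,90.8121",
    "2012-10-01,18:00,90.8091",
    "2012-10-01,19:00,90.8030",
]
rows = list(csv.reader(sample))
start = datetime(2012, 10, 1, 14)
print(unexpected_rows(rows, start))
```

An empty list means the only problem found is the known one. Anything else in the list (a missing hour, a wrong date) means the assumptions need another look before rewriting timestamps.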

Anyway, assuming all our descriptions and assumptions are correct,
and the file starts at 2012-10-01,14:00, this replaces every
timestamp with a calculated one and keeps only the value column:


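import csv
from datetime import datetime, timedelta

year = 2012
file_name = 'observation_{}.csv'.format(year)

# assumed timestamp of the first line in the file
start_time = datetime(year, 10, 1, 14)

# 'with' closes the file when the loop is done
with open(file_name, newline='') as f:
    # line h gets start_time + h hours, whatever its own date and time say
    for h, input_line in enumerate(csv.reader(f)):
        timestamp = start_time + timedelta(hours=h)
        data_line = "{},{}".format(
                timestamp.strftime("%Y-%m-%d,%H:%M"), input_line[2])
        print(data_line)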

carl at twist:~/temp$ python allhours.py
2012-10-01,14:00,90.7999
2012-10-01,15:00,90.8121
2012-10-01,16:00,90.8121
2012-10-01,17:00,90.8121
2012-10-01,18:00,90.8091
2012-10-01,19:00,90.8030
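
If you would rather leave the good timestamps alone and only touch the second 16:00 of each day, the flag idea works in Python too. This is a sketch, not tested on the real data: fix_second_1600 is a made-up name, and it assumes three fields per line and that a repeated 16:00 on the same date is always the mislabelled 17:00.

```python
import csv

def fix_second_1600(rows):
    """Yield rows unchanged, except that the second 16:00 seen for a
    given date is relabelled 17:00."""
    seen_1600 = set()  # dates that already had a 16:00 line (the "flag")
    for date, time, value in rows:
        if time == "16:00":
            if date in seen_1600:
                time = "17:00"       # second 16:00 that day: fix it
            else:
                seen_1600.add(date)  # first 16:00 that day: remember it
        yield [date, time, value]

# the six lines of test.dat, inlined so the sketch runs on its own
sample = [
    "2012-10-01,14:00,90.7999",
    "2012-10-01,15:00,90.8121",
    "2012-10-01,16:00,90.8121",
    "2012-10-01,16:00,90.8121",
    "2012-10-01,18:00,90.8091",
    "2012-10-01,19:00,90.8030",
]
rows = list(csv.reader(sample))
for row in fix_second_1600(rows):
    print(",".join(row))
```

On the test lines this gives the same six lines as above. The difference is that it never invents a timestamp, so a missing hour elsewhere in the file stays missing instead of shifting everything after it.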

On Tue, Jun 19, 2018 at 6:24 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
> On Tue, 19 Jun 2018, Carl Karsten wrote:
>
>> Python will be the easiest to understand.
>> Is it always 16:00, or should any line that is wholly duplicated
>> have its second copy's hour bumped?
>
>
> Carl,
>
>   The values may differ by hour. It's only the second 16:00 hour each day
> that is incorrect.
>
>> Also, if you have one line for every hour of the year, how about
>> looping over all those datetimes, paired up with your data, and replacing
>> all the datetimes (both good and flawed) with the calculated datetime?
>
>
>   I have everything correct but for the duplicated 4pms.
>
>> Here is 1/2 of it:
>>
>> from datetime import datetime, timedelta
>>
>> for h in range(8760):
>>    timestamp = datetime(2012,1,1) + timedelta(hours=h)
>>    data_line = "{},{}".format(
>>            timestamp.strftime("%Y-%m-%d,%H:%M"),
>>            "123.456")
>>    print(data_line)
>
>
>   Here's my test file (test.dat):
>
> 2012-10-01,14:00,90.7999
> 2012-10-01,15:00,90.8121
> 2012-10-01,16:00,90.8121
> 2012-10-01,16:00,90.8121
> 2012-10-01,18:00,90.8091
> 2012-10-01,19:00,90.8030
>
>   I know it can be done in awk with a flag; but don't know how to do this
> correctly. :-)
>
>
> Thanks,
>
> Rich
>
>
> _______________________________________________
> PLUG mailing list
> PLUG at pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug



-- 
Carl K
