[PLUG] Data extraction
drew wymore
drew.wymore at gmail.com
Sun Apr 4 21:31:47 UTC 2010
On Sun, Apr 4, 2010 at 12:31 PM, Michael Rasmussen <michael at jamhome.us> wrote:
>
> On Sun, Apr 04, 2010 at 12:10:03PM -0700, drew wymore wrote:
>> I have a large data set that is being exported from an Oracle DB,
>> unfortunately I can't work with the data directly in Oracle or this
>> wouldn't be a problem. I can export it as CSV and work with it.
>> ... I don't really care which language I
>> do it in and whether I do it directly from csv or a database source
>> other than Oracle (because I can't).
>>
>> Any clue sticks, ideas or links to something that might help me solve
>> this problem appreciated.
>
> With apologies to Randal...
>
> Assume you export to CSV and, for the purposes of this simple example,
> that no text fields have embedded commas.
>
> Suppose the data of interest is in the third column:
>
> 3,14,word,blah,blech,bz
> 4,18,term,more,stuff
>
> then:
>
> perl -ne '@F=split /,/; $words{$F[2]}++;
>   END{ foreach $word (sort { $words{$a} <=> $words{$b} } keys %words)
>   { print "$word\t$words{$word}\n"; } }' file_of_data.csv
>
> That's assuming you want it sorted by word frequency, least frequent first.
>
> Disclaimer: I'm at my in-laws' for Easter dinner and didn't test that.
> I'm reasonably sure it's close enough that any gaps can serve
> as an exercise for the reader.
>
> --
> Michael Rasmussen, Portland Oregon
> Trading kilograms for kilometers since 2003
> Be appropriate && Follow your curiosity
> http://www.jamhome.us/
Thanks Rich and Michael. I'll give the Perl a shot and see what
happens. As for the data layout, it's 5 columns with roughly 1100
rows; the column I'm interested in has a variable number of words per
entry, but no entry exceeds a couple hundred words.
I did enable full-text searching within MySQL, which works fine for
searching but doesn't give me the flexibility I'm looking for: I just
want a count of unique words. I also found something in PHP that is
supposed to work, but it's barfing on the array returned by the
MySQL query.
Drew-
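Drew's column holds free text with a variable number of words per entry, so the one-liner above (which tallies whole third-column values and assumes no embedded commas) won't produce per-word counts. A minimal Perl sketch of a per-word count might look like this, under stated assumptions: double-quoted fields may contain commas and doubled "" quotes, no quoted field spans more than one line, a "word" is a run of letters, digits, or underscores with optional inner apostrophes, and case is ignored. The helper names split_csv_line and count_words and the script name count_words.pl are hypothetical. An export with embedded newlines would need a full CSV parser such as the CPAN module Text::CSV.

```perl
#!/usr/bin/perl
# Sketch: count individual words in one CSV column (not whole field values).
# Assumptions: double-quoted fields may contain commas and "" escapes, no
# quoted field spans more than one line, a "word" is \w+ with optional
# inner apostrophes, and counting is case-insensitive.
use strict;
use warnings;

# Hypothetical helper: split one CSV line into fields, honoring "quoted, fields".
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,]*))(,|\z)/g) {
        # Copy the captures before any other regex can overwrite $1..$3.
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        my $field = defined $quoted ? $quoted : $plain;
        $field =~ s/""/"/g if defined $quoted;   # undo CSV quote doubling
        push @fields, $field;
        last if $sep eq '';                      # reached end of line
    }
    return @fields;
}

# Hypothetical helper: returns a hash ref of lower-cased word => count,
# taken from 0-based column $col of each line.
sub count_words {
    my ($col, @lines) = @_;
    my %count;
    for my $line (@lines) {
        (my $copy = $line) =~ s/\r?\n\z//;       # strip LF or CRLF
        my $text = (split_csv_line($copy))[$col];
        next unless defined $text;
        $count{lc $1}++ while $text =~ /(\w+(?:'\w+)*)/g;
    }
    return \%count;
}

# Command-line use: perl count_words.pl COLUMN FILE...
if (@ARGV >= 2) {
    my ($col, @files) = @ARGV;
    my @lines;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        push @lines, <$fh>;
        close $fh;
    }
    my $count = count_words($col, @lines);
    # Most frequent first; ties broken alphabetically.
    for my $word (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
        print "$word\t$count->{$word}\n";
    }
}
```

For example, perl count_words.pl 2 file_of_data.csv would count the words in the third column (index 2, matching $F[2] in the one-liner), most frequent first.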