[PLUG] gawk: modify field contents

Rich Shepard rshepard at appl-ecosys.com
Wed Jul 8 15:39:39 UTC 2015


On Wed, 8 Jul 2015, Robert Citek wrote:

> Do you know in advance which fields are text, integer, or floats?  Or
> can a given field be of mixed data-type?

Robert,

   Each field must be one type of data. It's when spreadsheet preparers
combine text symbols such as '<' in a numeric column that things get FUBAR.

   R doesn't care about the data type in each column. When the data are read
into a data.frame with read.table(), read.csv(), or read.delim(), R
automagically recognizes numeric types as integer or float; all others are
classified as factors unless 'stringsAsFactors = FALSE' is specified as an
argument to the function. In R a data.frame is a list of columns, and each
column can be of a different type. It is necessary, for example, to coerce
the sampdate column from factor to date using the as.Date() function.

   As for cleaning the raw data before reading it into R, the fields that
need to be modified are integer or floating point (text, per se, is not an
issue), and I believe that a correctly formulated regex can identify a
field as integer or floating point.
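
   A minimal sketch of such a regex test, assuming comma-separated input.
The two patterns, the function name classify_fields, and the sample record
are illustrative assumptions, not something tested against the real data.
It uses only POSIX awk features, so it should behave the same under gawk:

```shell
# Label each comma-separated field as integer, float, or text.
# Hypothetical helper; the regexes are assumptions about what counts as a number.
classify_fields() {
    awk -F, '
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[-+]?[0-9]+$/)
                type = "integer"      # optional sign, digits only
            else if ($i ~ /^[-+]?([0-9]+\.[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/)
                type = "float"        # needs a decimal point; exponent optional
            else
                type = "text"         # everything else, including "<0.005"
            printf "%s%s", type, (i < NF ? "," : "\n")
        }
    }'
}

# A made-up record: a value such as "<0.005" falls through to "text".
printf '%s\n' 'SW-1,2015-07-08,42,<0.005,3.14' | classify_fields
```

Anything reported as text in a column that should be numeric is a candidate
for cleanup before the file goes to R.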

   Does this answer your question?

Rich



