[PLUG] unicode problem with Postgresql and Perl DBD::Pg

Russell Senior seniorr at aracnet.com
Mon Feb 16 15:07:02 UTC 2004


>>>>> "Chris" == Chris Jantzen <chris at maybe.net> writes:

Russell> For some random email messages, as I stuff them into a
Russell> Postgresql table's text field via Perl DBD::Pg, I get a
Russell> message along the lines of:

Russell> DBD::Pg::st execute failed: ERROR: Unicode characters greater
Russell> than or equal to 0x10000 are not supported at ...

Russell> [...]

Russell> The email body contains some circumflected character or
Russell> another.

Russell> Me caveman.  How make work?

Chris> Without details, I'd suggest explicitly specifying the
Chris> character set of the data while inserting into the database. It
Chris> looks like it has guessed that you're trying to give it 32-bit
Chris> Unicode, which very few apps support.

Can I specify the character set on a row-by-row basis?  It looks like I
can't.  The corpus I am working with has a wide variety of Content-Type
headers:

  text/plain; charset=us-ascii
  text/plain; charset=iso-8859-1
  text/plain; charset=iso-8859-5
  text/plain; charset=iso-8859-7

etc.
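One way to cope with mixed labels like these is to decode each message from its own declared charset, then hand the database a single encoding. The sketch below does that in Python rather than Perl. It assumes a UTF-8 (UNICODE) database, a single-part text/plain message, and ISO-8859-1 as a fallback when the label is missing or unknown. `body_as_utf8` is a hypothetical name and is not part of DBD::Pg.

```python
from email import message_from_bytes
from email.message import Message


def body_as_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Decode a single-part message body using its declared charset and
    re-encode it as UTF-8 (assumed here to be the database encoding)."""
    msg: Message = message_from_bytes(raw)
    # Lowercased charset parameter of Content-Type, or None if absent.
    charset = msg.get_content_charset() or fallback
    # decode=True undoes quoted-printable/base64 and returns raw body bytes;
    # it returns None for multipart messages.
    payload = msg.get_payload(decode=True)
    if payload is None:
        raise ValueError("multipart message: decode each part from msg.walk()")
    try:
        text = payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that don't match the label:
        # fall back rather than failing the whole insert.
        text = payload.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

Each message is converted independently, so on the client side the charset varies row by row while the database only ever sees UTF-8. In Perl, `Encode::decode($charset, $octets)` handles the same step that `bytes.decode` handles here.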

One message in particular is labeled iso-8859-1 and contains "Jörg" in
the body.  Its second character, "ö", is stored as the single byte 0xf6
in the mbox file.
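The byte 0xf6 may explain the odd wording of the error. This is a plausible reading and assumes the database expects UTF-8. In UTF-8, the high bits of a lead byte announce the sequence length, and 0xf6 (binary 11110110) matches the 11110xxx pattern that opens a 4-byte sequence. A 4-byte sequence can only encode code points of 0x10000 and above. The Python sketch below checks this. `utf8_lead_length` and `MIN_CODE_POINT` are illustrative names.

```python
# Smallest code point each UTF-8 sequence length can carry.
MIN_CODE_POINT = {1: 0x0000, 2: 0x0080, 3: 0x0800, 4: 0x10000}


def utf8_lead_length(lead: int) -> int:
    """Sequence length announced by a UTF-8 lead byte, judged only by its
    high-order bit pattern (as in the original UTF-8 definition).
    Returns 0 for continuation bytes (10xxxxxx) and other non-lead bytes."""
    if (lead & 0x80) == 0x00:
        return 1  # 0xxxxxxx
    if (lead & 0xE0) == 0xC0:
        return 2  # 110xxxxx
    if (lead & 0xF0) == 0xE0:
        return 3  # 1110xxxx
    if (lead & 0xF8) == 0xF0:
        return 4  # 11110xxx
    return 0


raw = b"J\xf6rg"  # the body bytes as stored in the mbox
lead = raw[1]     # 0xf6 == 0b11110110

# Read as ISO-8859-1 (the declared charset), 0xf6 is simply "ö" ...
assert raw.decode("iso-8859-1") == "Jörg"
# ... whose correct UTF-8 form is two bytes, not one.
assert "ö".encode("utf-8") == b"\xc3\xb6"

# Read as UTF-8, 0xf6 opens a 4-byte sequence, i.e. a code point of at
# least 0x10000, which matches the wording of the DBD::Pg error.
n = utf8_lead_length(lead)
print(f"lead byte {lead:#04x} -> {n}-byte sequence, code point >= {MIN_CODE_POINT[n]:#x}")
# prints: lead byte 0xf6 -> 4-byte sequence, code point >= 0x10000
```

If this reading is right, the fix is not a wider Unicode type. The ISO-8859-1 bytes need converting, or the server needs to be told that they are ISO-8859-1.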

Which encoding should I use, and how do I specify it?  I am a
PostgreSQL and DBD::Pg neophyte, if that isn't obvious already.
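PostgreSQL can also do the conversion itself. The `client_encoding` session setting tells the server which encoding incoming text uses. It is an ordinary `SET`, so in principle it can be changed between inserts. The Python sketch below builds that statement for one message. The `MIME_TO_PG` table and the `client_encoding_statement` helper are hypothetical. The encoding names follow current PostgreSQL documentation, and releases from this era may spell UTF8 as UNICODE. Using LATIN1 as the fallback is an assumption.

```python
from typing import Optional

# Hypothetical mapping from MIME charset labels to PostgreSQL
# client_encoding names; extend it for the other charsets in the corpus.
MIME_TO_PG = {
    "us-ascii": "UTF8",  # ASCII is a subset of UTF-8
    "utf-8": "UTF8",
    "iso-8859-1": "LATIN1",
    "iso-8859-5": "ISO_8859_5",
    "iso-8859-7": "ISO_8859_7",
}


def client_encoding_statement(mime_charset: Optional[str], default: str = "LATIN1") -> str:
    """Return the SET statement to issue before inserting one message.
    Only names from the fixed table (or the default) are interpolated."""
    pg_name = MIME_TO_PG.get((mime_charset or "").strip().lower(), default)
    return f"SET client_encoding TO '{pg_name}'"
```

With DBD::Pg, the returned string would be sent with `$dbh->do(...)` immediately before the matching INSERT. This assumes the server supports conversion from that encoding to its own.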

-- 
Russell Senior         ``I have nine fingers; you have ten.''
seniorr at aracnet.com