[PLUG] unicode problem with Postgresql and Perl DBD::Pg
Russell Senior
seniorr at aracnet.com
Mon Feb 16 15:07:02 UTC 2004
>>>>> "Chris" == Chris Jantzen <chris at maybe.net> writes:
Russell> For some random email messages, as I stuff them into a
Russell> Postgresql table's text field via Perl DBD::Pg, I get a
Russell> message along the lines of:
Russell> DBD::Pg::st execute failed: ERROR: Unicode characters greater
Russell> than or equal to 0x10000 are not supported at ...
Russell> [...]
Russell> The email body contains some circumflected character or
Russell> another.
Russell> Me caveman. How make work?
Chris> Without details, I'd suggest explicitly specifying the
Chris> character set of the data while inserting into the database. It
Chris> looks like it has guessed that you're trying to give it 32-bit
Chris> Unicode, which very few apps support.
Can I specify the character set on a row-by-row basis? It looks like
not. In the corpus I am looking at, I see a wide variety of
content-type headers:
text/plain; charset=us-ascii
text/plain; charset=iso-8859-1
text/plain; charset=iso-8859-5
text/plain; charset=iso-8859-7
etc.
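The declared charset varies from message to message, so one common approach is to decode each body using the charset from its own Content-Type header and then store everything in a single encoding. This sketch makes several assumptions. It is written in Python rather than Perl, so none of these calls are DBD::Pg API. It assumes the database was created with a UTF-8 (UNICODE) encoding. The helper name `body_as_utf8` and the us-ascii fallback are illustrative choices. Multipart messages would need each part handled separately.

```python
import email


def body_as_utf8(raw_message: bytes, fallback: str = "us-ascii") -> bytes:
    """Decode a single-part message body with its declared charset and
    re-encode it as UTF-8 (hypothetical helper; assumes a UTF-8 database)."""
    msg = email.message_from_bytes(raw_message)
    # Undo quoted-printable/base64 transfer encoding; None for multipart.
    payload = msg.get_payload(decode=True) or b""
    # charset= parameter from Content-Type (lower-cased), if present.
    charset = msg.get_content_charset() or fallback
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:
        # Charset name Python does not recognize: use the fallback instead.
        text = payload.decode(fallback, errors="replace")
    return text.encode("utf-8")


raw = b"Content-Type: text/plain; charset=iso-8859-1\n\nJ\xf6rg\n"
print(body_as_utf8(raw))
```

With this approach, an iso-8859-1 body containing the byte 0xf6 (ö) comes out as the two-byte UTF-8 sequence 0xc3 0xb6. Every row then arrives in one encoding, and the per-row charset question goes away. The `errors="replace"` choice trades silent substitution for never failing on mislabeled mail.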
One message in particular is labeled iso-8859-1 and contains "Jörg"
in the body; the second character, ö, appears as the byte 0xf6 in the
mbox file.
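One plausible reading of the error assumes the server treats the incoming bytes as UTF-8, for example a UNICODE database with a matching client encoding. In UTF-8, a lead byte of the form 11110xxx announces a four-byte sequence, and four-byte sequences are exactly the ones that encode code points at or above 0x10000. The ISO-8859-1 byte 0xf6 (binary 11110110) fits that pattern. The Python sketch below classifies lead bytes by bit pattern; `utf8_sequence_length` is a hypothetical helper, not a library function.

```python
def utf8_sequence_length(lead: int) -> int:
    """Bytes a UTF-8 decoder expects after seeing `lead`, judged only by the
    original bit patterns (hypothetical helper for illustration)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if (lead & 0xE0) == 0xC0:
        return 2  # 110xxxxx: U+0080..U+07FF
    if (lead & 0xF0) == 0xE0:
        return 3  # 1110xxxx: U+0800..U+FFFF
    if (lead & 0xF8) == 0xF0:
        return 4  # 11110xxx: U+10000 and above
    return 0      # 10xxxxxx continuation byte, or not a lead byte at all


latin1 = "Jörg".encode("iso-8859-1")   # what the mbox file holds
print(latin1)                           # b'J\xf6rg'
print(hex(latin1[1]), utf8_sequence_length(latin1[1]))  # 0xf6 4

# A genuine four-byte UTF-8 sequence encodes a code point >= 0x10000.
print(chr(0x10000).encode("utf-8"))     # b'\xf0\x90\x80\x80'

# Correctly converted, ö takes two bytes in UTF-8.
print("Jörg".encode("utf-8"))           # b'J\xc3\xb6rg'

# Python's strict decoder refuses the raw Latin-1 bytes outright.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Strict modern decoders (RFC 3629) reject lead bytes 0xf5 through 0xff entirely, as Python does here. An older decoder that goes only by the bit pattern would instead read 0xf6 as the start of a character beyond 0xFFFF, which would match the wording of the error.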
Which encoding should I use and how do I specify it? I am a
Postgresql and DBD::Pg neophyte, if that isn't obvious already.
--
Russell Senior ``I have nine fingers; you have ten.''
seniorr at aracnet.com