[PLUG] Web site programming

Carlos Konstanski ckonstanski at pippiandcarlos.com
Thu Nov 23 01:10:21 UTC 2006


>> It's also possible to make the GIF/IPA thing configurable; maybe there's
>> even a reliable way to ask the browser if it can display IPA.

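Here's one possible solution.  Examine the Accept-Charset HTTP header
and look for utf-8 support.  Note that this only tells you the browser
can decode UTF-8; it does not tell you whether it has a font with the
IPA glyphs.  Here's a Wireshark packet trace of a GET request: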

GET http://www.arts.gla.ac.uk/IPA/ipafonts.html HTTP/1.1
Host: www.arts.gla.ac.uk
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.0.8) Gecko/20061025 Firefox/1.5.0.8
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Referer: http://www.arts.gla.ac.uk/IPA/ipa.html
If-Modified-Since: Wed, 15 Mar 2006 09:45:11 GMT
If-None-Match: "52b745271548c61:9c3"
Cache-Control: max-age=0
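
Here's a rough sketch of that check in Python.  The function name
accepts_charset is my own invention, not part of any library, and the
parsing is deliberately simple: it splits the header on commas, reads
the optional q value, and treats q=0 as "not acceptable".  A missing
header is treated as "anything goes", which is how I read RFC 2616.

```python
def accepts_charset(header_value, charset="utf-8"):
    """Return True if an Accept-Charset header value admits `charset`.

    `header_value` is the raw header string, or None if the client
    sent no Accept-Charset header at all.
    """
    if header_value is None:
        # No header present: any charset is considered acceptable.
        return True

    wanted = charset.lower()
    explicit_q = None   # q value given for the charset by name
    wildcard_q = None   # q value given for "*"

    for item in header_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if name == wanted:
            explicit_q = q
        elif name == "*":
            wildcard_q = q

    # A named entry wins over the wildcard.
    if explicit_q is not None:
        return explicit_q > 0
    if wildcard_q is not None:
        return wildcard_q > 0
    return False
```

With the header from the trace above,
accepts_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7") comes back True.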

If the client supports UTF-8, serve it a page that declares its
Content-Type:

<META http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
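
As a sketch of the server side, assuming nothing about your framework:
the helper below (utf8_page is a made-up name) builds a page with that
META tag, encodes it as UTF-8, and also returns a matching Content-Type
response header.  Browsers give the real HTTP header priority over the
META tag, so it's worth keeping the two in agreement.

```python
def utf8_page(title, body_html):
    """Return (headers, body_bytes) for an HTML page encoded as UTF-8.

    `headers` is a list of (name, value) tuples; `body_bytes` is the
    encoded page, ready to be written to the socket or handed to
    whatever server framework is in use.
    """
    html = (
        "<html><head>\n"
        '<META http-equiv="Content-Type" '
        'content="text/html; charset=UTF-8"/>\n'
        f"<title>{title}</title>\n"
        "</head><body>\n"
        f"{body_html}\n"
        "</body></html>\n"
    )
    headers = [("Content-Type", "text/html; charset=UTF-8")]
    # The bytes on the wire must really be UTF-8, or the declaration lies.
    return headers, html.encode("utf-8")
```

For example, utf8_page("IPA test", "<p>\u0283</p>") gives back a body
in which the IPA character esh has been encoded as UTF-8 bytes.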

Note: if you are sending XML pages to the client rather than HTML
pages, change the MIME type to text/xml accordingly.  I would use
UTF-8 for both the XML and the XSLT pages in this case.  But it's
unlikely you'll be going that route unless you're doing web
development as a paying gig.  Only paying gigs will go to that
much trouble to produce a web page.

Here are some links:

http://depts.washington.edu/llc/help/presentations/unicode_ipa/1_introduction.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html

Keep in mind that Unicode is a mapping of integers to characters (a
"character set"), while UTF-8 is a way to express those integers as
bytes in a computer file (a "character encoding").  The IPA characters
are part of the Unicode character set (or so I think I have read),
which implies that their codes will not change from computer to
computer, since Unicode is meant to have a unique code for every
character on earth (or so they claim).  With UCS-4, which stores each
code as 4 bytes, this is believable.
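
To make the character set versus encoding distinction concrete, here
is a small Python sketch.  The describe function is just an
illustration of mine; it takes one character and shows its Unicode
code point next to its byte forms in UTF-8 and in big-endian UTF-32
(the fixed 4-byte form, which for this purpose looks the same as
UCS-4).

```python
import unicodedata


def describe(ch):
    """Show the code point of one character and two byte encodings of it."""
    return {
        # The integer Unicode assigns to the character.
        "code_point": f"U+{ord(ch):04X}",
        # The official Unicode name for that integer.
        "name": unicodedata.name(ch),
        # Variable-width encoding: 1 to 4 bytes per character.
        "utf8": ch.encode("utf-8"),
        # Fixed-width encoding: always 4 bytes per character.
        "utf32": ch.encode("utf-32-be"),
    }


# U+0283 is the IPA "esh" character.
info = describe("\u0283")
```

Here info["code_point"] is "U+0283", info["utf8"] is the two bytes
b"\xca\x83", and info["utf32"] is b"\x00\x00\x02\x83".  Same
character, same integer, two different byte representations.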

Carlos Konstanski


