PHP and Character Sets

I’ve been trying to get WordPress’s post-by-email feature to work with image attachments. After a couple of days of hacking (1.5 days of groping around in the dark, and half a day of actual hacking), I have mostly succeeded.

Please hold your applause.

The catch is, it isn’t handling text in Japanese, posted from a Japanese cell phone. A minor inconvenience, but it happens to be my current itch, and I’m scratching it.

Turns out that PHP and its support for character sets is even worse than advertised. Here are some links and notes for my own future reference:

iconv is the old standby, but it doesn’t help all that much. mbstring looks like it will help me figure out what the current encoding of the string is.

Wish me luck while I figure this out.

UPDATE: mbstring works like a charm. Japanese phones use Shift_JIS internally, but the email servers they use encode in ISO-2022-JP. And here is a nice lucid explanation of the Japanese encoding systems in common use (ISO-2022-JP, Shift_JIS, and EUC-JP), although be aware that it is quite old (1996) and predates widespread acceptance of Unicode.
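
For my own future reference, here is a minimal sketch of the kind of conversion I mean. It is not the actual WordPress patch: the function name mail_body_to_utf8 and the candidate list are made up for illustration, and it assumes the mbstring extension is loaded and that this file itself is saved as UTF-8. It trusts the charset declared in the mail headers when there is one, and only falls back to guessing with mb_detect_encoding() otherwise.

```php
<?php
// Hypothetical helper (not part of WordPress): convert a mail body to UTF-8.
// $declaredCharset is whatever the message's Content-Type header claimed, if anything.
function mail_body_to_utf8(string $body, ?string $declaredCharset = null): string
{
    // Candidate encodings for Japanese mail. This list and its order are an
    // assumption; detection is a heuristic and can guess wrong on short text.
    $candidates = ['ISO-2022-JP', 'UTF-8', 'SJIS', 'EUC-JP'];

    // Prefer the declared charset; only guess when none was given.
    $from = $declaredCharset ?: mb_detect_encoding($body, $candidates, true);
    if ($from === false) {
        return $body; // could not tell what it is, so leave it untouched
    }

    return mb_convert_encoding($body, 'UTF-8', $from);
}

// Round trip: encode a UTF-8 string the way a mail server might,
// then convert it back.
$original = 'こんにちは';
$wire     = mb_convert_encoding($original, 'ISO-2022-JP', 'UTF-8');
$decoded  = mail_body_to_utf8($wire, 'ISO-2022-JP');

var_dump($decoded === $original);
```

Passing the declared charset whenever the headers provide one seems like the safer default, since ISO-2022-JP is a 7-bit encoding and is easy to mistake for plain ASCII when guessing.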

Python, PHP, and XSLT

I had the occasion over the last couple of days to do some work with Python, PHP, and XSLT.

Python, which I haven’t really touched for a couple of years, comes back fairly quickly, especially on this little project, where I am looking at two different code bases with no documentation except the code itself. It is pretty easy to see what the authors are trying to do, and not too hard to hack away and make some progress.

PHP I am relatively new to, and I am instantly productive. What I am doing is not rocket science, but it sure is easy. In fact, I am doing XSLT with PHP. Yeah, I know that libxslt is under the hood doing its magic, and that I am benefiting from that. Also, because I am a late PHP adopter, the XSL implementation is by now a bit more mature and easier to use.

But I’ve done fairly hefty XSL projects in Python, Java, and Frontier, and the PHP implementation is so easy: it literally took me only about 5 minutes to get something working, and that includes reading the tutorial. Now I have a 40-line PHP file and a 5-line .htaccess file, and with that I have a simple XML-data-driven website.
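
To give a feel for how little code is involved, here is a rough sketch of the core of such a script. It is not my actual 40-line file: the inline XML, the stylesheet, and the variable names are invented for illustration, and it assumes PHP's xsl and dom extensions are enabled. A real site would presumably load both documents from files rather than from strings.

```php
<?php
// Illustrative data only; a real site would load this from an XML file.
$xml = new DOMDocument();
$xml->loadXML('<pages><page title="Home"/><page title="About"/></pages>');

// Illustrative stylesheet: turn each <page> into a list item.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/pages">
    <ul>
      <xsl:for-each select="page">
        <li><xsl:value-of select="@title"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

// XSLTProcessor is PHP's wrapper around libxslt.
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// transformToXml() returns the result as a string (or false on failure).
$html = $proc->transformToXml($xml);
echo $html;
```

The .htaccess side would then only need to route requests to a script like this one; the details of that depend on the site, so I have left them out.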

Lastly, XSLT. It is funny how some technologies stay with you and some don’t. XSL has stayed with me, and I was able to convert an HTML design mockup into a working XSL stylesheet in just a few hours tonight. Feels good, and it is fun: you can instantly see the results of your work.