PHP and Character Sets

I’ve been trying to get WordPress’s post-by-email feature to work with image attachments. After a couple of days of hacking (1.5 days of groping around in the dark, and half a day of actual hacking), I have mostly succeeded.

Please hold your applause.

The catch is, it isn’t handling text in Japanese, posted from a Japanese cell phone. A minor inconvenience, but it happens to be my current itch, and I’m scratching it.

Turns out that PHP’s support for character sets is even worse than advertised. Here are some links and notes for my own future reference:

iconv is the old standby, but it doesn’t help all that much here. The mbstring extension looks like it will help me figure out what the current encoding of a string is.
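For my own notes, here is a rough sketch of what that encoding guess might look like with mbstring's mb_detect_encoding(). The helper name guess_japanese_encoding() and the candidate list are my own assumptions, not anything WordPress provides, and detection is only a heuristic, so treat the result as a hint.

```php
<?php
// Hypothetical helper (not part of WordPress): guess the encoding of a raw
// message body using the mbstring extension.
function guess_japanese_encoding(string $raw): ?string
{
    // The candidate list is an assumption. ISO-2022-JP goes first because it
    // is a 7-bit encoding whose bytes could also pass as the other candidates.
    $candidates = ['ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS'];

    // Third argument true = strict mode: reject candidates for which the
    // input contains invalid byte sequences.
    $guess = mb_detect_encoding($raw, $candidates, true);

    // mb_detect_encoding() returns false when no candidate fits.
    return $guess === false ? null : $guess;
}

// Usage: "\u{...}" escapes produce UTF-8 bytes regardless of file encoding.
$sample = "\u{65E5}\u{672C}\u{8A9E}";
var_dump(guess_japanese_encoding($sample));
```

Short strings can be ambiguous, so if the email headers declare a charset, that is probably a better source of truth than the guess.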

Wish me luck while I figure this out.

UPDATE: mbstring works like a charm. Japanese phones use Shift_JIS internally, but the mail servers they send through encode messages in ISO-2022-JP. And here is a nice, lucid explanation of the Japanese encoding systems in common use: ISO-2022-JP, Shift_JIS, and EUC-JP. Be aware that it is quite old (1996) and predates widespread acceptance of Unicode.
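A minimal sketch of the conversion step, assuming the blog itself stores posts as UTF-8 and that the incoming body really is ISO-2022-JP as described above. The function name body_to_utf8() is made up for illustration; the only real API used is mb_convert_encoding().

```php
<?php
// Hypothetical helper: convert a mail body to UTF-8.
// Defaulting to ISO-2022-JP is an assumption based on what the phone
// carriers' mail servers send; pass the charset from the Content-Type
// header instead when the message declares one.
function body_to_utf8(string $raw, string $declaredCharset = 'ISO-2022-JP'): string
{
    // Argument order is (string, to_encoding, from_encoding).
    return mb_convert_encoding($raw, 'UTF-8', $declaredCharset);
}

// Simulate an incoming message by encoding a UTF-8 string as ISO-2022-JP,
// then check that it survives the trip back to UTF-8.
$original  = "\u{65E5}\u{672C}\u{8A9E}";
$asMail    = mb_convert_encoding($original, 'ISO-2022-JP', 'UTF-8');
$roundTrip = body_to_utf8($asMail);

var_dump($roundTrip === $original);
```

The same call should handle a Shift_JIS body by passing 'SJIS' as the second argument to body_to_utf8().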