From: Bodo S. <bo...@le...> - 2008-01-29 15:29:08
Michael Peters wrote:
> I haven't played around with XS for anything serious. I would think that doing
> UTF-8 work in would suck big time.
>
> Have you tried asking on Perlmonks?

No, I didn't.

Anyway, I had a much too narrow view of the problem, limited to chars which, although UTF-8, *may* fit into one byte. If we have wide chars, the solution I envisaged would not work, and that's perhaps why this "long-standing bug" won't be fixed.

In fact, the module's author Gisle Aas tells us what to do when dealing with UTF-8 data: we have to encode_utf8() it to expose the bytes that make up UTF-8 chars. By exposing those bytes, md5() is happy again, since it gets what it wants: bytes.

The easiest way to get *that* working, and to get it working transparently without adding yet another wrapper Krang module for UTF-8 purposes, is to replace the md5* functions and the add() method loaded from XS land. It's just a bit of typeglob hackery and works like a charm.

So I wrapped up a patched version of Digest::MD5, with some more tests for UTF-8 and the POD updated. I'll commit it to the trunk (if nobody objects).

Bodo
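For illustration, here is a minimal sketch of the typeglob approach described above. It is not the patch itself: it covers only md5_hex() (the other md5* functions and the add() method would need the same treatment), and the decision to encode only strings that carry Perl's internal UTF-8 flag is an assumption made for this example.

```perl
use strict;
use warnings;
use Digest::MD5 ();
use Encode qw(encode_utf8);

# Keep a reference to the original function loaded from XS.
my $orig_md5_hex = \&Digest::MD5::md5_hex;

{
    no warnings 'redefine';

    # Replace the function in the Digest::MD5 symbol table. The wrapper
    # encodes character strings to UTF-8 bytes before handing them to
    # the original; plain byte strings pass through unchanged.
    # Assumption: the UTF-8 flag is used to tell the two apart.
    *Digest::MD5::md5_hex = sub {
        return $orig_md5_hex->(
            map { utf8::is_utf8($_) ? encode_utf8($_) : $_ } @_
        );
    };
}

# A wide character (U+263A) would make the unwrapped function die with
# "Wide character in subroutine entry"; the wrapper digests its UTF-8 bytes.
my $wide = "\x{263A}";
print Digest::MD5::md5_hex($wide), "\n";

# Byte strings give the same digest as before.
print Digest::MD5::md5_hex("abc"), "\n";
```

Note that code which imported md5_hex() before the replacement still holds the original function in its own namespace, so in this sketch the replacement has to happen before any such import.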