I have now prepared a multibyte-string-aware (UTF-8) version of phpwiki.
It requires PCRE with UTF-8 support and the mbstring extension
(and probably the iconv extension as well).
In addition, all locale po files and pgsrc files must be converted
(there's a script for that), and the whole page cache must be purged;
better yet, load a fresh virgin wiki.
It required hundreds of changes to most string function calls: prefixing
substr, strlen, strtolower, strtoupper, strpos, ... with "mb_",
adding the "u" modifier to all PCRE (preg_*) patterns, and so on.
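To illustrate the kind of change involved, here is a minimal sketch. The sample string and variable names are made up for the example and are not taken from the phpwiki sources; it assumes mbstring is loaded and PCRE has UTF-8 support.

```php
<?php
// Illustrative only: sample data and variable names are hypothetical.
$title = "\xC3\x9Cberschrift";  // "Ü" + "berschrift": 11 characters, 12 bytes

$bytes = strlen($title);               // byte count
$chars = mb_strlen($title, 'UTF-8');   // character count

$first_byte = substr($title, 0, 1);              // first half of "Ü" only
$first_char = mb_substr($title, 0, 1, 'UTF-8');  // the complete "Ü"

// Without "u" the dot matches one byte; with "u" it matches one character.
preg_match('/^./',  $title, $m_bytes);
preg_match('/^./u', $title, $m_utf8);
```

The byte-oriented calls can cut a multibyte character in half, which is one way a string can end up as invalid UTF-8 further down the line.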
As long as the DB backends don't support it that well, I don't want to
maintain it, but I'll try to keep it up to date.
I still get a lot of InlineParser errors with the $hugepat:
  lib\InlineParser.php:188: Warning[2]: Compilation failed: invalid
  UTF-8 string at offset 1440
Either PCRE is broken, or some string function is corrupting a string.
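One way to narrow this down might be to check the pattern string itself before it reaches PCRE. The helper below is a hypothetical debugging sketch, not part of phpwiki: it reports the byte offset of the first byte that does not belong to a well-formed UTF-8 sequence, using a byte-wise pattern (no "u" modifier) for valid UTF-8.

```php
<?php
// Hypothetical debugging helper; the name is made up for this example.
// Returns -1 if $s is valid UTF-8, the byte offset of the first
// malformed byte otherwise, or false if the match itself failed.
function first_invalid_utf8_offset($s)
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return -1;
    }
    // One alternative per well-formed UTF-8 sequence shape; the
    // possessive *+ consumes the longest valid prefix of the string.
    $valid = '/^(?:[\x00-\x7F]'
           . '|[\xC2-\xDF][\x80-\xBF]'
           . '|\xE0[\xA0-\xBF][\x80-\xBF]'
           . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
           . '|\xED[\x80-\x9F][\x80-\xBF]'
           . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
           . '|[\xF1-\xF3][\x80-\xBF]{3}'
           . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'
           . ')*+/';
    if (!preg_match($valid, $s, $m)) {
        return false;  // e.g. a PCRE limit was hit; offset unknown
    }
    return strlen($m[0]);
}

// Made-up sample: "abc", a complete "Ü", then a lone lead byte before "def".
$broken = "abc\xC3\x9C" . "\xC3" . "def";
$offset = first_invalid_utf8_offset($broken);
$context = bin2hex(substr($broken, max(0, $offset - 4), 8));
```

Dumping a few bytes around the reported offset as hex usually shows whether a character was truncated or was never UTF-8 to begin with.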
There are still some string functions left for which no equivalent mb_
function exists (e.g. str_replace).
Does anybody want to have a look or should I hack away until it works
(if time permits)?
At some point we must switch to UTF-8 anyway. See PhpWiki:Utf8Migration.
For now I think it is enough to use the mbstring and PCRE detection
functions in CVS HEAD and display a proper warning. For example, sf.net
has such an old PHP (4.1.2), and no mbstring, that UTF-8, Chinese and
Japanese cannot work there. Maybe that's why the Wikipedia folks have to
maintain their own set of web servers with current PHP versions.
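For illustration, such a check could look roughly like the sketch below. The function name and messages are hypothetical and are not the actual detection functions in CVS HEAD; the sketch assumes that a PCRE built without UTF-8 support fails to compile a pattern carrying the "u" modifier.

```php
<?php
// Hypothetical check; name and messages are not taken from phpwiki.
function utf8_support_problems()
{
    $problems = array();
    if (!extension_loaded('mbstring')) {
        $problems[] = 'mbstring extension is not loaded';
    }
    // "\xC3\xA9" is a single two-byte character; with working UTF-8
    // support the pattern matches it as one character and returns 1.
    if (@preg_match('/^.$/u', "\xC3\xA9") !== 1) {
        $problems[] = 'PCRE lacks UTF-8 support';
    }
    return $problems;
}

foreach (utf8_support_problems() as $problem) {
    trigger_error("UTF-8 unavailable: $problem", E_USER_WARNING);
}
```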
I don't really want to maintain a UTF-8 CVS branch. Tested snapshots
are enough, I think.
--
Reini Urban