#257 utf-8 support



we're using phpwiki (http://www.wlug.org.nz) (version
"1.3.3-jeffs-hacks", including lots of our minor
hacks), and thought you might want to know about some of
the stuff we did to get utf-8 mostly working everywhere.

Some of this might be done/different in cvs head - I
have no idea :)

-define("CHARSET", "iso-8859-1");
+define("CHARSET", "utf-8");

we have a really ugly WikiNameRegexp - I couldn't get
pcre to use non-ascii [:upper:] and [:lower:] POSIX RE
classes, even with the right locale set. It catches
(most) Western accented chars encoded in utf-8:

\xc3\x80 - \xc3\x9e are the Latin upper-case accented chars
\xc3\x9f - \xc3\xbf are the Latin lower-case accented chars

$WikiNameRegexp =

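The byte ranges quoted above can be sanity-checked. Here is a sketch (in Python, since the wiki itself is PHP) of an equivalent WikiName pattern written over Unicode code points rather than raw utf-8 bytes; note these ranges also sweep in the multiplication sign (U+00D7) and division sign (U+00F7), which this sketch ignores:

```python
import re

# Equivalent of the ugly byte-range regexp, over code points:
# U+00C0-U+00DE are the Latin upper-case accented chars
# (\xc3\x80-\xc3\x9e in utf-8), U+00DF-U+00FF the lower-case ones
# (\xc3\x9f-\xc3\xbf).
WIKI_NAME = re.compile("(?:[A-Z\u00C0-\u00DE][a-z\u00DF-\u00FF]+){2,}")

# Verify the utf-8 encodings match the byte ranges quoted above.
assert "\u00C0".encode("utf-8") == b"\xc3\x80"
assert "\u00DE".encode("utf-8") == b"\xc3\x9e"
assert "\u00FF".encode("utf-8") == b"\xc3\xbf"
```

With this pattern, accented WikiWords like "ÉcoleWiki" match, while a single capitalized word does not (the `{2,}` requires at least two chunks).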
-define('NBSP', "\xA0");       // iso-8859-x non-breaking space
+define('NBSP', "\xC2\xA0");   // utf-8 non-breaking space

-$FieldSeparator = "\x81";
+$FieldSeparator = "\xFF";     // this byte should never appear in utf-8
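Both byte choices can be checked mechanically: \xC2\xA0 is exactly the utf-8 encoding of U+00A0, and 0xFF can never occur in well-formed utf-8, so it is safe as an internal field separator. A quick check, sketched in Python:

```python
# U+00A0 (no-break space) encodes to the two bytes C2 A0 in utf-8;
# the single byte A0 used under iso-8859-x is not valid utf-8 on its own.
assert "\u00a0".encode("utf-8") == b"\xc2\xa0"

# 0xFF is not a legal byte anywhere in a utf-8 stream, so it cannot
# collide with page text.
try:
    b"\xff".decode("utf-8")
    ff_is_invalid = False
except UnicodeDecodeError:
    ff_is_invalid = True
assert ff_is_invalid
```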

In phpwiki/lib/diff.php and lib/display.php:

+header("Content-Type: text/html; charset=" . CHARSET);

printed out before doing each GeneratePage().
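The header has to go out before GeneratePage() because HTTP headers must precede the first body byte; once output starts, PHP's header() can no longer take effect. A minimal sketch of that ordering (in Python for illustration; the function name is hypothetical):

```python
CHARSET = "utf-8"  # mirrors phpwiki's CHARSET constant

def render_page(body_html: str) -> bytes:
    # Headers first, then a blank line, then the body -- the charset
    # declaration must arrive before any page content is flushed.
    headers = f"Content-Type: text/html; charset={CHARSET}\r\n"
    return (headers + "\r\n" + body_html).encode(CHARSET)

resp = render_page("<p>caf\u00e9</p>")
head, _, body = resp.partition(b"\r\n\r\n")
```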


  • Reini Urban - 2004-04-10

    Japanese seems to work fine with utf-8 now.
    Can you check?

  • Anonymous - 2004-04-13

    Hi, I forgot a few things.

    1) lib/editpage.php needs
    +header("Content-Type: text/html; charset=" . CHARSET);
    before the GeneratePage() call as well, and we also put it
    in lib/main.php, at the top of the main() function.

    2) We converted login.tmpl to use utf-8 encoding for the
    example characters.

    3) We put the WikiNameRegexp back to
    "(?:[[:upper:]][[:lower:]]+){2,}"; to keep it nice and clean,
    and we modified lib/config.php's pcre_fix_posix_classes()
    function to turn [:upper:] and [:lower:] into the ugly regexp:

    static $classes = array(
                            'alnum' =>
                            'alpha' =>
                            # 'upper' =>
                            # 'lower' => "a-z\xdf-\xf6\xf8-\xff"

    # until posix class names/pcre work with utf-8
    # utf-8 non-ascii chars: most common (eg western) latin
    # chars are 0xc380-0xc3bf
    # we currently ignore other less common non-ascii characters:
    # (eg central/east european) latin chars are 0xc480-0xc4bf
    # and 0xc580-0xc5be,
    # and indian/cyrillic/asian languages

    # this replaces [[:lower:]] with a utf-8 match (Latin only)
    $regexp = preg_replace('/\[\[\:lower\:\]\]/',

    # this replaces [[:upper:]] with a utf-8 match (Latin only)
    $regexp = preg_replace('/\[\[\:upper\:\]\]/',

    $keys = join('|', array_keys($classes));
    return preg_replace("/\[:($keys):]/e", '$classes["\1"]',

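The substitution described in 3) can be sketched as a standalone function (in Python here; the real fix lives in lib/config.php's pcre_fix_posix_classes()). The exact replacement ranges are assumptions in the spirit of the truncated values above:

```python
import re

# Hypothetical stand-in for phpwiki's pcre_fix_posix_classes():
# rewrite POSIX character classes into explicit ranges, extended with
# the common Western Latin accented characters (U+00C0-U+00FF).
CLASSES = {
    "upper": "A-Z\u00c0-\u00de",  # plus Latin upper-case accented chars
    "lower": "a-z\u00df-\u00ff",  # plus Latin lower-case accented chars
}

def fix_posix_classes(pattern: str) -> str:
    # Replace every [:name:] occurrence with its expanded range; the
    # surrounding [ ] of the character class is left in place, mirroring
    # the preg_replace("/\[:($keys):]/e", ...) trick above.
    return re.sub(
        r"\[:(\w+):\]",
        lambda m: CLASSES.get(m.group(1), m.group(0)),
        pattern,
    )

wiki_name = fix_posix_classes("(?:[[:upper:]][[:lower:]]+){2,}")
```

With this, the clean POSIX-style $WikiNameRegexp expands into a pattern that also matches accented WikiWords such as "ÉcoleWiki".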

    I pasted some Japanese into a page, you can check/edit our
    page: http://www.wlug.org.nz/TestUtf8

    I have no idea how you decide what is a WikiWord in kanji
    though :)

  • Reini Urban - 2004-04-15

    The automatic pcre_fix_posix_classes fix for utf-8 is now
    included in the latest CVS.

    1.3.9 has all the other fixes included; just the
    $WikiNameRegexp has to be fixed there manually.

    I'll now include full dynamic language changes for utf-8
    languages also, which currently have to use
    define('CHARSET', 'utf-8');

