Dear Wikindx developers,
as you probably know, I'll be working with Mark to integrate UniWakka
with wikindx. As a first step I'm converting UniWakka to use OSbib, and
I found some issues with it that affect wikindx too.
1. utf-8: OSbib requires utf-8 input, but it does not correctly handle
multibyte strings (that is to say, characters above single-byte ascii)
when style['titleCapitalization'] is set. This is because you
utf8_decode the string and then utf8_encode it back, but utf8_decode
only preserves characters that fit in the single-byte iso-8859-1
charset.
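
A minimal sketch of the difference, assuming the mbstring extension is
loaded. Both function names are hypothetical (they belong neither to
OSbib nor to wikindx), and mb_convert_encoding stands in here for the
utf8_decode/utf8_encode pair, which recent PHP versions deprecate:

```php
<?php
// Both helpers are hypothetical illustrations, not OSbib or wikindx code.

// Mimics a decode / lower-case / encode round trip through ISO-8859-1.
function lowerViaLatin1(string $utf8): string
{
    // Characters with no ISO-8859-1 equivalent are replaced at this step.
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding(strtolower($latin1), 'UTF-8', 'ISO-8859-1');
}

// Lower-cases the string directly, without ever leaving UTF-8.
function lowerUtf8(string $utf8): string
{
    return mb_strtolower($utf8, 'UTF-8');
}

$title = 'Война И Мир';
echo lowerViaLatin1($title), "\n";  // the Cyrillic letters do not survive
echo lowerUtf8($title), "\n";       // война и мир
echo lowerUtf8('Non È Vero'), "\n"; // non è vero
```

The first helper loses every character outside iso-8859-1, while the
second keeps the title intact, because the case conversion operates on
whole utf-8 characters rather than on bytes.
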
You should use the same approach used in wikindx (with the UTF8 class):
decodeUtf8. However, this function is not compatible with encode_utf8,
since multibyte characters are decoded as Unicode entities. So I wrote a
couple of functions to encode strings decoded with decodeUtf8.
This is far from being a clean solution, since strtolower works only
with single-byte characters (that is to say, with ascii characters: try
inserting in wikindx this title: "Non È Vero" and set the style to
turabian). To be able to transform a string with non-ascii characters
to lower case you should either use the iso-8859-1 charset or, with
utf8 encodings, mb_strtolower. My approach at least will not eat
multibyte characters (try inserting in wikindx a Chinese or Russian
title and set the style to turabian to have an idea of what I'm talking
about).
I discussed some of the problems of utf8/php here:
2. PARSEXML does not correctly handle entities in the xml styles. If I
set &#1234; as the author separator, I'll get a question mark. Moreover,
if I set it to & I'll get good rtf formatting but invalid xhtml, since
in xhtml & should be &amp;
To solve the problem I had to modify the class, adding a method for
dealing with special characters in the bib styles.
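
To make the idea concrete, here is a rough sketch of format-aware
handling of style strings. It is not the method from the patch: the
function name, its parameters and the 'xhtml'/'rtf' format labels are
all assumptions made for illustration, and it needs the mbstring
extension and PHP 7.4 or later:

```php
<?php
// Hypothetical helper, not the method added in the attached patch.
// Takes a string as written in a style file (it may contain entities
// such as "&amp;" or "&#1234;") and renders it for one output format.
function renderStyleText(string $raw, string $format): string
{
    // Resolve entities to real UTF-8 characters first.
    $text = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    if ($format === 'xhtml') {
        // Re-escape only the characters that are significant in markup.
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    if ($format === 'rtf') {
        $out = '';
        foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
            $code = mb_ord($char, 'UTF-8');
            if ($code < 128) {
                // Escape RTF control characters, pass other ASCII through.
                $out .= in_array($char, ['\\', '{', '}'], true) ? '\\' . $char : $char;
            } else {
                // \uN takes a signed 16-bit number plus a fallback character.
                // Code points above U+FFFF are not handled in this sketch.
                if ($code > 32767) {
                    $code -= 65536;
                }
                $out .= '\\u' . $code . '?';
            }
        }
        return $out;
    }

    // Unknown format: return the decoded UTF-8 text unchanged.
    return $text;
}

echo renderStyleText(' & ', 'xhtml'), "\n";
echo renderStyleText('&#1234;', 'rtf'), "\n";
```

Decoding first and escaping afterwards means a style author can write
either a bare & or &amp; and still get valid xhtml, while the rtf
output receives an escape sequence instead of a question mark.
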
Attached to this message you'll find a patch and a new version of
UTF8.php (to be used with OSbib, but also with wikindx, if you think this
is fine). Please let me know what you think about it.
All the best.