[Phpwiki-talk] Re: phpWiki and utf-8/double-byte

Further to this issue. 

We have read through the utf8migration and doublebyte documents, 
and through the archives of phpwiki-talk. We have also downloaded 
index.php and the template files (browse.html, et al) from the cvs, 
and hacked them to get them to work with Postnuke. 
We also tried adding the header lines mentioned on the above-mentioned 
wiki pages. 

We were not able to get the utf-8 data that showed up as screen garbage to 
display correctly (though other pages with utf-8 were displaying properly). 
Then we experimented a bit more and discovered that certain characters 
in the strings being input were being mangled by the wiki. 

Thus, we determined that the utf-8 in the sandbox and homepage is being 
displayed largely due to luck; the characters being used were somehow 
passing through unscathed - probably the exception rather than the rule.   
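
A minimal sketch of the kind of check that exposes this, for anyone who 
wants to reproduce it. The helper name describe_utf8 is our own invention, 
not something in phpWiki or Postnuke, and the substr call merely stands in 
for whatever step in the wiki is damaging the text: 

```php
<?php
// Hypothetical helper (not part of phpWiki or Postnuke): reports whether a
// string is still well-formed UTF-8 and shows its bytes in hex.
function describe_utf8(string $s): array
{
    return [
        // preg_match with the /u modifier returns false on malformed UTF-8
        'valid' => preg_match('//u', $s) === 1,
        'hex'   => bin2hex($s),
    ];
}

$before = "日本語";                // what was typed into the edit form
$after  = substr($before, 0, 4);  // stand-in for a byte-oriented step that damages the text

print_r(describe_utf8($before));
print_r(describe_utf8($after));
```

Comparing the hex of what went in with the hex of what comes back out 
shows exactly which bytes were altered, which is more reliable than 
judging by what the browser happens to render. 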

We also compared the source of the pages that were being displayed 
properly and those not being displayed properly, and found the header  
information to be identical. The page itself is being displayed properly 
(the Postnuke-generated utf-8 data is fine), but the content between 
the <div> and </div> that phpWiki generates is getting mangled.  

Wiki clones in Asia have made refinements to Wiki code (or designed 
their code) so that this doesn't happen. We are looking into what they 
are doing, and hope to see if it can be applied to this rather specific 
problem, i.e., the code for the phpWiki module for Postnuke. Quite 
possibly it will involve the use of the mb_* functions, or maybe 
rawurlencode or something similar. 

However, we are quite puzzled, and suspect that there is an easy answer 
within the domain and experience of phpWiki users that might be had. 

What might that be? 

For those of you not familiar, the phpWiki module for Postnuke 
includes the latter's "header.php" file, and so allows the CMS 
to deal with the header issue for each page generated. Thus, 
phpWiki's index.php is missing all that header information that 
is in that of phpWiki proper. 

The relevant lines from Postnuke's header.php are as follows: 

else {
        echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n";
        echo "<html>\n<head>\n";

        if (defined("_CHARSET") && _CHARSET != "") {
                echo "<meta http-equiv=\"Content-Type\" ".
                     "content=\"text/html; charset="._CHARSET."\">\n";
        }

There is no mention of mb_ anything anywhere in the code, yet 
there is no problem with screen garbage. 

That _CHARSET, btw, is set to utf-8 on our system, in all instances. 

These lines are the only relevant code that exists within the 
Postnuke that we are running. Thus, these lines and these lines 
alone are able to ensure that all pages display all the utf-8 
data correctly in Postnuke. For our purposes, the reported limitations 
that php and mysql have regarding utf-8 are not an issue. That is also 
true of all the modules we use with Postnuke. Except, unfortunately, 
for this one (so far). 

So, we suspect that something is causing the data between the 
<div> and </div> that phpWiki generates to get mangled, 
while the page itself is displayed properly. If we can locate it 
and tweak it, we should be in business. 
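
To illustrate the sort of mechanism we have in mind, here is a minimal 
sketch. It is an assumption for illustration only: we have not found such 
a call in phpWiki, and the sample text is made up. It shows what happens 
when an entity-conversion step is told (or defaults to) ISO-8859-1 while 
the data is really utf-8: 

```php
<?php
// Assumption for illustration only: somewhere in the rendering path an
// entity-conversion call treats the text as ISO-8859-1. We have not found
// such a call in phpWiki; this only shows what the effect would look like.
$text = "日本語 <b>";

// Each byte >= 0x80 is read as a separate Latin-1 character, so the
// multibyte sequences are broken up into unrelated entities.
$as_latin1 = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

// Told that the input is UTF-8, the same function leaves the Japanese
// characters alone and only escapes the markup characters.
$as_utf8 = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo bin2hex($as_latin1), "\n";
echo $as_utf8, "\n";
```

If something like this were the cause, the page header would still be 
correct and only the wiki-generated text would be damaged, which matches 
what we are seeing. 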

But, we haven't managed to find out what that mechanism might 
be just yet. 

Looking forward to advice from both the Wabi and Sabi experts among you. 

Thanks. 

Tony Laszlo, ISSHO
http://www.issho.org

On Sat, 26 Oct 2002, Tony Laszlo wrote:

> on a soon-to-be-launched multilingual site whose data is 
> all stored as utf-8. 
> We were able to save small strings of non-English data on the 
> homepage and in the Sandbox, with no problem. 
> Those pages can be seen here: 
> http://www.issho.org/modules.php?op=modload&name=phpWiki&file=index&pagename=HomePage
> However, [we have major problems otherwise, see here]: 
> http://www.issho.org/modules.php?op=modload&name=phpWiki&file=index&pagename=IsshoSelfSupport_en