From: Tony L. <la...@is...> - 2002-10-27 07:50:38
Further to this issue.

We have read through the utf8migration and doublebyte documents, and through the archives of phpwiki-talk. We have also downloaded index.php and the template files (browse.html, et al.) from the CVS, and hacked them to get them to work with Postnuke. We tried adding the header lines mentioned on the above-mentioned wiki pages. We were not able to get the utf-8 data that was showing up as screen garbage to display correctly (though other pages with utf-8 were displaying properly).

Then we experimented a bit more and discovered that certain characters in the strings being input were being mangled by the wiki. Thus, we determined that the utf-8 in the sandbox and homepage is being displayed largely due to luck; the characters being used were somehow passing through unscathed, which is probably the exception rather than the rule.

We also compared the source of the pages that were being displayed properly with that of the pages that were not, and found the header information to be identical. The page itself is being displayed properly (the Postnuke-generated utf-8 data is fine), but the content between the <div> and </div> that phpWiki generates is getting mangled.

Wiki clones in Asia have made refinements to Wiki code (or designed their code) so that this doesn't happen. We are looking into what they are doing, and hope to see if it can be applied to this rather specific problem, i.e., the code for the phpWiki module for Postnuke. Quite possibly it will involve the use of the mb_* functions, or maybe rawurlencode or something similar. However, we are quite puzzled, and suspect that there is an easy answer within the domain and experience of phpWiki users. What might that be?

For those of you not familiar with it, the phpWiki module for Postnuke includes the latter's "header.php" file, and so allows the CMS to deal with the header issue for each page generated. Thus, the module's index.php is missing all the header information that is in the index.php of phpWiki proper.
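Since the headers are identical and only some characters are mangled, one way to narrow this down is to dump the bytes of a test string at each stage between input and output and see where they stop being well-formed utf-8. What follows is only a minimal sketch of that idea. The helper names (is_valid_utf8, describe_bytes) and the stages are hypothetical, chosen for illustration; we do not know which phpWiki functions actually touch the data.

```php
<?php
// Hypothetical diagnostic helpers; not part of phpWiki or Postnuke.

// True if $s is well-formed UTF-8. With the /u modifier, preg_match()
// does not return 1 when the subject contains invalid UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// One-line report for a named processing stage.
function describe_bytes($label, $s)
{
    $state = is_valid_utf8($s) ? 'valid utf-8' : 'INVALID utf-8';
    return $label . ': ' . $state . ', hex=' . bin2hex($s);
}

// Three Japanese characters written as explicit UTF-8 bytes, so the
// example does not depend on the encoding this file is saved in.
$input = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

// Stand-ins for stages a page body might pass through. The real stages
// in the module are unknown to us; these are assumptions.
$stages = array(
    'raw input'        => $input,
    'escaped as utf-8' => htmlspecialchars($input, ENT_QUOTES, 'UTF-8'),
    'cut by bytes'     => substr($input, 0, 4), // splits the 2nd character
);

foreach ($stages as $label => $value) {
    echo describe_bytes($label, $value), "\n";
}
```

The first stage whose report flips to INVALID, or whose hex differs from the raw input when it should not, is the place to look. Byte-oriented functions such as substr() are only one possible culprit.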
The relevant lines from Postnuke's header.php are as follows:

    else {
        echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n";
        echo "<html>\n<head>\n";
        if (defined("_CHARSET") && _CHARSET != "") {
            echo "<meta http-equiv=\"Content-Type\" ".
                 "content=\"text/html; charset="._CHARSET."\">\n";
        }

There is no mention of mb_ anything anywhere in the code, yet there is no problem with screen garbage. That _CHARSET, btw, is set to utf-8 on our system, in all instances. These lines are the only relevant code that exists within the Postnuke that we are running. Thus, these lines, and these lines alone, are enough to ensure that all pages display all the utf-8 data correctly in Postnuke.

For our purposes, the reported limitations that php and mysql have regarding utf-8 are not an issue. That is also true of all the modules we use with Postnuke. Except, unfortunately, for this one (so far).

So, we suspect that something is causing the data between the <div> and </div> that phpWiki generates to get mangled, while the page itself is displayed properly. If we can locate it and tweak it, we should be in business. But we haven't managed to find out what that mechanism might be just yet.

Looking forward to advice from both the Wabi and Sabi experts among you.

Thanks.

Tony Laszlo, ISSHO
http://www.issho.org

On Sat, 26 Oct 2002, Tony Laszlo wrote:

> on a soon-to-be-launched multilingual site whose data is
> all stored as utf-8.
> We were able to save small strings of non-English data on the
> homepage and in the Sandbox, with no problem.
> Those pages can be seen here:
> http://www.issho.org/modules.php?op=modload&name=phpWiki&file=index&pagename=HomePage
> However, [we have major problems otherwise, see here]:
> http://www.issho.org/modules.php?op=modload&name=phpWiki&file=index&pagename=IsshoSelfSupport_en