From: Matthew M. <ma...@tu...> - 2006-12-08 19:46:51
|
Hi all,

Some users are having problems converting from 0.10.x to 1.0.0. The conversion was mangling accented characters.

If someone could please send me a small db dump of their 0.10.x web pages written in German, French, Dutch, etc., I'd appreciate it. Please include a text file with good examples of broken words and characters so I can cross reference.

BTW - I uploaded a new core today that converts branches. Check Boost for details.

Thanks
Matt

--
Matthew McNaney
Electronic Student Services
Appalachian State University
http://phpwebsite.appstate.edu
|
From: Verdon V. <ve...@ve...> - 2006-12-08 20:07:08
|
Hi Matt,

I haven't tried converting any sites and, to be honest, don't foresee any in the immediate future. Too many 3rd party mods in use, not to mention my own that aren't converted yet ;-) That said, I can provide dumps of some PageMaster pages and announcements, and maybe FatCat categories, in French if you just need some to try.

One other hurdle you might run into is different character encodings. Anything newer and default should be UTF-8, but I don't think you can rule out running into data in other encodings. I know at one time I had a few sites, which likely started as 0.9.x sites, that I converted from ISO-8859-1 to UTF-8. I suppose it could even be possible to run into both in a site that's been updated a few times.

Anywise, if you want some dumps but no examples, let me know.
|
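To illustrate the mixed-encoding hurdle described above, here is a minimal sketch of what happens when text that is already UTF-8 gets converted as if it were ISO-8859-1. The function name `blind_latin1_to_utf8` is made up for this example and is not part of phpWebSite; the sketch assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper, not phpWebSite code. Requires the mbstring extension.
// It converts without checking what the input encoding really is.
function blind_latin1_to_utf8(string $bytes): string
{
    // Every input byte is treated as one ISO-8859-1 character.
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$alreadyUtf8 = "\xC3\xA4";                      // "ä" stored as UTF-8 (2 bytes)
$mangled     = blind_latin1_to_utf8($alreadyUtf8);

// The two bytes become two separate characters, giving 4 bytes in total.
echo bin2hex($alreadyUtf8), ' -> ', bin2hex($mangled), "\n";
```

On a site holding both encodings, a conversion like this is only safe for the rows that really are ISO-8859-1, so each value would need to be checked before it is converted.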
From: Shaun M. <sh...@ae...> - 2006-12-09 00:41:27
|
On 8 Dec 2006, at 20:06, Verdon Vaillancourt wrote:

> One other hurdle you might run into is different character encodings.
> Anything newer and default should be UTF-8 but I don't think you can
> count out running into data in other encodings. I know at one time I
> had a few sites, that likely started as .9.x sites, that I converted
> from ISO-8859-1 to UTF-8. I suppose it could even be possible to run
> into both in a site that's been updated a few times.

I've a few like that, though I've tended to stick with ISO and strip out the code in phpwebsite that does anything with UTF. Most of them were on MySQL 4.0 too, so no character set support. I don't think it's actually worth using UTF unless you're on MySQL 4.1 or later.

Most of the time the one I come up against is pound signs, of course.

Again though, I'm not likely to be converting them for quite some time. 0.10.x just has a lot more features at present and a couple of important modules that aren't on 1.0. I'm starting a proper 1.0 project now though, so that I can work out what needs doing.

Shaun

aegis design - http://www.aegisdesign.co.uk
aegis hosting - http://www.aegishosting.co.uk
|
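The pound sign case can be shown in a few lines. This is only a sketch: the variable names are invented for the example, and it assumes the mbstring extension is loaded.

```php
<?php
// Illustration only; variable names are made up. Requires mbstring.
$latin1 = "Price: \xA3" . "10";   // the pound sign is the single byte 0xA3 in ISO-8859-1

// A lone 0xA3 byte is not a valid UTF-8 sequence, so a page served as
// UTF-8 cannot display it correctly.
$isValidUtf8 = mb_check_encoding($latin1, 'UTF-8');

// After conversion the pound sign is the two-byte sequence 0xC2 0xA3.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8);
echo bin2hex($utf8), "\n";
```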
From: Yves K. <ph...@fi...> - 2006-12-09 12:18:05
|
Hi Matt

I'm not familiar enough with the fallout code to present a real fix, but I played around a bit with the conversion and the UTF-8 problem.

The answers I found (maybe not complete) are the following:
The database table's charset and collation did not affect my results. Your conversion script worked almost fine. The characters were stored in the database whether the table used the latin1_german2_ci or the utf8_unicode_ci collation. But the text displayed on the web page was garbage. I found that the page displayed OK if I changed the browser to ISO-8859-1 encoding?! So I started to investigate the connection and the output (layout).

Things I tried:
I sent "SET NAMES utf8" to MySQL directly after 'connect', in convert/class/Convert.php and also in pear/DB/mysql.php. This is probably not needed (but not yet investigated).
I added this line:
    $text = utf8_encode($text); //yok
at line 66 in layout/class/Layout.php, before:
    Layout::_loadBox($text, $module, $content_var);

Thereafter the output on the website (only webpage tested) was fine.

But phpWebSite is still not fixed. If I now enter text inside phpws (webpages), the output is garbage. The database content shows an HTML entity (e.g. &auml;) instead of the real lowercase a-umlaut. This makes me believe the content is HTML-encoded and not UTF-8.

Conclusion:
The database does not seem to be the problem if MySQL is 4.1.x or newer. But the MySQL server has to know which encoding is used on the client side. It handles the tables and their collation whatever the setup. This is IMHO good news, because a lot of users out there get a preconfigured db from the host and are not able to change the charset or the collation. Since the server knows the client encoding, it translates the db content to it (e.g. UTF-8).
But what we have to do now is make sure that all incoming and outgoing content inside phpws is also encoded as correct UTF-8.

Maybe we have to use a function to check whether the content is already encoded?!
E.g. (not mine, somewhere from the net):

    /**
     * Checks if String is UTF-8 Encoded
     * @param string $string string to check
     * @return boolean
     */
    function is_utf8($string)
    {
        // preg_match() returns an integer, so compare to get a real boolean
        return preg_match('%^(?:
              [\x09\x0A\x0D\x20-\x7E]            # ASCII
            | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
            | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
            | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
            | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
            | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
            | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
            | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
        )*$%xs', $string) === 1;
    }

and convert if not:

    /**
     * Encodes String to UTF8
     * @param string $string
     * @return string
     */
    function cms_utf8_encode($string)
    {
        if (is_utf8($string)) {
            return $string;
        }
        if (function_exists('mb_convert_encoding')) {
            // Name the source encoding explicitly; ISO-8859-1 is assumed
            // here to match what utf8_encode() does in the fallback below.
            return mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
        }
        return utf8_encode($string);
    }

We should also double-check the headers and meta tags of the output:
e.g.: <meta http-equiv="content-type" content="application/xhtml+xml;charset=utf-8" />
e.g.: header('content-type: text/html; charset=utf-8');
and maybe also in CSS???
e.g.: @charset "utf-8";

And last but not least, to work with forms, the charset should be defined:
<form accept-charset="utf-8" method= ...>

All this is 'only' some kind of brainstorming. But maybe it is the direction to go in to handle different languages, charsets and encodings...

Regards
Yves
|
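The check-then-convert idea from the message above can also be sketched with `mb_check_encoding()` in place of the regular expression. This is a hedged illustration, not the approach phpWebSite adopted: the name `ensure_utf8` and the default legacy encoding of ISO-8859-1 are assumptions made for the example, and the mbstring extension is required.

```php
<?php
// Hypothetical helper, not part of phpWebSite. Requires the mbstring extension.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function ensure_utf8(string $text, string $legacyEncoding = 'ISO-8859-1'): string
{
    // Leave the text alone when its bytes already form valid UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise convert from the assumed legacy encoding.
    return mb_convert_encoding($text, 'UTF-8', $legacyEncoding);
}

$latin1 = "Gr\xFC\xDFe";            // "Grüße" as ISO-8859-1 bytes
$once   = ensure_utf8($latin1);     // converted to UTF-8
$twice  = ensure_utf8($once);       // second call leaves it unchanged

var_dump($once === $twice);
```

Because the check runs first, calling the helper on data that has already been converted is harmless, which matters on sites that hold both encodings. The check is still a heuristic: a few ISO-8859-1 byte sequences happen to be valid UTF-8 and would be left unconverted.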
From: Verdon V. <ve...@ve...> - 2006-12-10 15:59:55
|
Hi fellow devs,

... still working on re-writing the first of my modules for phpws 1.x ... many new habits to learn, but I want to get it right and not just do a quick conversion :)

I have questions about image uploads and best practices. I don't want to use File Cabinet (FC) in this case. I want this mod's images to be separate. I want any logged-in user to be able to upload an image without special permissions, and I also don't want users posting to this mod to have access to any other images on the site.

That raises another point I hadn't thought of at first... I don't want other mods having access, via FC, to the images uploaded by this mod for use in other parts of the site either. At first glance, although I really like a lot of what FC has to offer, it seems to give any module where it is used pretty broad access to the resources of all modules, and I guess that's the point. However, I can imagine scenarios where a Profiler user might inadvertently delete an image used in some other mod while poking around for an image for the profile being edited, not to mention being able to upload into any mod's image dir. What if, for some reason, while adding Profiler records, I upload all my images into the blog image dir? Then later on, an admin un-installs blog, not knowing that the Profiler editor wasn't very bright :)

A couple of thoughts come to mind (and I am just thinking out loud)...

1) It might be useful if mods had to be registered (or not) with FC at boost, or perhaps later with settings within FC. This would allow a mod developer to protect the resources of their mod from other mods' users via FC. If this were achieved with settings within FC, perhaps it would only apply to non-deity users.

2) Maybe it would be better if there were two sorts of upload scenarios for FC. By that I mean, if the image/file upload screen is invoked from within some other module (like Profiler does for images), then only the image/file directory of the invoking mod is available for uploading to. If FC is accessed directly (Control Panel > Administration > FC), then all image/file dirs are available for upload.

Anywise, back to saving images in my mod, assuming I'm not going to use FC. I used to use EZform::saveImage(), but that no longer exists. I can write my own function and have been looking around for examples, mostly trying to figure out what FC is doing when uploading/saving an image. It would be really useful if there was a saveImage() function in /core/class/File.php. Perhaps if I write a solid enough function for my mod, it can be moved to core in future versions :)

Does anyone have any advice as to whether I should use the old EZform::saveImage() as a starting point, or whether I should further explore what FC's image/file class is doing? It looks like FC is passing a lot of stuff off to PEAR functions and I haven't followed that thread yet. I'm starting to lose sight of the forest for the trees ;-)

Best regards,
verdon
|
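As a starting point for the kind of module-local saveImage() function asked about above, here is a rough sketch. Everything in it is hypothetical: `ALLOWED_IMAGE_TYPES`, `safe_image_name()` and `save_module_image()` are invented names, not phpWebSite, EZform or File Cabinet APIs, and it is written for current PHP (7.1 or later) rather than the PHP of 2006. It only uses standard PHP functions (`is_uploaded_file`, `getimagesize`, `move_uploaded_file`).

```php
<?php
// Hypothetical sketch; none of these names exist in phpWebSite.

// Image types the module accepts, mapped to the extension used on disk.
const ALLOWED_IMAGE_TYPES = [
    IMAGETYPE_GIF  => 'gif',
    IMAGETYPE_JPEG => 'jpg',
    IMAGETYPE_PNG  => 'png',
];

/** Build a safe file name: drop any path, keep [A-Za-z0-9_-], force the extension. */
function safe_image_name(string $original, string $extension): string
{
    $base = pathinfo($original, PATHINFO_FILENAME);
    $base = trim(preg_replace('/[^A-Za-z0-9_-]+/', '_', $base), '_');
    if ($base === '') {
        $base = 'image';
    }
    return $base . '.' . $extension;
}

/**
 * Store one $_FILES entry in a directory that belongs to a single module.
 * Returns the stored file name, or null when the upload is rejected.
 */
function save_module_image(array $upload, string $moduleImageDir): ?string
{
    // Reject failed uploads and files that did not arrive via HTTP POST.
    if (($upload['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK
        || !is_uploaded_file($upload['tmp_name'])) {
        return null;
    }
    // Trust the file contents, not the client-supplied name or MIME type.
    $info = getimagesize($upload['tmp_name']);
    if ($info === false || !isset(ALLOWED_IMAGE_TYPES[$info[2]])) {
        return null;
    }
    $name   = safe_image_name($upload['name'], ALLOWED_IMAGE_TYPES[$info[2]]);
    $target = rtrim($moduleImageDir, '/') . '/' . $name;
    return move_uploaded_file($upload['tmp_name'], $target) ? $name : null;
}
```

A module would call something like `save_module_image($_FILES['image'], $dir)` with `$dir` pointing at its own image directory. Because the target directory is passed in by the calling module and the file name is stripped of any path, uploads stay inside that one directory. The sketch does not handle name collisions, size limits or per-user permissions, so a real function would need more than this.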
From: Matthew M. <ma...@tu...> - 2006-12-11 14:32:58
|
Thank you for testing. Could you please send me a database dump so I can test locally?

Matt

On Sat, 2006-12-09 at 13:17 +0100, Yves Kuendig wrote:
> Hi Matt
>
> I'm not familiar enough with fallout code to present a real fix, but i
> played around a bit with the conversion and utf-8 problem.
|