From: Naoki Hiroshima <n@h7...> - 2006-11-17 09:29:29
Attachments:
htmlentities.patch
Hello,

The current version (0.611) can garble a comment author's name because of an incorrect use of htmlentities(). For example, when I filled in my name in Japanese, it never got through as typed and ended up showing garbled characters. So this patch is essential for languages other than ISO-8859-1.

By the way, I found that many of the files are a mess of mixed CRLF and LF line endings. I am not sure what you are using (probably Windows, eh?) or how this mess happened. Having both LF and CRLF lines in one file is a little bothersome :-(

As you may know, you can remove all the CRs with the following command (^M is a literal carriage return, typed as Ctrl-V Ctrl-M):

$ find . -name "*.php" | xargs perl -pi -e 's/^M//'

I did this to my working files, but it would be better applied to the original files.

Thanks,

--
Hiroshima
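A minimal sketch of the failure mode described above (an illustration, not the attached patch): on PHP versions of that era, htmlentities() assumed ISO-8859-1 by default, so each byte of a multi-byte UTF-8 character was entity-encoded on its own. Forcing ISO-8859-1 below reproduces the old default; passing 'UTF-8' explicitly avoids the problem.

```php
<?php
// On old PHP, htmlentities() defaulted to ISO-8859-1, so each byte of a
// multi-byte UTF-8 character was treated as a separate Latin-1 character.
$name = "広島"; // a Japanese name, UTF-8 encoded

// Reproduces the old default: bytes reinterpreted as Latin-1 (or dropped),
// so the result is no longer the original string.
$garbled = htmlentities($name, ENT_QUOTES, 'ISO-8859-1');

// With the charset given explicitly, CJK characters have no named HTML
// entities and pass through untouched.
$intact = htmlentities($name, ENT_QUOTES, 'UTF-8');

var_dump($garbled === $name); // false
var_dump($intact === $name);  // true
```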
From: NoWhereMan <nowhereman@fl...> - 2006-11-17 17:55:25
Naoki Hiroshima wrote:
> Current version (0.611) has a chance to garble comment author's name
> because of incorrect use of htmlentities(). For example, if I filled
> out my name in Japanese, it never got through as it was and ended up
> showing garbled characters. So, this patch is essential for
> languages other than ISO-8859-1.

Oh! Great, thank you. I've been thinking about this issue today. Since we finally support UTF-8 (I would like to drop support for everything else...), would wp_specialchars() work better? You can find it in fp-includes/core/core.wp-formatting.php. I guess it may even beat the PHP original, as it should do fewer checks. What do you think?

> By the way, I found that many of files are pretty mess with CRLF and
> LF. I am not sure what you are using (probably Windows, eh?) and how
> this mess happened. Having LF lines and CRLF lines in a file is little
> bothersome :-(

The reason for this mess is a horrible, horrible editor on Windows (SciTE, which I once loved) that was supposed to handle line endings correctly, but in fact doesn't :/ Usually I work on Linux, but at the moment I'm working on Windows... Thank you for the perl oneliner. I've got Cygwin here, so I ran dos2unix on all of them. I didn't think of xargs... sometimes I just forget the basics :D

> Thanks,

Thank *you* :)
From: Naoki Hiroshima <n@h7...> - 2006-11-17 19:50:58
NoWhereMan wrote:
> I've been thinking about this issue today. As we finally are supporting
> utf-8 (I would like to just not to support the others...)

Absolutely. What on earth can be wrong with UTF-8, unless you want to discriminate against many languages? I would really like you to officially decide to support nothing but UTF-8; it would make many people's lives tremendously easier ;-)

> would wp_specialchars() work better? you can find it in
> fp-includes/core/core.wp-formatting.php
> I guess it would be even better than the PHP original, as it should do
> less checks. What do you think?

Ah, I didn't know about this function. Yes, it works better. But I have a question: what is the line below for?

$text = preg_replace('/&([^#])(?![a-z12]{1,8};)/', '&#038;$1', $text);

Come to think of it, in a UTF-8 world I think we only have to worry about five characters, plus one more that is special to the FP file format:

$text = str_replace('|', '&#124;', $text); // just for the FP format
$text = str_replace('&', '&amp;', $text);
$text = str_replace('<', '&lt;', $text);
$text = str_replace('>', '&gt;', $text);
if ($quotes) {
    $text = str_replace('"', '&quot;', $text);
    $text = str_replace("'", '&#039;', $text);
}

All other characters can simply be represented directly in UTF-8.

> > Having LF lines and CRLF lines in a file is little bothersome :-(
>
> The reason for this mess is an horrible horrible editor on Windows
> (which once I used to love, SciTE) which was supposed to handle
> correctly line ends.

I was using XEmacs when I used to work on Windows :-) I switched to OS X about a month ago and I am free from that Windows crap... NOT :-( Occasionally I still need to run Windows in Parallels and deal with it.

> but that in fact it doesn't :/ Usually I work on Linux, but ATM I'm
> working with Windows... thank you for the perl oneliner. I've got
> cygwin here, so I gave dos2unix a run on all of them. I didn't think
> about xargs... sometimes I just forget the basics :D

Another oneliner I would like to suggest:

$ find . \( -name "*.php" -o -name "*.tpl" \) | xargs perl -pi -e 's/<form /<form accept-charset="utf-8" /'

Thanks,

--
Hiroshima
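A runnable sketch of the minimal UTF-8-only escaper proposed above (my reconstruction, not actual FlatPress source; the function name is mine, and the '|'-as-field-separator rule is an assumption from the "just for FP format" remark). One detail worth noting: '&' has to be replaced first, otherwise the '&#124;' produced for '|' would itself be re-escaped.

```php
<?php
// Reconstruction of the minimal escaper sketched above (an assumption,
// not the actual FlatPress code). '&' is replaced first so the entities
// produced by the later replacements are not double-escaped.
function fp_utf8_escape($text, $quotes = true) {
    $text = str_replace('&', '&amp;', $text);   // must come first
    $text = str_replace('|', '&#124;', $text);  // assumed FP field separator
    $text = str_replace('<', '&lt;', $text);
    $text = str_replace('>', '&gt;', $text);
    if ($quotes) {
        $text = str_replace('"', '&quot;', $text);
        $text = str_replace("'", '&#039;', $text);
    }
    return $text; // every other character is stored directly as UTF-8
}

echo fp_utf8_escape('Tom & "Jerry" <広島|test>'), "\n";
// Tom &amp; &quot;Jerry&quot; &lt;広島&#124;test&gt;
```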
From: NoWhereMan <nowhereman@fl...> - 2006-11-17 20:20:55
----- Original Message -----
From: "Naoki Hiroshima" <n@...>
> NoWhereMan wrote:
> > wp_specialchars() work better?
>
> Ah, I didn't realize this function. Yes, this works better. But I
> have a question. What is the below for?
>
> $text = preg_replace('/&([^#])(?![a-z12]{1,8};)/', '&#038;$1', $text);

It looks like a way to avoid re-escaping sequences that are already entities, something like that: if it finds &entity; it won't change it into &amp;entity;, maybe... :P So the whole code of that function should be OK.

> > The reason for this mess is an horrible horrible editor on Windows
> > (which once I used to love, SciTE)
>
> I was using xemacs when I used to work on Windows :-)

Cool, let's start a flamewar about vim (I use vim) :D

> Another oneliner I would like to suggest is:
>
> $ find . \( -name "*.php" -o -name "*.tpl" \) | xargs perl -pi -e
> 's/<form /<form accept-charset="utf-8" /'

Is this really needed? I thought user agents could figure it out themselves from the page encoding. BTW, if we really drop the other encodings for UTF-8... :/ I'm not sure...

bye
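That reading of the pattern can be checked directly. In the sketch below the replacement string is restored to '&#038;$1' (the archive mangled the entity; the restoration is my assumption): a bare ampersand is escaped, an existing entity is skipped by the negative lookahead, and a trailing '&' at end of string slips through because `[^#]` demands a following character.

```php
<?php
// The negative lookahead skips anything already shaped like an entity:
// '&' + 1..8 letters/digits + ';' is left alone; a bare '&' is escaped.
$pattern = '/&([^#])(?![a-z12]{1,8};)/';

echo preg_replace($pattern, '&#038;$1', 'fish & chips'), "\n";
// fish &#038; chips  -- bare ampersand escaped

echo preg_replace($pattern, '&#038;$1', 'a &lt; b'), "\n";
// a &lt; b  -- existing entity left untouched

echo preg_replace($pattern, '&#038;$1', 'end with &'), "\n";
// end with &  -- trailing '&' is missed: [^#] needs a following character
```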
From: Naoki Hiroshima <n@h7...> - 2006-11-17 21:00:52
NoWhereMan wrote:
> it looks like a way to unescape sequences of entities... something like
> that... if it finds &entity; it won't change it into &amp;entity;
> maybe... :P

I see... Besides, today's WP seems to be using:

$text = str_replace('&&', '&#038;&#038;', $text);
$text = str_replace('&#038;&#038;', '&&', $text);
$text = preg_replace('/&(?:$|([^#])(?![a-z1-4]{1,8};))/', '&#038;$1', $text);

> > I was using xemacs when I used to work on Windows :-)
>
> cool, let's start a flamewar about vim (I use vim) :D

Okay, vim sucks, emacs rules!! Heheh. Actually, I only ever use vim when I am working as root, because it forces me to be really cautious ;-)

> > $ find . \( -name "*.php" -o -name "*.tpl" \) | xargs perl -pi -e
> > 's/<form /<form accept-charset="utf-8" /'
>
> Is this really needed? I thought it would have been better if the user
> agents could figure it out themselves from the page encoding.

The thing is, browsers don't have to send the form data in the same encoding as the page. This used to be a typical problem for Japanese: even if the page encoding was "euc-jp", some browsers would submit in "sjis". Maybe it is no longer an issue nowadays, but specifying the charset explicitly is safer, I suppose.

As for dropping the other encodings, I am sure it would make many people's lives much easier. I don't say there is no reason at all, but I can't think of any compelling reason why someone would really want to use something other than UTF-8, unless it's UTF-16.

Thanks,

--
Hiroshima
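The practical difference in the newer pattern is the `(?:$|...)` alternation, which also catches a lone ampersand at the very end of the string, the case the older pattern missed. A sketch for comparison (the function name is mine, and the entity strings are my restoration of the archive-mangled code):

```php
<?php
// The newer ampersand pass (entities restored; my sketch, not WP source
// verbatim). The (?:$|...) alternation matches '&' at end-of-string too,
// which the older pattern could not, since [^#] required a next character.
function wp2_amp($text) {
    $text = str_replace('&&', '&#038;&#038;', $text);
    $text = str_replace('&#038;&#038;', '&&', $text);
    $text = preg_replace('/&(?:$|([^#])(?![a-z1-4]{1,8};))/', '&#038;$1', $text);
    return $text;
}

echo wp2_amp('end with &'), "\n"; // end with &#038;
echo wp2_amp('a &lt; b'), "\n";   // a &lt; b  (entity untouched)
echo wp2_amp('AT&T'), "\n";       // AT&#038;T
```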
From: NoWhereMan <nowhereman@fl...> - 2006-11-18 08:28:33
----- Original Message -----
From: "Naoki Hiroshima"

[escaping with entities]
> I see... Besides, today's WP seems to be using:
> [...]

The code was taken from 1.5; they probably changed it in 2.0... OK, maybe we could sync some of the code.

[explicit accept-charset in forms]
> The thing is, browsers don't have to send a message in the same
> encoding as the page. This used to be a typical problem in Japanese:
> even if the page encoding is "euc-jp", some browsers can send a
> message in "sjis". Maybe there is no issue nowadays but specifying
> explicitly is better, I suppose.

Oh, I see.

> I don't say there is no reason but I can't think of any compelling
> reason why someone really wants to use something other than UTF-8
> unless it's UTF-16.

The only reason would be retaining native compatibility with SPB, which uses different encodings depending on your language, but I would really drop full compatibility. The files for the Italian language are already UTF-8 encoded (not plain ASCII!), so if you use any encoding other than UTF-8, the accented characters will come out as strange characters.

Maybe I could make the installer check whether fp-content/content/ is non-empty and immediately ask whether you want to perform a conversion, but the problem is that I can't guess the actual encoding of the existing files. I tried the whole mb_* function series, but it didn't seem to guess anything :(

Moreover, I fear server timeouts: I wouldn't want the script to stop in the middle of a conversion, because that would lead you to re-run the conversion the next time you launch the setup and mess everything up: it would try to double-convert to UTF-8, treating the already-converted TXT files as European ISO. I'll let you imagine what would happen.

bye
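The detection problem described above can be seen with the mbstring functions (a sketch assuming the mbstring extension is available; the candidate list and the sample string are mine). Legacy single-byte encodings accept almost any byte sequence, so mb_detect_encoding() can only return the first candidate that fits, and a wrong guess silently corrupts the text on conversion, which is exactly the double-conversion hazard described above.

```php
<?php
// Why guessing is fragile (sketch; assumes the mbstring extension).
$raw = "caff\xE8"; // "caffè" stored as ISO-8859-1 (0xE8 = è)

// Strict mode at least rejects this as UTF-8 (0xE8 starts an incomplete
// multi-byte sequence here), so the fallback candidate wins. A string
// that happens to be valid in several candidates stays ambiguous.
$guess = mb_detect_encoding($raw, array('UTF-8', 'ISO-8859-1'), true);

if ($guess !== false && $guess !== 'UTF-8') {
    $raw = mb_convert_encoding($raw, 'UTF-8', $guess);
}
echo $guess, ': ', $raw, "\n"; // ISO-8859-1: caffè (only if the guess was right)
```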