From: Jeff D. <da...@da...> - 2001-06-08 15:42:48
Thank you for the comments, Christian.

> I'm quite new to PHPWiki and not at all familiar with it, but I've noticed
> that at least in the default configuration there is no support for
> entitized accents (Á becoming &Aacute;) as we commonly use in non-English
> countries.

Yes, but you can type them in as ISO-8859-1 eight-bit characters just fine. (What you have to do to type these in depends on your OS & browser &c.)

Do we want to support entities in the page text? If we do, then '&' suddenly becomes a magic character. (I vote against it.)

One cheap solution which might help a few people would be to add a line containing the funky characters to the editpage.html form. That way, people who can't figure out how to type in an accented character could at least cut and paste the one they want.

> In my case, I've ... changed WikiNameRegexp to allow accents
> in the link names.

This has been done in the development branch. It probably won't be done in the stable branch, since that would change the semantics of existing wikis. (I suppose the alternative WikiNameRegexp could be added in a comment, though.)

See also: http://phpwiki.sourceforge.net/phpwiki/index.php?WikiNameRegexp

Jeff
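Editorial illustration of the WikiNameRegexp point: a minimal sketch in Python, not PhpWiki's PHP code. The patterns, the character ranges, and the helper name `build_wiki_name_regexp` are assumptions made up for this example and are not the regexp shipped in either branch. It shows why widening the pattern changes the semantics of existing pages: text that was plain prose under the ASCII-only pattern becomes a link.

```python
import re

# Hypothetical character classes for illustration only.
# Latin-1 uppercase letters: A-Z plus U+00C0..U+00DE, skipping U+00D7 (multiplication sign).
UPPER_LATIN1 = "A-Z\u00c0-\u00d6\u00d8-\u00de"
# Latin-1 lowercase letters: a-z plus U+00DF..U+00FF, skipping U+00F7 (division sign).
LOWER_LATIN1 = "a-z\u00df-\u00f6\u00f8-\u00ff"


def build_wiki_name_regexp(upper, lower):
    """Build a CamelCase matcher: two or more capitalized runs, not embedded in a longer word."""
    word = f"{upper}{lower}0-9"
    return re.compile(
        f"(?<![{word}])(?:[{upper}][{lower}]+){{2,}}(?![{word}])"
    )


ascii_wiki_name = build_wiki_name_regexp("A-Z", "a-z")
latin1_wiki_name = build_wiki_name_regexp(UPPER_LATIN1, LOWER_LATIN1)

text = "See PáginaInicial and FrontPage."
ascii_links = ascii_wiki_name.findall(text)
latin1_links = latin1_wiki_name.findall(text)
```

With the ASCII-only pattern, only `FrontPage` is recognized in the sample text; with the widened pattern, `PáginaInicial` is recognized as well, which is the kind of behavior change an existing wiki would see.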
From: Jeff D. <da...@da...> - 2001-06-08 17:59:56
> Does & still have to be a magic character if it is only generated on output?
> I am not advocating writing (entering data in the form) entitized
> -- this is broken and counter-productive; only displaying entitized, to
> allow a wider audience with less breakage. This is _almost_ a non-issue
> with Latin-1, since most browsers render that by default, but on
> non-Latin-1 charsets (and thus, locales and languages), this _will_ be an
> issue. Is this besides the point?

Okay, I see that perhaps I misunderstood your initial suggestion.

Stock PhpWiki really only supports ISO-8859-1. (Though I suppose it would be easy enough to hack it to support any other single eight-bit character set.)

PhpWiki specifies the charset in a <meta http-equiv> tag in the HTML headers on each and every page:

  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

As you note, the charset currently is not specified in the HTTP headers. If that is causing problems with some browsers (is it?), that's easy enough to fix (we should probably fix that anyway).

I suppose entitizing upon output as you suggest doesn't hurt anything, but it still seems unnecessary to me.

Is anyone using PhpWiki with any other non ISO-8859-1 eight-bit charset?

(I know there have been many requests for multi-byte character support, but that's not an easy fix. I think that pretty much requires switching to using unicode/UTF-8 internally, and this won't be practical without unicode support compiled into PHP and its regexp libraries.)
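Editorial illustration of "entitizing upon output": a minimal sketch in Python, not PhpWiki's PHP code, and the function name `entitize_for_output` is hypothetical. The stored page text keeps its raw characters, so '&' stays an ordinary character on input; only the generated HTML is converted to pure ASCII with character references, which makes the result independent of the charset the browser assumes.

```python
from html import escape
from html.entities import codepoint2name


def entitize_for_output(page_text: str) -> str:
    """Return an ASCII-only HTML rendering of page_text.

    Markup-significant characters are escaped first, then every
    non-ASCII character becomes a named entity when HTML defines
    one, or a numeric character reference otherwise.
    """
    escaped = escape(page_text, quote=False)  # & -> &amp;, < -> &lt;, > -> &gt;
    pieces = []
    for ch in escaped:
        code_point = ord(ch)
        if code_point < 128:
            pieces.append(ch)
        elif code_point in codepoint2name:
            pieces.append(f"&{codepoint2name[code_point]};")
        else:
            pieces.append(f"&#{code_point};")
    return "".join(pieces)


sample = entitize_for_output("Á & é")
```

Here `sample` is `&Aacute; &amp; &eacute;`. Because the conversion happens only at display time, a literal `&Aacute;` typed into the edit form would be shown verbatim rather than interpreted.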
From: Christian R. R. <ki...@as...> - 2001-06-08 19:18:13
On Fri, 8 Jun 2001, Jeff Dairiki wrote:

> PhpWiki specifies the charset in a <meta http-equiv> tag
> in the HTML headers on each and every page:
>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
>
> As you note, the charset currently is not specified in the HTTP headers.
> If that is causing problems with some browsers (is it?), that's easy enough
> to fix (we should probably fix that anyway).

It doesn't break any browser that _looks for_ Content-Type _and_ supports Latin-1, which probably makes this a non-issue. There could be browsers that ignore the META tags and thus need the HTTP headers, but since META is defined in everything since HTML 2.0, I deem these browsers broken. Your point is taken.

> I suppose entitizing upon output as you suggest doesn't hurt anything,
> but it still seems unnecessary to me.

Apparently (and to me, surprisingly), you're right (as long as you set Content-Type, which we do). It's probably just one of those pedanticisms you acquire over time. :-)

> (I know there have been many requests for multi-byte character support,
> but that's not an easy fix. I think that pretty much requires switching
> to using unicode/UTF-8 internally, and this won't be practical without
> unicode support compiled into PHP and its regexp libraries.)

And the database, if it does string matching, additionally (using LIKE, ~ and whatnot).

Take care,
--
/\/\ Christian Reis, Senior Engineer, Async Open Source, Brazil
~\/~ http://async.com.br/~kiko/ | [+55 16] 274 4311
From: Sandino A. <sa...@sa...> - 2001-06-09 07:29:01
Christian Robottom Reis wrote:

> And the database, if it does string matching, additionally (using LIKE, ~
> and whatnot).

Maybe use soundex() for string matching? A soundexed version should be stored in parallel.

--
Sandino Araico Sánchez
If you are not part of the solution, then you are part of the precipitate.
From: Christian R. R. <ki...@as...> - 2001-06-09 22:20:05
On Sat, 9 Jun 2001, Sandino Araico Sánchez wrote:

> Maybe use soundex() for string matching?
> A soundexed version should be sotred in parallel.

Hmmm. AFAIK soundex doesn't exist for every language, and you still need to be able to do raw string matching, for which the encoding will matter.

Take care,
--
/\/\ Christian Reis, Senior Engineer, Async Open Source, Brazil
~\/~ http://async.com.br/~kiko/ | [+55 16] 274 4311
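Editorial illustration of why the encoding matters for raw string matching: a minimal sketch in Python, with a hypothetical helper `raw_contains` standing in for a byte-oriented comparison such as a database might perform. The same name stored as ISO-8859-1 and as UTF-8 produces different byte sequences, so a search key encoded one way does not match text stored the other way.

```python
def raw_contains(haystack: bytes, needle: bytes) -> bool:
    """Byte-level substring test that knows nothing about character sets."""
    return needle in haystack


name = "Sánchez"
stored_latin1 = name.encode("iso-8859-1")  # 'á' is the single byte 0xE1
stored_utf8 = name.encode("utf-8")         # 'á' is the two bytes 0xC3 0xA1

# Search key as a Latin-1 client would send it.
key_latin1 = "á".encode("iso-8859-1")

found_in_latin1 = raw_contains(stored_latin1, key_latin1)
found_in_utf8 = raw_contains(stored_utf8, key_latin1)
```

The search succeeds against the Latin-1 copy and fails against the UTF-8 copy, even though both hold the same text. The stored lengths differ as well (7 bytes versus 8), which affects anything that counts or truncates by bytes.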
From: Sandino A. <sa...@sa...> - 2001-06-09 07:27:32
Jeff Dairiki wrote:

> I suppose entitizing upon output as you suggest doesn't hurt anything,
> but it still seems unnecessary to me.

Entitizing has a big performance cost. An optional pre-entitized cache should be used to avoid entitizing every time the page is displayed.

> Is anyone using PhpWiki with any other non ISO-8859-1 eight-bit charset?

ISO-8859-15

> (I know there have been many requests for multi-byte character support,
> but that's not an easy fix. I think that pretty much requires switching
> to using unicode/UTF-8 internally, and this won't be practical without
> unicode support compiled into PHP and its regexp libraries.)

--
Sandino Araico Sánchez
If you are not part of the solution, then you are part of the precipitate.
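Editorial illustration of the pre-entitized cache idea: a minimal sketch in Python, not PhpWiki's PHP code. The class name `EntitizedCache`, the `(page_name, version)` key, and the in-memory dictionary are all assumptions for this example; how large the cost of entitizing actually is has not been measured here. The converted text is computed once per page version and reused on later displays.

```python
from html import escape


class EntitizedCache:
    """Keep the entitized form of each page version so it is computed only once."""

    def __init__(self):
        self._cache = {}
        self.misses = 0  # number of times the conversion actually ran

    def render(self, page_name: str, version: int, raw_text: str) -> str:
        key = (page_name, version)
        if key not in self._cache:
            self.misses += 1
            escaped = escape(raw_text, quote=False)
            # Replace every non-ASCII character with a numeric character reference.
            self._cache[key] = escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
        return self._cache[key]


cache = EntitizedCache()
first = cache.render("Inicio", 1, "Sánchez")
second = cache.render("Inicio", 1, "Sánchez")
misses_after_repeat = cache.misses
```

Keying on the version means an edited page is converted again automatically, while repeated views of an unchanged page hit the cache.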
From: Jeff D. <da...@da...> - 2001-06-08 19:42:14
> And the database, if it does string matching, additionally (using LIKE, ~
> and whatnot).

Another good point.

(I suspect it will be quite a while before PhpWiki supports Unicode.)
From: Steve W. <sw...@pa...> - 2001-06-08 17:19:30
On Fri, 8 Jun 2001, Jeff Dairiki wrote:

> One cheap solution which might help a few people would be to add a
> line containing the funky characters to the editpage.html form.
> That way, people who can't figure out how to type in an accented
> character could at least cut and paste the one they want.

This is a neat idea... although most people who use accented/umlauted/etc. chars probably have them on their keyboard. I had to use a French keyboard once and it blew my mind.

~swain
---
http://www.panix.com/~swain/
"Without music to decorate it, time is just a bunch of boring production deadlines or dates by which bills must be paid." -- Frank Zappa