From: Alexandros V. <av...@no...> - 2005-04-18 15:54:15

Are there generic functions to decode HTML entities in SquirrelMail? I
would appreciate some pointers in the right direction. :)

The problem: the html_mail plugin, when trying to create the plain text
part, fails to decode the entities from the entered Greek text.
fckeditor creates entities such as &lambda;, and the decoding that
follows, as done by html_mail:

    function my_html_entity_decode($text) {
        if (function_exists('html_entity_decode'))
            return html_entity_decode($text);
    <snip>

fails, because html_entity_decode() does not support iso-8859-7, and the
plain text part ends up looking like this:

    ------=_20050418184548_33186
    Content-Type: text/plain; charset="iso-8859-7"
    Content-Transfer-Encoding: 8bit

    δοκιμ? με
    ελληνικ?!
    τραλαλ?
    τραλαλ?!

which is, of course, wrong.

Actually I see two solutions:

1) Preventing fckeditor from entering entities in the first place.
Someone has to dig deep into fckeditor's code to find out if and how
this can be done.

2) Decoding the entities by ourselves and fixing function
my_html_entity_decode().

Are the functions in functions/decode/* of any use in this situation?

TIA,

-- 
Alexandros Vellis
University of Athens
av...@no...
Network Operations Centre
http://www.noc.uoa.gr/~avel/
From: Tomas K. <to...@us...> - 2005-04-18 16:09:24

> Are the functions in functions/decode/* of any use in this situation?

The decoding functions deal with conversion from 8bit to &#number;.

The encoding functions by default support only &#number;. &#xhex; is
implemented but not enabled. Encoding from named HTML entities is not
implemented. It can be implemented with a simple mapping array.

-- 
Tomas
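A minimal sketch of the "simple mapping array" idea mentioned above, in PHP. The function name named_to_numeric_entities() and the choice of entities are hypothetical and are not part of SquirrelMail or html_mail; the table covers only a handful of Greek letters for illustration. It rewrites named entities into the &#number; form that the encoding functions are said to support:

```php
<?php
// Hypothetical helper: not part of SquirrelMail or the html_mail plugin.
// Rewrites a few named HTML entities as decimal numeric entities.
function named_to_numeric_entities($text) {
    // Illustrative subset only; the numbers are the Unicode code points
    // of the corresponding Greek small letters.
    $map = array(
        '&alpha;'  => '&#945;',
        '&delta;'  => '&#948;',
        '&lambda;' => '&#955;',
        '&mu;'     => '&#956;',
        '&omega;'  => '&#969;',
    );
    // strtr() replaces every occurrence of each key in one pass and
    // leaves entities that are not in the table untouched.
    return strtr($text, $map);
}

echo named_to_numeric_entities('&lambda;&alpha; &amp; &omega;'), "\n";
```

A complete table would have to list every named entity the editor can emit, which is the cost Alexandros worries about later in the thread.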
From: Alexandros V. <av...@no...> - 2005-04-19 12:17:28

On Mon, 2005-04-18 at 19:11 +0200, Tomas Kuliavas wrote:
> <snip>
> > Actually I see two solutions:
> >
> > 1) Preventing fckeditor from entering entities in the first place.
> > Someone has to dig deep into fckeditor's code to find out if and how
> > this can be done.
> >
> > 2) Decoding the entities by ourselves and fixing function
> > my_html_entity_decode().
> >
> > Are the functions in functions/decode/* of any use in this situation?
>
> decoding functions deal with conversion from 8bit to &#number;
>
> encoding functions by default support only &#number;. &#xhex is
> implemented but not enabled. encoding from named html entities is not
> implemented. it can be implemented with simple mapping array.

Thanks for explaining.

I found that in my *particular* case, fckeditor is to blame, because:

http://www.w3.org/TR/REC-html40/sgml/entities.html#h-24.3

"When to use Greek entities. This entity set contains all the letters
used in modern Greek. However, it does not include Greek punctuation,
precomposed accented characters nor the non-spacing accents (tonos,
dialytika) required to compose them. (snip) The entities defined here
are not intended for the representation of modern Greek text and would
not be an efficient representation; rather, they are intended for
occasional Greek letters used in technical and mathematical works."

So, my workaround was to comment out all Greek entities in these files:

editor/_source/internals/fckxhtmlentities.js
editor/js/fckeditorcode_gecko_1.js
editor/js/fckeditorcode_ie_1.js

I left the code in my_html_entity_decode() as is, because
html_entity_decode() can actually catch some of the known entities used
in all languages and in ISO-8859-1.

However, a simple mapping array, as you suggest, *could* probably be
useful for other exotic languages + entities with the same problem.

I'll open a bug report and find a proper solution with the fckeditor
people themselves.

FWIW, I've seen Microsoft Word HTML output that contains Greek HTML
entities in pure Greek text, in contradiction with the W3C directive
above. Eeeek. :-/

-- 
Alexandros Vellis
University of Athens
av...@no...
Network Operations Centre
http://www.noc.uoa.gr/~avel/
From: Paul L. <pa...@sq...> - 2005-04-21 03:15:58

Alexandros,

Nice sleuthing. So does this mean you think we should release
HTML_Mail as is, or do you want to throw in our own custom decode table
with *ONLY* Greek symbols before the call to html_entity_decode()?

Cheers,

Paul

-- 
Open Guild, LLC
http://openguild.net/
Software.Systems.Solutions
From: Tomas K. <to...@us...> - 2005-04-21 09:29:41

> Alexandros,
>
> Nice sleuthing. So does this mean you think we should release
> HTML_Mail as is, or do you want to throw in our own custom decode table
> with *ONLY* Greek symbols before the call to html_entity_decode()?

I think it is not specific to Greek.

SquirrelMail 1.4.4, utf-8 translation
html_mail 2.1 (fckeditor). Used the sample config and changed only
$customStyle to an empty string.

The original message was composed in plain text. It contains HTML
entities from
http://www.topolis.lt/docs/sm/squirrelmail/strings/_functions_htmlentities_readme_php.html

The forwarded message was composed in html_mail. Letters are incorrectly
encoded in text/plain.

The second forwarded message was composed in the iso-8859-1 translation
with $lossy_encoding enabled.

I haven't looked deeper into the code, but I think the converter uses
only an html->iso-8859-1 mapping and does not care about the charset
that it adds to the text/plain part.

-- 
Tomas
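One way to avoid the mismatch suspected above is to decode into UTF-8 first and only then convert to whatever charset the text/plain part declares. The following is a minimal sketch under stated assumptions, not html_mail's actual code: decode_entities_to_charset() is a hypothetical name, and the sketch assumes a PHP build with the mbstring extension.

```php
<?php
// Hypothetical helper: not the html_mail implementation.
// Assumes the mbstring extension is available.
function decode_entities_to_charset($text, $charset) {
    // Decode named and numeric entities into UTF-8, which can
    // represent every character an HTML entity may stand for.
    $utf8 = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Convert the result to the charset declared for the text/plain part.
    return mb_convert_encoding($utf8, $charset, 'UTF-8');
}

// Print the resulting bytes as hex so the output is safe on any terminal.
echo bin2hex(decode_entities_to_charset('&lambda;', 'ISO-8859-7')), "\n";
```

The sketch does not decide what should happen to characters that the target charset cannot represent; a real implementation would need an explicit policy for those.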
From: Alexandros V. <av...@no...> - 2005-05-10 14:44:55

Oops, this thread got lost in my pile...

On Thu, 21 Apr 2005 12:31:28 +0200 (EET)
"Tomas Kuliavas" <to...@us...> wrote:

> I think it is not specific to Greek.

Indeed it is not.

I had opened this bug report to fckeditor:

https://sourceforge.net/tracker/?func=detail&atid=543653&aid=1185905&group_id=75348

and someone else reported the same for Polish:

https://sourceforge.net/tracker/?func=detail&atid=543653&aid=1189442&group_id=75348

I'd rather see this fixed in fckeditor. Then we will not need other
encoding functions that support every HTML entity of the galaxy and eat
our precious CPU. :-)

Alexandros
From: Paul L. <pa...@sq...> - 2005-05-10 19:18:47

Alexandros Vellis wrote:
> I'd rather see this fixed in fckeditor. Then we will not need other
> encoding functions that support every HTML entity of the galaxy and eat
> our precious CPU. :-)

OK, that's kind of what I was hoping. I'll release HTML Mail as is by
the weekend I hope. Thanks much.

-paul
From: Alexandros V. <av...@no...> - 2005-05-11 09:16:28

> OK, that's kind of what I was hoping. I'll release HTML Mail as is by
> the weekend I hope. Thanks much.

Hey Paul,

fckeditor 2.0FC just got released, with my idea in:

[SF BUG-1189442] [SF BUG-1187164] [SF BUG-1185905] It is now possible to
configure the editor to not convert Greek or special Latin letters to
their specific HTML entities. You can also configure it to not convert
any character at all. Take a look at the "ProcessHTMLEntities",
"IncludeLatinEntities" and "IncludeGreekEntities" configuration
options.

I will try to see if they are adequate for our needs.

Cheers,
Alexandros
From: Paul L. <pa...@sq...> - 2005-05-12 20:29:17

Alexandros Vellis wrote:
>>OK, that's kind of what I was hoping. I'll release HTML Mail as is by
>>the weekend I hope. Thanks much.
>
> Hey Paul,
>
> fckeditor 2.0FC just got released, with my idea in:

ugh. :)

> [SF BUG-1189442] [SF BUG-1187164] [SF BUG-1185905] It is now possible to
> configure the editor to not convert Greek or special Latin letters to
> their specific HTML entities. You can also configure it to not convert
> any character at all. Take a look at the "ProcessHTMLEntities",
> "IncludeLatinEntities" and "IncludeGreekEntities" configuration
> options.
>
> I will try to see if they are adequate for our needs.

OK, let me know (especially in regard to any settings I should make in
the FCKeditor config file). In the meantime, I will pull 2.0FC into the
HTML_Mail code.

Thanks!

-paul