From: SourceForge.net <no...@so...> - 2005-12-09 11:33:41
Bugs item #1377059, was opened at 2005-12-09 04:33
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=719006&aid=1377059&group_id=130646

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: Current
Status: Open
Resolution: None
Priority: 8
Submitted By: Stephen Deasey (sdeasey)
Assigned to: Nobody/Anonymous (nobody)
Summary: UTF8 input not validated

Initial Comment:
Here's an old bug report on the tDOM XML parser mailing list:

http://groups.yahoo.com/group/tdom/message/1092

I think the problem is in form.c Ext2Utf(). This function is supposed to convert an external string of characters (such as from a form submission) to UTF-8 before they are passed to the Tcl core. If the encoding which is passed in (which comes from various configuration options) happens to be null, the function becomes a no-op, simply appending the data unchanged to the dstring. As it happens, the encoding can be null...

Even if it wasn't, I don't think it is ever valid for this function to simply append the data unchecked. The idea is that it is converting from some encoding to UTF-8, and that if the input is already UTF-8 then no encoding needs to happen. But the input could be *invalid* UTF-8. What's needed is to convert from UTF-8 to UTF-8, or effectively to validate the characters.

The guys on the tDOM list say that all sorts of bad things can happen when invalid UTF-8 gets into the core, and as this character data comes from the 'Net it's a bit concerning.

There were some changes made to AOLserver 8 months ago to simplify the encoding subsystem, or at least the configuration of it. We should probably import those before changing anything else.
( http://cvs.sourceforge.net/viewcvs.py/aolserver/aolserver/ChangeLog?rev=1.321&view=markup )

We need to decide how best to fix the above-mentioned problem with validation of UTF-8 data.

The nstest_http proc has no notion of charsets etc. It might need to be changed to properly test the fix.

There might be other places in the code which have similar problems. I seem to remember a report recently about the AOLserver nsdb module crashing due to weird encoding issues, but I can't seem to find it now...
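
To illustrate what "convert from UTF-8 to UTF-8, or effectively validate" could mean in practice, here is a minimal sketch of a strict validity check following RFC 3629. The function name utf8_is_valid() is hypothetical; it is not part of the server or of the Tcl API, and this is not a proposed patch to Ext2Utf(). Note also that Tcl's internal representation differs slightly from strict UTF-8 (for example, it stores NUL as the two-byte sequence 0xC0 0x80), so a real fix would have to take that into account.

```c
#include <assert.h>
#include <stddef.h>

/*
 * utf8_is_valid -- hypothetical helper, for illustration only.
 *
 * Returns 1 if buf[0..len) is well-formed UTF-8 (RFC 3629), 0 otherwise.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
 */
int
utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        size_t        need;   /* continuation bytes expected after c */
        size_t        k;
        unsigned long cp;     /* code point decoded so far */
        unsigned long min;    /* smallest code point legal at this length */

        if (c < 0x80) {               /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                 /* stray continuation or bad lead byte */
        }

        if (len - i <= need) {
            return 0;                 /* sequence runs past end of input */
        }
        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];

            if ((cc & 0xC0) != 0x80) {
                return 0;             /* not a continuation byte */
            }
            cp = (cp << 6) | (unsigned long)(cc & 0x3F);
        }
        if (cp < min                              /* overlong encoding */
            || cp > 0x10FFFFUL                    /* beyond Unicode range */
            || (cp >= 0xD800 && cp <= 0xDFFF)) {  /* surrogate */
            return 0;
        }
        i += need + 1;
    }
    return 1;
}
```

A caller on the form-parsing path could use such a check to reject or sanitize the input when it returns 0, instead of appending the bytes unchecked. Which of those reactions is appropriate is exactly the open question in this report.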