#50 UTF8 input not validated

Bug
closed-fixed
nobody
8
2023-11-06
2005-12-09
Stephen
No

Here's an old bug report on the tDOM XML parser mailing
list:

http://groups.yahoo.com/group/tdom/message/1092

I think the problem is in form.c Ext2Utf(). This
function is supposed to convert an external string of
characters (such as from a form submission) to UTF-8
before they are passed to the Tcl core.

If the encoding which is passed in (which comes from
various configuration options) happens to be null, the
function becomes a no-op, simply appending the data
unchanged to the dstring.

As it happens, the encoding can be null...

Even if it wasn't, I don't think it is ever valid for
this function to simply append the data unchecked. The
idea is that it is converting from some encoding to
UTF-8, and that if the input is already UTF-8 then no
encoding needs to happen.

But the input could be *invalid* UTF-8. What's needed
is to convert from UTF-8 to UTF-8, or effectively to
validate the characters.
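
To illustrate what "validating" would mean here, below is a minimal,
standalone sketch of a strict UTF-8 well-formedness check in plain C. It
is not the code from form.c, and it is not NaviServer's or Tcl's own
validator; the name IsValidUtf8 is hypothetical, and the rules applied
(no overlong forms, no surrogates, nothing above U+10FFFF) are assumed
from RFC 3629.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of NaviServer or Tcl.
 * Returns 1 if buf[0..len) is well-formed UTF-8 per RFC 3629, else 0.
 */
static int
IsValidUtf8(const unsigned char *buf, size_t len)
{
    size_t i = 0, k;

    while (i < len) {
        unsigned char c = buf[i];
        size_t        need; /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {
            i++;            /* plain ASCII */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need) {
            return 0;       /* sequence truncated at end of input */
        }
        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];

            if ((cc & 0xC0) != 0x80) {
                return 0;   /* not a continuation byte */
            }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) {
            return 0;       /* overlong encoding, e.g. 0xC0 0x80 */
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;       /* UTF-16 surrogate */
        }
        if (cp > 0x10FFFF) {
            return 0;       /* beyond the Unicode range */
        }
        i += need + 1;
    }
    return 1;
}
```

A caller in the position of Ext2Utf() could run a check like this before
appending the bytes unchanged, and reject or replace the input when it
fails. Note that the strict check also rejects the two-byte form
0xC0 0x80, which Tcl uses internally for NUL; for data arriving from the
network that seems like the right outcome.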

The guys on the tDOM list say that all sorts of bad
things can happen when invalid UTF-8 gets into the
core, and as this character data comes from the 'Net
it's a bit concerning.

There were some changes made to AOLserver 8 months ago
to simplify the encoding subsystem, or at least the
configuration of it. We should probably import those
before changing anything else. (
http://cvs.sourceforge.net/viewcvs.py/aolserver/aolserver/ChangeLog?rev=1.321&view=markup
)

We need to decide how best to fix the above-mentioned
problem with validation of UTF-8 data.

The nstest_http proc has no notion of charsets etc. It
might need to be changed to properly test the fix.

There might be other places in the code which have
similar problems. I seem to remember a report
recently about the AOLserver nsdb module crashing due
to weird encoding issues, but I can't seem to find it
now...

Discussion

  • Stephen

    Stephen - 2006-12-01
    • labels: --> NaviServer - libnsd, libnsthread, nsd
     
  • Stephen

    Stephen - 2006-12-01
    • milestone: 469714 --> 473033
     
  • Stephen

    Stephen - 2006-12-01
    • milestone: 473033 --> Bug
     
  • gustafn

    gustafn - 2023-11-06

    There was a large reform of UTF-8 handling in NaviServer, and in Tcl as well. Full UTF-8 support in Tcl will be introduced by the forthcoming Tcl 9. NaviServer performs proper conversions wherever necessary (e.g. in the DB drivers). Furthermore, NaviServer provides validation support at the Tcl level ("ns_valid_utf8") and at the C API level ("Ns_Valid_UTF8()").

     
  • gustafn

    gustafn - 2023-11-06
    • labels: NaviServer - libnsd, libnsthread, nsd --> NaviServer - libnsd, libnsthread, nsd
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,3 @@
    -
    
     Here's an old bug report on the tDOM XML parser mailing
     list:
    
    • status: open --> closed-fixed
     