#525 XML_ERROR_INVALID_TOKEN on a legal xml character

Milestone: Feature Request
Status: closed-duplicate
Owner: nobody
Labels: None
Priority: 5
Updated: 2022-01-10
Created: 2014-05-27
Private: No

The attached code tries to use expat to parse the (wide) string L"\ufeff<?xml version=\"1.0\" encoding=\"UTF-16\"?><root><child\u2070></root>".

When it's run, it produces the output:
Expat returned error 4: not well-formed (invalid token)
At: °></root>

According to the XML 1.0 specification (5th edition), the character \u2070 (superscript zero) is legal in tag names, so I think expat should accept it.
I think expat currently conforms to the fourth edition of the standard, or an even older one.

I see that conformance to XML 1.0 is one of your goals, which I think should also cover the fifth edition of the standard.
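
For reference, here is a minimal sketch of the name-character check as the 5th edition defines it, using the ranges from productions [4] (NameStartChar) and [4a] (NameChar) of the specification. It is not taken from expat or from the attached code; the names cp_range, is_name_start_char and is_name_char are hypothetical and only illustrate the rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of Unicode code points (hypothetical helper type). */
struct cp_range { uint32_t lo, hi; };

/* NameStartChar, production [4] of XML 1.0 (5th edition). */
static const struct cp_range name_start_ranges[] = {
    {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
    {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
    {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
    {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
    {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
};

/* Characters that NameChar, production [4a], adds to NameStartChar. */
static const struct cp_range name_extra_ranges[] = {
    {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
    {0x300, 0x36F}, {0x203F, 0x2040},
};

/* Linear search; the tables are small enough that this is fine. */
static bool in_ranges(uint32_t cp, const struct cp_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cp >= r[i].lo && cp <= r[i].hi)
            return true;
    }
    return false;
}

/* True if cp may begin an XML name under the 5th edition rules. */
bool is_name_start_char(uint32_t cp)
{
    return in_ranges(cp, name_start_ranges,
                     sizeof name_start_ranges / sizeof name_start_ranges[0]);
}

/* True if cp may appear anywhere in an XML name under the 5th edition rules. */
bool is_name_char(uint32_t cp)
{
    return is_name_start_char(cp) ||
           in_ranges(cp, name_extra_ranges,
                     sizeof name_extra_ranges / sizeof name_extra_ranges[0]);
}
```

With these tables, is_name_char(0x2070) is true because U+2070 falls in the range [#x2070-#x218F], which is why a name like child\u2070 is acceptable under the 5th edition.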

1 Attachments

Discussion

  • Karl Waclawek

    Karl Waclawek - 2014-05-28

    I just had a look at the 4th edition, and it looks like in that version this character is not listed as a legal name start character (if I have not misread it). So it is possible that Expat is outdated in this regard.

     
  • Sebastian Pipping

    • status: unread --> open
     
  • Sebastian Pipping

    • status: open --> closed-duplicate
     
  • Sebastian Pipping

    Closing as a duplicate of https://sourceforge.net/p/expat/bugs/292/ .

     
