
#14 No escaping

v4.2.1
closed-fixed
None
5
2016-02-13
2016-02-05
No

OK, it can be argued whether this is a bug or a feature, but here we go ;-)

In my opinion, sxmlc should take care of escaping the five characters that must be escaped. It's not really practical to leave that to the application layer.

Here's a description:

http://stackoverflow.com/questions/1091945/what-characters-do-i-need-to-escape-in-xml-documents

Discussion

  • Matthieu Labas

    Matthieu Labas - 2016-02-06

    Well, I must say the design of sxmlc was to make XML parsing/writing simple and to let applications decide what they want to put in the different placeholders (text, attributes, ...), even if it meant "breaking" some XML contracts (having several root nodes, illegal/unescaped characters in texts or attributes, ...).

    You can already use the str2html()/html2str() functions to escape/unescape these characters.

    That said, the link you gave is indeed very interesting, especially regarding the characters allowed in text or attributes, so I'll have to check whether sxmlc is able to parse these special cases (I think it should).

    I'm preparing v4.2.1 to add the special character ', which I forgot, to the list of escapable characters.

     
  • Matthieu Labas

    Matthieu Labas - 2016-02-06

    Ticket moved from /p/sxmlc/bugs/13/

     
  • Nicholai Benalal

    Hi,

    To me, this is not so much about adhering strictly to XML standards as about making sure that what is written can also be read back properly. The chars in the link I sent are the ones most likely to break that. That's why I still think it would be better if sxmlc did this escaping automatically. Your pick of course :-)

    Nicholai

     
  • Matthieu Labas

    Matthieu Labas - 2016-02-09

    Hi,

    As it turns out, sxmlc is not able to parse the strings given in the link. That is a bug, so I'll move this back to the bugs section.

    Most of the time, you know whether what you're writing might contain those escapable characters, so I thought the caller would perform the escaping when it knows it might be needed. That said, you have a point when you say you should be able to read back what you wrote...

    First, I will modify the str2html() function so you can pass NULL as the second argument to make it allocate and return the escaped string, which will make it easier to wrap into your code. Note, though, that you would still have to free() it at some point, since it is malloc'ed, which largely defeats the purpose...
    So I will add an inplace_escape() function that performs the escaping in place (the string will be modified). For allocated strings, callers should make sure to reserve some space for the expansion. For static strings (e.g. "ab > cd"), I will provide a macro that appends 50 extra chars after the original string, which stays terminated by \0. Something like ESC(a) a"\0 ". A little brute force, but it should work. Using this function will be dangerous, as there is no way for sxmlc to know whether a pointer is writable, so it might lead to core dumps...

    What do you think about it?

    Matthieu

     
  • Nicholai Benalal

    Hi,

    I'm not so sure about that solution. It sounds both complicated and somewhat limited from a design point of view.

    Wouldn't it be quite easy to do the following:

    1) In XMLDoc_print() and e.g. XMLDoc_parse_file_DOM(), you add an extra argument to specify whether escaping should be done automatically. There are possibly some (rare) cases where you might want to turn this off.

    2) When you write the XML document, you allocate temporary buffers for the strings. Their sizes can be calculated in advance by pre-scanning the string, or by allocating the maximum size (as if every char in the string were escaped). You could probably also grow the buffer in a loop. Once you have printed the file/written it to disk, you free the temporary buffer. Likewise, in XMLDoc_parse_file_DOM(), it should be easy enough to do this, with the unescaped string ending up in the XML document structure and freed together with it in the usual way.

    Am I missing something?

    Best,

    Nicholai

     
    • Matthieu Labas

      Matthieu Labas - 2016-02-11

      Yes, I was not happy with that solution. As you noticed, it would be cumbersome and pretty useless in fact, especially since static strings can be escaped by the user instead.

      Unescaping can be done in place. Escaping needs extra mallocs, but if they're done within sxmlc, it can take care of the memory operations.

      I will check how to incorporate that properly and add some macros for backward compatibility.

      Thank you for your patience ;)

      Matthieu

       
      • Nicholai Benalal

        That sounds good :)

         

        Last edit: Matthieu Labas 2016-02-12
  • Matthieu Labas

    Matthieu Labas - 2016-02-12

    I have checked the behavior: unescaping is already performed on attribute values when parsing, and escaping is performed on text and attribute values when printing a node.

    If I'm trying to parse the XML <valid att1=">" att2="'" att3='"'/> (which is valid, as per the SO answer):

    XMLNode node;
    XMLNode_init(&node);
    XML_parse_1string(C2SX("<valid att1=\">\" att2=\"'\" att3='\"'/>"), &node);
    XMLNode_print(&node, stdout, NULL, NULL, 1, 0, 0);
    

    gives me the output:

    <valid att1="&gt;" att2="&apos;" att3="&quot;"/>
    

    It can be argued that it actually does not have to escape them in that case (as they are valid in attribute values), but it does perform escaping.

    Did you notice a problem on escaping not being performed somewhere, or was it a thought you had (which is also fine)?

    In any case, I had to correct a few things for sxmlc to parse such cases correctly...

     
  • Matthieu Labas

    Matthieu Labas - 2016-02-12

    Ticket moved from /p/sxmlc/feature-requests/7/

     
  • Matthieu Labas

    Matthieu Labas - 2016-02-13
    • status: open --> closed-fixed
    • Group: v4.2.0 --> v4.2.1
     
  • Matthieu Labas

    Matthieu Labas - 2016-02-13

    Feel free to reopen it if the issue is not yet fixed.

     
