parsing "\0"

lodewijk
2004-03-18
2004-04-02
  • lodewijk

    lodewijk - 2004-03-18

    I encountered a problem when streaming from an istream into tinyxml, with something like: myistream >> myelement.
    The XML data received from a server contains zeros once in a while.
    First the StreamIn function successfully produces a tag (containing a zero), but then the Parse function can't handle the C string.

    For now I solved the problem by doing the following.

    #ifdef TIXML_USE_STL   
    TIXML_ISTREAM & operator >> (TIXML_ISTREAM & in, TiXmlNode & base)
    {
        TIXML_STRING tag;
        tag.reserve( 8 * 1000 );
        base.StreamIn( &in, &tag );
       
        char *tagc = (char*) tag.c_str();
       
        for(unsigned int i=0;i<tag.size();i++)  //strip zero's LL
        {
             if( tagc[i] == 0 )
                tagc[i] = ' ';
        }

        base.Parse( tagc , 0 );
        return in;
    }
    #endif

     
    • B Sizer

      B Sizer - 2004-03-18

      You shouldn't modify c_str() in-place like that. The fact that you had to cast it to (char*) should tell you that you're doing something wrong.
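To illustrate the point about the cast: a minimal sketch of the same zero-stripping done through the string's own mutable interface, assuming TIXML_STRING is std::string (as it is when TIXML_USE_STL is defined). The helper name replace_nulls is hypothetical and not part of TinyXML.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of TinyXML): replace every embedded NUL
// character in place. std::string exposes mutable iterators, so no cast
// of c_str() is needed.
void replace_nulls(std::string& text, char replacement = ' ')
{
    std::replace(text.begin(), text.end(), '\0', replacement);
}
```

After a call such as replace_nulls(tag), the read-only pointer from tag.c_str() could be handed to Parse as in the original post.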

       
    • lodewijk

      lodewijk - 2004-03-18

      You're right about that, but that's not the point of my question. It's more like:

      "Should the parser parse streams that contain zeros, or should it return an error (as it does now)?"

       
      • Yves Berquin

        Yves Berquin - 2004-03-19

        You should read the XML specification:
        http://www.w3.org/TR/2004/REC-xml-20040204/#charsets
        You'll see that the only white space characters allowed are:
        space, tab, CR and LF.
        The whole XML philosophy tends to be very strict on syntax, to avoid the mess produced by the laxness of the HTML world.
        I strongly support having a strict parser.
        And a null character in a string means the end of the string in the C language.
        You are left with two choices, I think:
        either you fix this weird source that gives you random zeros, or you transform the data before you parse it.
        Yves
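The point about C strings can be shown with a small sketch: a std::string keeps its full length, but anything that reads it through a const char* stops at the first zero. The helper name c_string_length is hypothetical and only for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the length of the text as a C-string consumer
// sees it. std::strlen stops counting at the first NUL character,
// regardless of how many characters the std::string actually holds.
std::size_t c_string_length(const std::string& text)
{
    return std::strlen(text.c_str());
}
```

For a string built as std::string("\0<ping />", 9), size() reports 9 while c_string_length reports 0. That is consistent with a parser that takes a C string having nothing to work with when the zero lands at the first index.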

         
    • lodewijk

      lodewijk - 2004-03-19

      OK, I tried reading some of the XML specification. I have some trouble reading this formal language, so correct me if I'm wrong, but...

      The zeros I'm receiving occur between elements, like this: <ping />\0 <ping />\0 <ping />. Considering this is an unparsed entity, it may contain non-XML.

      What I'm receiving is put into the istream's stream buffer, which may contain zeros because it is not a string.
      But then it is the StreamIn function of tinyxml that puts this zero into the first index of the string tag.

      Shouldn't the StreamIn function read the istream and skip everything it gets until it finds valid XML?
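One way to handle zeros that only occur between elements, without changing the parser, is to treat the zero as a message delimiter before tinyxml sees the data. A minimal sketch, assuming each message is terminated by a single zero byte; the helper name split_on_nul is hypothetical and not part of TinyXML.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a stream into NUL-delimited messages.
// std::getline accepts any delimiter character, including '\0', and
// does not store the delimiter in the result. Empty messages (for
// example from a leading or doubled NUL) are skipped.
std::vector<std::string> split_on_nul(std::istream& in)
{
    std::vector<std::string> messages;
    std::string message;
    while (std::getline(in, message, '\0')) {
        if (!message.empty())
            messages.push_back(message);
    }
    return messages;
}
```

Each returned string is then free of zeros and could be passed to Parse on its own. Note that this reads until the stream ends; on a live connection the loop body would instead handle one message at a time.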

       
    • lodewijk

      lodewijk - 2004-04-01

      One of the weird sources that sends and expects zeros is Macromedia Flash. The weird source I was using just made itself compatible with Flash.
      Maybe getting tinyxml to accept zeros is easier than pointing Macromedia to the XML specification.

       
    • Lee Thomason

      Lee Thomason - 2004-04-02

      I understand your frustration... but supporting text with internal nulls is just strange. A pre-processor (before tinyxml) that converted nulls to \n would be much more natural.

      lee

       
