I encountered a problem when streaming from an istream into TinyXML, doing something like: myistream >> myelement.
The XML data received from a server contains zeros once in a while.
First the StreamIn function successfully produces a tag (containing a zero), but then the Parse function can't handle the C string.
For now I solved the problem by doing the following.
#ifdef TIXML_USE_STL
TIXML_ISTREAM & operator >> (TIXML_ISTREAM & in, TiXmlNode & base)
{
    TIXML_STRING tag;
    tag.reserve( 8 * 1000 );
    base.StreamIn( &in, &tag );
    char *tagc = (char*) tag.c_str();
    for( unsigned int i = 0; i < tag.size(); i++ ) // strip zeros LL
    {
        if( tagc[i] == 0 )
            tagc[i] = ' ';
    }
    base.Parse( tagc, 0 );
    return in;
}
#endif
You shouldn't modify c_str() in-place like that. The fact that you had to cast it to (char*) should tell you that you're doing something wrong.
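For what it's worth, the same replacement can be done without the cast. This is only a sketch, assuming TIXML_STRING is std::string (as in the TIXML_USE_STL build); replace_nulls is a made-up helper name, not part of TinyXML.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not a TinyXML function): replace every embedded
// NUL in `tag` with `replacement`, in place. std::string tracks its own
// length, so begin()/end() cover embedded NULs, and the non-const
// iterators allow modification without casting away const.
inline void replace_nulls(std::string& tag, char replacement = ' ')
{
    std::replace(tag.begin(), tag.end(), '\0', replacement);
}
```

In the operator >> above, this would mean calling replace_nulls( tag ) after StreamIn and then passing tag.c_str() to Parse, with no (char*) cast.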
You're right about that, but that's not the point of my question. It's more like:
"Should the parser parse streams that contain zeros, or should it return an error (as it does now)?"
You should read the XML specification:
http://www.w3.org/TR/2004/REC-xml-20040204/#charsets
You'll see that the null character is not a legal XML character, and that the only whitespace characters allowed are:
space, tab, CR and LF.
The whole XML philosophy tends to be very strict on syntax, to avoid the mess produced by the leniency of the HTML world.
I strongly support having a strict parser.
And a null character in a string means the end of the string in C.
You are left with two choices, I think:
either you fix this weird source that gives you random zeros, or you transform its output before you parse it.
Yves
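To make the character rules concrete, here is a rough sketch of the Char and S productions from the XML 1.0 specification linked above, written as two checks on a code point. The function names are invented for illustration and are not TinyXML functions.

```cpp
#include <cstdint>

// Char production of XML 1.0:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Note that #x0 is not in any of these ranges.
inline bool is_xml_char(std::uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

// S production of XML 1.0: whitespace is space, tab, CR or LF.
inline bool is_xml_whitespace(std::uint32_t cp)
{
    return cp == 0x20 || cp == 0x9 || cp == 0xD || cp == 0xA;
}
```

Under these rules a zero byte is neither whitespace nor a legal character, so a strict parser has no way to accept it.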
OK, I tried reading some of the XML specification. I have some trouble reading this formal language, so correct me if I'm wrong, but...
The zeros I'm receiving occur between elements, like this: <ping />\0 <ping />\0 <ping />. Considering this is an unparsed entity, it may contain non-XML.
What I'm receiving is put into an istream's stream buffer, which may contain zeros because it is not a string.
But then it is TinyXML's StreamIn function that puts these zeros at the first index of the tag string.
Shouldn't the StreamIn function read the istream and skip everything it gets until it finds valid XML?
One of the weird sources that sends and expects zeros is Macromedia Flash. The weird source I was using just made itself compatible with Flash.
Maybe getting TinyXML to accept zeros is easier than pointing Macromedia to the XML specification.
I understand your frustration... but supporting text with internal nulls is just strange. A pre-processor (before TinyXML) that converted nulls to \n would be much more natural.
lee