Hi
First, thanks for a great little parser. I have used TinyXml to remove the MSXML abomination from my project, resulting in fewer problems and much cleaner, tidier code.
In doing so, I made a small change to the way elements are parsed which I believe may be worth integrating into the main code.
I found that, even with TiXmlBase::condenseWhiteSpace set to false, I was losing whitespace. Normally I wouldn't care (and my project ignores this anyway), but my XML files are digitally signed and the original formatting is needed to calculate the digest when checking the signature.
The code I changed was in TiXmlElement::ReadValue. The original (v2.6.1) code did the following:
1. stored the current stream position (with whitespace)
2. skipped whitespace
3. checked for the start of a new element ('<')
The check was performed on the 'skipped' stream position, so any preceding whitespace was lost at this point.
My change means whitespace is only skipped if TiXmlBase::IsWhiteSpaceCondensed() returns true.
I also had to change the TiXmlText::Blank function to take TiXmlBase::IsWhiteSpaceCondensed() into account (i.e. to allow all-whitespace text).
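To illustrate the effect described above without depending on TinyXml itself, here is a minimal standalone sketch. The function collectTextRuns and its condense flag are hypothetical names invented for this illustration (condense stands in for TiXmlBase::IsWhiteSpaceCondensed()); it is not the patch and it does no real XML parsing beyond finding '<' and '>'.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of TinyXml: returns the text runs found
// between tags in `content`. When `condense` is true, whitespace is skipped
// before each check for '<', so whitespace-only runs are never seen.
std::vector<std::string> collectTextRuns(const std::string& content, bool condense)
{
    std::vector<std::string> runs;
    std::size_t p = 0;

    auto skipWhiteSpace = [&]() {
        while (p < content.size() &&
               std::isspace(static_cast<unsigned char>(content[p])))
            ++p;
    };

    if (condense) skipWhiteSpace();
    while (p < content.size()) {
        if (content[p] != '<') {
            // Text run: everything up to the next '<' (or the end of input).
            std::size_t end = content.find('<', p);
            if (end == std::string::npos) end = content.size();
            runs.push_back(content.substr(p, end - p));
            p = end;
        } else {
            // Tag: jump past the closing '>' (quoted attributes are ignored).
            std::size_t end = content.find('>', p);
            if (end == std::string::npos) break;
            p = end + 1;
        }
        if (condense) skipWhiteSpace();
    }
    return runs;
}
```

With condense set to false, the whitespace between tags comes back as its own text runs, which is what a byte-exact digest check would need. With condense set to true, those runs disappear.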
My code changes are below; I hope this is of use.
Great job!
Rob
const char* TiXmlElement::ReadValue( const char* p, TiXmlParsingData* data, TiXmlEncoding encoding )
{
    TiXmlDocument* document = GetDocument();
    // Read in text and elements in any order.
    //const char* pWithWhiteSpace = p;
    if (TiXmlBase::IsWhiteSpaceCondensed()) p = SkipWhiteSpace( p, encoding );
    while ( p && *p )
    {
        if ( *p != '<' )
        {
            // Take what we have, make a text element.
            TiXmlText* textNode = new TiXmlText( "" );
            if ( !textNode )
            {
                return 0;
            }
            //if ( TiXmlBase::IsWhiteSpaceCondensed() )
            //{
            p = textNode->Parse( p, data, encoding );
            //}
            //else
            //{
            //    // Special case: we want to keep the white space
            //    // so that leading spaces aren't removed.
            //    p = textNode->Parse( pWithWhiteSpace, data, encoding );
            //}
            if ( !textNode->Blank() )
                LinkEndChild( textNode );
            else
                delete textNode;
        }
        else
        {
            // We hit a '<'
            // Have we hit a new element or an end tag? This could also be
            // a TiXmlText in the "CDATA" style.
            if ( StringEqual( p, "</", false, encoding ) )
            {
                return p;
            }
            else
            {
                TiXmlNode* node = Identify( p, encoding );
                if ( node )
                {
                    p = node->Parse( p, data, encoding );
                    LinkEndChild( node );
                }
                else
                {
                    return 0;
                }
            }
        }
        //pWithWhiteSpace = p;
        if (TiXmlBase::IsWhiteSpaceCondensed()) p = SkipWhiteSpace( p, encoding );
    }
    if ( !p )
    {
        if ( document ) document->SetError( TIXML_ERROR_READING_ELEMENT_VALUE, 0, 0, encoding );
    }
    return p;
}
bool TiXmlText::Blank() const
{
    // When whitespace is preserved, only an empty string counts as blank.
    if (!TiXmlBase::IsWhiteSpaceCondensed()) return (value.length() == 0);
    for ( unsigned i=0; i<value.length(); i++ )
        if ( !IsWhiteSpace( value[i] ) )
            return false;
    return true;
}
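The Blank() change can be tried in isolation with a small standalone sketch. The free function isBlank and its condense parameter are assumptions made for this illustration only (condense plays the role of TiXmlBase::IsWhiteSpaceCondensed()), and std::isspace is used in place of TinyXml's own IsWhiteSpace.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the patched TiXmlText::Blank().
bool isBlank(const std::string& value, bool condense)
{
    // Whitespace is being preserved: only an empty string is blank,
    // so an all-whitespace text node is kept.
    if (!condense) return value.empty();

    // Whitespace is being condensed: any non-whitespace character
    // makes the text non-blank.
    for (unsigned char c : value)
        if (!std::isspace(c)) return false;
    return true;
}
```

Under this sketch, a text node containing only spaces and newlines is dropped when condensing is on and linked into the tree when it is off.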
Tricky - the patch actually switches the behavior more to what I'd like it to be: it strips the leading and trailing whitespace (if condensing is on) but leaves the internal whitespace. It also fixes what may be an outright whitespace bug.
Still working on this patch.
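One reading of the behavior described in this reply, stripping whitespace at both ends while leaving internal whitespace alone, can be sketched as follows. The helper trimEnds is a hypothetical name for illustration and is not taken from TinyXml or from the patch.

```cpp
#include <string>

// Hypothetical helper: removes leading and trailing whitespace only.
// Runs of whitespace between non-whitespace characters are left untouched.
std::string trimEnds(const std::string& text)
{
    const char* ws = " \t\r\n";
    std::size_t first = text.find_first_not_of(ws);
    if (first == std::string::npos) return "";  // all whitespace (or empty)
    std::size_t last = text.find_last_not_of(ws);
    return text.substr(first, last - first + 1);
}
```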
Slightly easier suggestion
https://sourceforge.net/tracker/index.php?func=detail&aid=3085245&group_id=13559&atid=113559