Whitespace in elements - suggested change/fix

Developer
Rob Rimmer
2010-04-29
2013-05-20
  • Rob Rimmer

    Rob Rimmer - 2010-04-29

    Hi

    First, thanks for a great little parser. I have used TinyXml to remove the MSXML abomination from my project, resulting in fewer problems and much cleaner, tidier code.
    In doing so, I made a small change to the way elements are parsed, which I believe may be worth integrating into the main code.

    I found that, even with TiXmlBase::condenseWhiteSpace set to false, I was losing whitespace. Normally I wouldn't care (and my project ignores this anyway), but my XML files are digitally signed and the original formatting is needed to calculate the digest when checking the signature.
    The code I changed was in TiXmlElement::ReadValue.  The original (v2.6.1) code did the following:

    stored the current stream position (with whitespace)
    skipped whitespace
    checked for start of a new element ('<')

    The check was performed on the 'skipped' stream position, so any preceding whitespace was lost at this point.

    With my change, whitespace is only skipped if TiXmlBase::IsWhiteSpaceCondensed() returns true.

    I also had to change the TiXmlText::Blank function to take TiXmlBase::IsWhiteSpaceCondensed() into account (i.e. to allow text that is entirely whitespace).
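
    The effect of the two changes can be sketched outside TinyXml. This is only a toy model under stated assumptions, not the library's code: textRuns, skipWhiteSpace and isBlank are hypothetical helpers, tags are stepped over without being parsed, and the condensing of internal and trailing whitespace is not modelled. It shows which text runs survive with condensing off, before and after the patch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: index of the first non-whitespace character at or after p.
static std::size_t skipWhiteSpace(const std::string& s, std::size_t p)
{
    while (p < s.size() && std::isspace(static_cast<unsigned char>(s[p]))) ++p;
    return p;
}

// Hypothetical helper mirroring the Blank() change: with the patch and with
// condensing off, only an empty string counts as blank.
static bool isBlank(const std::string& text, bool condense, bool patched)
{
    if (patched && !condense) return text.empty();
    for (char c : text)
        if (!std::isspace(static_cast<unsigned char>(c))) return false;
    return true;
}

// Toy stand-in for the text/element loop: returns the text runs that would be
// kept as text nodes. Tags are skipped up to the next '>' rather than parsed.
std::vector<std::string> textRuns(const std::string& xml, bool condense, bool patched)
{
    std::vector<std::string> runs;
    std::size_t p = 0;
    while (p < xml.size()) {
        const std::size_t withWhiteSpace = p;      // position before any skipping
        // Original: always skip. Patched: skip only when condensing is on.
        if (condense || !patched) p = skipWhiteSpace(xml, p);
        if (p >= xml.size()) break;

        if (xml[p] == '<') {
            // Any whitespace skipped above is lost here.
            const std::size_t close = xml.find('>', p);
            if (close == std::string::npos) break;
            p = close + 1;
            continue;
        }

        // Text runs up to the next '<'. With condensing off, the original code
        // restarts from the unskipped position to keep leading spaces.
        const std::size_t start = condense ? p : withWhiteSpace;
        std::size_t end = xml.find('<', p);
        if (end == std::string::npos) end = xml.size();
        const std::string text = xml.substr(start, end - start);
        if (!isBlank(text, condense, patched)) runs.push_back(text);
        p = end;
    }
    return runs;
}
```

    For the input "<a>\n  <b>x</b>\n</a>" with condensing off, the unpatched path keeps only "x", while the patched path also keeps the "\n  " and "\n" runs between the tags, which is what a digest over the original bytes needs.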

    My code changes are below; I hope they are of use.

    Great job!

    Rob

    const char* TiXmlElement::ReadValue( const char* p, TiXmlParsingData* data, TiXmlEncoding encoding )
    {
        TiXmlDocument* document = GetDocument();

        // Read in text and elements in any order.

        //const char* pWithWhiteSpace = p;
        if (TiXmlBase::IsWhiteSpaceCondensed()) p = SkipWhiteSpace( p, encoding );

        while ( p && *p )
        {
            if ( *p != '<' )
            {
                // Take what we have, make a text element.
                TiXmlText* textNode = new TiXmlText( "" );

                if ( !textNode )
                {
                    return 0;
                }

                //if ( TiXmlBase::IsWhiteSpaceCondensed() )
                //{
                p = textNode->Parse( p, data, encoding );
                //}
                //else
                //{
                // // Special case: we want to keep the white space
                // // so that leading spaces aren't removed.
                // p = textNode->Parse( pWithWhiteSpace, data, encoding );
                //}

                if ( !textNode->Blank() )
                    LinkEndChild( textNode );
                else
                    delete textNode;
            }
            else
            {
                // We hit a '<'
                // Have we hit a new element or an end tag? This could also be
                // a TiXmlText in the "CDATA" style.
                if ( StringEqual( p, "</", false, encoding ) )
                {
                    return p;
                }
                else
                {
                    TiXmlNode* node = Identify( p, encoding );
                    if ( node )
                    {
                        p = node->Parse( p, data, encoding );
                        LinkEndChild( node );
                    }
                    else
                    {
                        return 0;
                    }
                }
            }

            //pWithWhiteSpace = p;
            if (TiXmlBase::IsWhiteSpaceCondensed()) p = SkipWhiteSpace( p, encoding );
        }

        if ( !p )
        {
            if ( document ) document->SetError( TIXML_ERROR_READING_ELEMENT_VALUE, 0, 0, encoding );
        }
        return p;
    }

    bool TiXmlText::Blank() const
    {
        // With condensing off, whitespace-only text is significant: only an empty value is blank.
        if (!TiXmlBase::IsWhiteSpaceCondensed()) return (value.length() == 0);

        for ( unsigned i=0; i<value.length(); i++ )
            if ( !IsWhiteSpace( value[i] ) )
                return false;
        return true;
    }


     
  • Lee Thomason

    Lee Thomason - 2010-06-05

    Tricky - the patch actually switches the behavior more to what I'd like it to be: strip the leading and trailing space (if condensing is on) but leave the internal space. It also fixes what may be an outright whitespace bug.

    Still working on this patch.

     
