#39 New line removed from text

open-remind
Lee Thomason
None
3
2007-07-09
2005-03-17
Lucian
No

For an XML file whose fields contain multi-line text,
the parser returns text with spaces where the newlines
should be.

I think that SetCondenseWhiteSpace(), whether it is set to
true or false, should have nothing to do with newlines.

Example XML file used:

<?xml version='1.0' encoding='UTF-8'?>
<XML>
<OP type='INSERT'>
<ITEM type='TASK'>
<FIELD name='Body'>First line
Second line
Another line</FIELD>
</ITEM>
</OP>
</XML>

>>Steps performed:

1. Set TiXmlBase::SetCondenseWhiteSpace(false);
2. Not using STL
3. Run the code
4. Read field Body:
const char* nodeName = tNode->Attribute("name");
const char* data = tNode->FirstChild()->Value();

>>Expected result:

nodeName='Body'
data='First line
Second line
Another line'

>>Obtained result:
nodeName='Body'
data='First line Second line Another line'
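
For readers unfamiliar with whitespace condensing, the obtained result is what one would expect if newlines were folded in with ordinary whitespace. The following is a simplified stand-alone model of that behaviour, not TinyXML's actual code; the function name condenseWhitespace is hypothetical, and the real parser may treat leading and trailing whitespace differently.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of TinyXML: collapse every run of
// whitespace (spaces, tabs, '\n', '\r') into a single space and drop
// leading and trailing whitespace.
std::string condenseWhitespace(const std::string& in) {
    std::string out;
    bool inRun = false;  // true while we are inside a whitespace run
    for (unsigned char c : in) {
        if (std::isspace(c)) {
            inRun = true;
        } else {
            // Emit one space for the whole run, but not at the start.
            if (inRun && !out.empty()) out += ' ';
            inRun = false;
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

Passing the Body text from the example file through this function yields the single-line string shown under the obtained result, which is why the line breaks cannot be recovered afterwards.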

Discussion

  • Lucian
    2005-03-21

    • assigned_to: nobody --> leethomason
     
  • A. Kasper
    2005-12-01

    Logged In: YES
    user_id=1394263

    I can confirm that bug. Special C-characters like "\n" or
    "\t" seem to be removed by tinyXML.
    The generated XML-string looks like this:
    <Tag>Testline1
    Testline2</Tag>
    After parsing with tinyXML it is:
    Testline1 Testline2

    Doesn't make much sense storing strings in an XML-file like
    this...

     
  • Lucian
    2005-12-01

    • priority: 5 --> 6
    • status: open --> open-remind
     
  • Leahn Novash
    2007-04-05

    Logged In: YES
    user_id=1152950
    Originator: NO

    I have submitted a small fix for this to the author. Hopefully he will add it to the next release.
    I didn't test it thoroughly, but everything seems to work fine.
    Since everyone seems to post code fixes here, this is what you need to do.

    In tinyxml.h, add the following to class TiXmlBase. In my copy it is around line 220. Search for SetCondenseWhiteSpace and add it below.

    static void SetCondenseNewLine(bool condense) { condenseNewLine = condense; }
    static bool IsNewLineCondensed() { return condenseNewLine; }

    Still in the same class, but in the private section (around line 410 in my copy), search for
    static bool condenseWhiteSpace; and add below it:

    static bool condenseNewLine;

    Last but not least, in the IsWhiteSpace function replace the return statement with
    return ( isspace( (unsigned char) c ) || ((c == '\n' || c == '\r' ) && TiXmlBase::IsNewLineCondensed()));

    It did the trick for me, but as I said, I didn't test it thoroughly.
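
One caveat about the return expression above: in the default C locale, isspace() already returns true for '\n' and '\r', so the extra clause may evaluate the same way whichever value the flag has. Also, a static data member declared in the class normally needs one definition in a .cpp file (for example bool TiXmlBase::condenseNewLine = true; where the default value is an assumption), which the steps above do not mention. A minimal stand-alone sketch of a predicate that does keep newlines apart from other whitespace could look like this. The names WhitespacePolicy and isCondensableWhitespace are hypothetical and do not exist in TinyXML.

```cpp
#include <cctype>

// Hypothetical stand-in for the flag that the fix adds to TiXmlBase.
struct WhitespacePolicy {
    bool condenseNewLine = true;  // assumed default: newlines are condensed
};

// Returns true when c should be treated as condensable whitespace.
// Newlines are tested first, because std::isspace() also reports
// '\n' and '\r' as whitespace and would otherwise hide the flag.
inline bool isCondensableWhitespace(char c, const WhitespacePolicy& policy) {
    if (c == '\n' || c == '\r') {
        return policy.condenseNewLine;
    }
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}
```

With condenseNewLine set to false, spaces and tabs are still reported as whitespace while '\n' and '\r' are not, which is the distinction the original report asks for.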

     
  • lujnan
    2007-04-12

    Logged In: YES
    user_id=1767833
    Originator: NO

    There is a flaw in the TiXmlText object. I wanted to get the string from the node in "<root>&#x0a;</root>", but could not, so
    I debugged and traced the source code. I found that it works when I
    comment out the following code in "tinyxmlparse.cpp" at line 1205.
    Original code:
    1205 if ( !textNode->Blank() )
    1206 LinkEndChild( textNode );
    1207 else
    1208 delete textNode;
    Modified to:
    1205 //if ( !textNode->Blank() )
    1206 LinkEndChild( textNode );
    1207 //else
    1208 // delete textNode;
    This way I got the character 0x0a successfully.
    Is this a defect? Is there any other way around this problem?
    I hope the administrator fixes this in the next release.

    To Lucian - laruncutean:
    If you want the result you expected, you should write your stream as
    "<FIELD name='Body'>First line &#x0a;&#x0d;
    Second line
    Another line</FIELD>", and modify the source code as I described above.
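
To illustrate why the check at line 1205 discards this node: the reference &#x0a; decodes to a single newline, and a text node made up only of whitespace counts as blank. The sketch below is a stand-alone stand-in for that kind of test, written with the standard library only; isBlank is a hypothetical name, and the real Blank() may differ in detail.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for a Blank()-style test: true when every
// character is whitespace (an empty string also counts as blank).
inline bool isBlank(const std::string& text) {
    for (unsigned char c : text) {
        if (!std::isspace(c)) return false;
    }
    return true;
}
```

A string holding only "\n" is blank under this test, while "First line\n" is not, so removing the check affects only text nodes that consist entirely of whitespace.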

     
  • lujnan
    2007-05-22

    Logged In: YES
    user_id=1767833
    Originator: NO

    Maybe we should use a '<![CDATA[ ]]>' section. It is supported in TinyXML.

    <![CDATA[
    first line
    second line
    last line
    ]]>
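
When generating such a file, the multi-line text has to be wrapped by whoever writes the XML. Below is a minimal sketch of that step using only the standard library; wrapInCData is a hypothetical helper, not a TinyXML function. The one thing to watch for is a literal "]]>" inside the text, which would end the section early, so the sketch splits it across two adjacent sections.

```cpp
#include <string>

// Hypothetical helper: wrap text in a CDATA section. A literal "]]>"
// in the text is split so that "]]" closes one section and ">" opens
// the next one.
inline std::string wrapInCData(const std::string& text) {
    std::string out = "<![CDATA[";
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type hit = text.find("]]>", pos);
        if (hit == std::string::npos) {
            out.append(text, pos, std::string::npos);  // copy the remainder
            break;
        }
        out.append(text, pos, hit - pos);  // copy text before the terminator
        out += "]]]]><![CDATA[>";          // "]]" + end + reopen + ">"
        pos = hit + 3;                     // skip past the "]]>" just handled
    }
    out += "]]>";
    return out;
}
```

The newlines in the input are copied through unchanged, so the text between the section markers stays exactly as written.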

     
  • Mike Kenny
    2007-07-09

    Logged In: YES
    user_id=1840166
    Originator: NO

    This is actually more of a problem than you might first imagine. The current version, TinyXml 2.5.3, fails to retain &#x0A; characters. When loading, editing and saving .vcproj files with TinyXml, important information is lost, for instance the breaks between lines in a multi-line BuildEvent. This breaks the execution of the VC project. As a result, TinyXml cannot be relied upon to read and write XML without loss of structure, which is a poor situation to be in, and a critical failure.
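
Some background on why the references matter here: under the XML attribute-value normalization rules, a literal line break inside an attribute value is turned into a space by a conforming parser, while a character reference such as &#x0A; survives. A writer that wants a value to round-trip therefore has to emit line breaks as references. The sketch below shows one way to do that with the standard library only; escapeAttribute is a hypothetical helper and says nothing about how TinyXML itself writes attributes.

```cpp
#include <string>

// Hypothetical helper: escape a string for use inside a double-quoted
// XML attribute value. Line breaks and tabs are written as character
// references so that they are not normalized to spaces on reload.
inline std::string escapeAttribute(const std::string& value) {
    std::string out;
    for (char c : value) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\n': out += "&#x0A;"; break;
            case '\r': out += "&#x0D;"; break;
            case '\t': out += "&#x09;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

For a two-line command such as "copy a b" followed by "del c", the CR LF pair between the lines comes out as &#x0D;&#x0A;, which is the form that has to be preserved in the saved file.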

     
  • Lucian
    2007-07-09

    • priority: 6 --> 3