
#21 Normalize all newlines on input

open
nobody
None
5
2005-05-17
2005-05-17
marlonism
No

According to the XML specification (section 2.11, End-of-Line
Handling), all newlines must be normalized to "\n" during input
processing, before parsing. Newlines can appear in the
following forms:

Mac (classic Mac OS) - "\r"
DOS/Windows - "\r\n"
Unix - "\n"
IBM (NEL) - 0x85 or 0x0D+0x85 (XML 1.1 Recommendation only)

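The current TinyXML version, 2.3.4, uses fopen(..., "r") to read
XML files, i.e. it opens them in text mode. On Windows, the C
runtime translates "\r\n" into "\n" for text-mode reads, so
fgets() never sees the "\r". On Unix, text mode performs no
translation, and "\r\n" is still read as "\r\n".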

Therefore, relying on fopen() to normalize newlines does
not work in the following cases:
- Mac-style newlines ("\r" alone)
- Unix builds (e.g. gcc), if the XML came from Windows
- when TiXmlDocument::Parse() is called directly to
parse an in-memory XML buffer
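
Until the parser itself handles these cases, a caller could
normalize the buffer before handing it to the parser. The
following is only a minimal sketch of that idea: the helper
NormalizeNewlines() is hypothetical and not part of TinyXML,
and it covers "\r\n" and lone "\r" only (no NEL).

```cpp
#include <string>

// Hypothetical helper, not part of TinyXML: returns a copy of `in`
// with every "\r\n" pair and every lone "\r" replaced by "\n".
std::string NormalizeNewlines(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        if (in[i] == '\r')
        {
            out += '\n';
            // Swallow the '\n' of a "\r\n" pair so it is not emitted twice.
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;
        }
        else
        {
            out += in[i];
        }
    }
    return out;
}
```

Assuming the raw text is held in a std::string named raw, the
result could then be passed on with something like
doc.Parse( NormalizeNewlines( raw ).c_str() ). This costs one
extra copy of the document, which is why a fix inside the
parser is preferable.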

I would like to submit a patch so that the above cases
will work (I won't include IBM NEL since I'm not clear on
whether XML 1.1 is supported).

It involves changing the GetChar() implementation in
tinyxml.h (version 2.3.4) from:

inline static const char* GetChar( const char* p, char* _value, int* length, TiXmlEncoding encoding )
{
    ...
    if ( *length == 1 )
    {
        if ( *p == '&' )
            return GetEntity( p, _value, length, encoding );
        *_value = *p;
        return p+1;
    }
    ...
}

to:

inline static const char* GetChar( const char* p, char* _value, int* length, TiXmlEncoding encoding )
{
    ...
    if ( *length == 1 )
    {
        if ( *p == '&' )
            return GetEntity( p, _value, length, encoding );
        // Normalize "\r\n" and a lone "\r" to a single "\n".
        if ( *p == '\r' )
        {
            *_value = '\n';
            // Skip the '\n' of a "\r\n" pair so it is not returned again.
            return (*(p+1) == '\n') ? (p+2) : (p+1);
        }
        *_value = *p;
        return p+1;
    }
    ...
}
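
To check the branch in isolation, the logic can be lifted into
a small stand-alone harness. This is only a sketch:
ReadCharNormalized() and ReadAll() are hypothetical names that
mimic the single-byte path of the patched GetChar(), and entity
handling and multi-byte UTF-8 are deliberately left out.

```cpp
#include <string>

// Hypothetical stand-in for the single-byte branch of the patched
// GetChar(): stores one character in *value and returns the next
// read position. "\r\n" and a lone "\r" both yield a single '\n'.
static const char* ReadCharNormalized(const char* p, char* value)
{
    if (*p == '\r')
    {
        *value = '\n';
        // Safe at end of buffer: *(p + 1) is then the terminating '\0'.
        return (*(p + 1) == '\n') ? (p + 2) : (p + 1);
    }
    *value = *p;
    return p + 1;
}

// Runs a whole null-terminated buffer through ReadCharNormalized().
std::string ReadAll(const char* p)
{
    std::string out;
    while (*p)
    {
        char c = 0;
        p = ReadCharNormalized(p, &c);
        out += c;
    }
    return out;
}
```

Feeding it a buffer that mixes all three styles, such as
"mac\rdos\r\nunix\n", should give back text that contains
only "\n" line endings.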

Thanks!
-- Marlon Baculio
