Using TinyXML, if I load this file (the dashed separator lines are not part of it):
------------------------------------------------------------------------------------------------------------------------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML LANG=EN>
<HEAD>
<TITLE>some title</TITLE>
<LINK type="text/css" rel="stylesheet" href="http://link2.css">
</HEAD>
<BODY bgcolor="#FFFFFF">
<TABLE border="0" cellspacing="0" cellpadding="0" align="center" summary=" " >
<TR align=center valign=top bgcolor="#FFFFFF">
<TD colspan=2>
<a href="http://link.com" TARGET="_top">
<IMG src="http://link.com/pic.jpg" border="0">
</a>
</TD>
</TR>
</table>
<H1 align=center>heading</h1>
</BODY>
</HTML>
------------------------------------------------------------------------------------------------------------------------
and then print it, I get:
------------------------------------------------------------------------------------------------------------------------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML LANG="EN">
<HEAD>
<TITLE>some title</TITLE>
<LINK type="text/css" rel="stylesheet" href="http://link2.css" />
</HEAD>
</HTML>
------------------------------------------------------------------------------------------------------------------------
Loading the file does report an error, doc.ErrorDesc() = 'Error reading end tag.', and it seems to be caused by the <LINK> tag. As soon as I remove that tag, I get:
------------------------------------------------------------------------------------------------------------------------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML LANG="EN">
<HEAD>
<TITLE>some title</TITLE>
</HEAD>
<BODY bgcolor="#FFFFFF">
<TABLE border="0" cellspacing="0" cellpadding="0" align="center" summary=" ">
<TR align="center" valign="top" bgcolor="#FFFFFF">
<TD colspan="2">
<a href="http://link.com" TARGET="_top">
<IMG src="http://link.com/pic.jpg" border="0" />
</a>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
------------------------------------------------------------------------------------------------------------------------
Can anyone explain what's going on here and/or how to fix it?
Thanks
HTML is not XML: the LINK tag lacks a closing tag (i.e. </LINK>).
I thought HTML was a subset of XML. Is this not true?
Either way, is there a way to get this behavior out of TinyXML without modifying the HTML? (I don't have control over it)
Thanks
1) It is a long story, but in short: in HTML some closing tags are OPTIONAL, whereas in XML every element must be closed, either with a closing tag or in the self-closing form (e.g. <LINK ... />). So your HTML is not well-formed XML.
2) Not to my knowledge.
Any recommendations for an HTML parser, or a different XML parser with these capabilities?
Thanks