Hello Christian Dugast,

The fact that a browser renders a page nicely does not mean that the page's source code is valid. Browsers are built to tolerate a lot of invalid, or not even well-formed, markup.

The errors you observe have nothing to do with the declared DOCTYPE. The source code is simply not well-formed, so it is not XML (even though it may be declared as XML).

• An element (like <meta>) has to be closed, for example as <meta ... />, to be well-formed.
• Certain characters (&, <, ', ") have to be written as entities (&amp; &lt; &apos; &quot;) in certain locations.
• JavaScript code that contains such characters must be in a CDATA section to be treated as text by the XML parser.
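
To see the three rules above in action, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. This is only an illustration: Kernow uses a different parser, so the error texts will differ, and the helper name is_well_formed as well as the shortened fragments are my own assumptions modelled on your error messages.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Rule 1: an element must be closed.
bad_meta = '<head><meta name="robots" content="noindex,follow"></head>'
good_meta = '<head><meta name="robots" content="noindex,follow" /></head>'

# Rule 2: a literal & must be written as &amp; (shortened, hypothetical URL).
bad_amp = '<option value="/recherche.php?s=&f=Vis">Note</option>'
good_amp = '<option value="/recherche.php?s=&amp;f=Vis">Note</option>'

# Rule 3: script code containing "<" belongs in a CDATA section.
bad_script = "<script>document.write('<scr'+'ipt>');</script>"
good_script = "<script><![CDATA[document.write('<scr'+'ipt>');]]></script>"

for name, fragment in [
    ("bad_meta", bad_meta),
    ("good_meta", good_meta),
    ("bad_amp", bad_amp),
    ("good_amp", good_amp),
    ("bad_script", bad_script),
    ("good_script", good_script),
]:
    print(name, "well-formed" if is_well_formed(fragment) else "NOT well-formed")
```

After parsing, the application sees the original characters again: the value attribute of good_amp contains a plain &, and the text of good_script is the unchanged JavaScript.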

As long as your source document is not XML, you are out of luck starting an XML-based process.

For me, this is the beauty of the XML standard: parsers enforce its rules strictly, and that makes downstream processes reliable.

- Michael


On 27.12.2011 at 22:16, Dr. Christian Dugast wrote:

Hello,
 
There is a public valid webpage from which I want to extract information using an XQuery.
I have saved the code of this webpage in a txt file and alternatively as an html document.
I have analysed the code and written my query accordingly, to extract just the information I am looking for.
 
But before I can run an XQuery, Kernow parses the original code and comes back with a series of error messages, even though the code is valid, as the webpage produces a nice page in my browser.
 
The errors I see show code written in HTML ... so it seems the webpage I want to extract information from has been written in HTML *and* in XHTML, but I am not sure this is the real problem.

[…]

Below is a list of the error messages I get, with the related code that generates them (I get these errors with either header, the original or the simplified one).
 
Error message:
Line 15, Col 3 The element type "meta" must be terminated by the matching end-tag "</meta>".
<meta name="robots" content="noindex,follow">        <-- this is the faulty one, sure, but my browser has no problem with this code

[…]

Error message:
Line 39, Col 72 The reference to entity "f" must end with the ';' delimiter.
<option  value="/recherche_antidot/recherche.php?s=&f=Vis&o=appreciation,DESC&acces_libre=1">Note (décroissante)</option>

[…]

Error message:
Line 721, Col 23 Element type "scr" must be followed by either attribute specifications, ">" or "/>".
<script language="JavaScript" type="text/javascript">
                                document.write('<scr'+'ipt id="jspub99029" language="JavaScript"

--
_______________________________________________________________
Michael Müller-Hillebrand: Dokumentations-Technologie
Adobe Certified Expert, FrameMaker
Solutions and Training, FrameScript, XML/XSL, Unicode
Blog: http://cap-studio.de/ - Tel. +49 (9131) 28747