Hello Christian Dugast,
The fact that a browser renders a page nicely does not mean that the source code of that page is valid. Browsers are built to tolerate a lot of invalid, or not even well-formed, markup.
The errors you observe have nothing to do with the declared DOCTYPE. The source code is simply not well-formed, so it is not XML (even though it might be declared as XML).
• An element (like <meta />) has to be closed, either by an end tag or by the empty-element form, to be well-formed.
• Certain characters (&, <, ', ") have to be written as entities (&amp; &lt; &apos; &quot;) in certain locations.
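To illustrate the second point, here is a minimal sketch in Python (my choice for illustration, it is not part of your Kernow setup). It takes the attribute value from the <option> element in your message and escapes it with the standard library helpers escape() and quoteattr(); the variable names are only placeholders.

```python
from xml.sax.saxutils import escape, quoteattr

# Attribute value copied from the <option> element in the quoted message.
raw_url = "/recherche_antidot/recherche.php?s=&f=Vis&o=appreciation,DESC&acces_libre=1"

# escape() replaces &, < and > with their entities.
escaped_text = escape(raw_url)

# quoteattr() does the same and also wraps the value in quotes,
# so the result can be placed directly after "value=".
attribute = quoteattr(raw_url)

print(escaped_text)
print("<option value=%s>...</option>" % attribute)
```

With every & written as &amp; the parser no longer tries to read "f" as an entity reference.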
As long as your source document is not XML you are out of luck starting an XML-based process.
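If you want to check this before handing a file to an XQuery processor, a small well-formedness test can help. The following is only a sketch, assuming Python and its bundled xml.etree.ElementTree parser; the helper name well_formedness_error is made up for this example, and the exact error wording will differ from what Kernow reports.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(source):
    """Return None if source parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        return str(err)
    return None

# The <meta> element as it appears in the page, and a closed variant.
unclosed = '<head><meta name="robots" content="noindex,follow"></head>'
closed = '<head><meta name="robots" content="noindex,follow" /></head>'

print(well_formedness_error(unclosed))  # a parser message, not None
print(well_formedness_error(closed))    # None
```

The same function would also flag the unescaped & in the <option> attribute, since the parser stops at the first well-formedness error it meets.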
For me, this is the beauty of the XML standard: its rules are strictly enforced, and that makes the subsequent processing steps reliable.
On 27.12.2011 at 22:16, Dr. Christian Dugast wrote:
There is a public valid webpage from which I want to extract information using an XQuery.
I have saved the code of this webpage in a txt file and alternatively as an html document.
I have analysed the code and written my query accordingly, to extract just the information I am looking for.
But before I can run an XQuery, Kernow parses the original code and reports a series of error messages, even though the code is valid, since the webpage is displayed nicely in my browser.
The errors I see point to code written in HTML ... so it seems the webpage I want to extract information from has been written in HTML *and* in XHTML, but I am not sure this is the real problem.
<meta name="robots" content="noindex,follow"> <-- this is the faulty one, sure, but my browser has no problem with this code
Below is a list of the error messages I get, together with the code that triggers them (I get these errors with both headers, the original and the simplified one):
Line 15, Col 3 The element type "meta" must be terminated by the matching end-tag "</meta>".
<option value="/recherche_antidot/recherche.php?s=&f=Vis&o=appreciation,DESC&acces_libre=1">Note (décroissante)</option>
Line 39, Col 72 The reference to entity "f" must end with the ';' delimiter.
Line 721, Col 23 Element type "scr" must be followed by either attribute specifications, ">" or "/>".
Michael Müller-Hillebrand: Dokumentations-Technologie
Adobe Certified Expert, FrameMaker
Solutions and training, FrameScript, XML/XSL, Unicode