From: Michael Kay <mhk@mh...> - 2003-09-18 19:21:00
Try converting the HTML to XML using Dave Raggett's "tidy" utility,
available from w3.org.
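As a rough sketch only (not tested against your document), the conversion and the query might be run along these lines. The output name myhtml.xml is an assumption chosen for illustration, and -q, -asxml and -o are options of the HTML Tidy command line, so check them against the version you install. The Saxon command is the one from your message, pointed at the converted file.

```shell
# File names follow the original message; myhtml.xml is an assumed output name.
in=myhtml.html
out=myhtml.xml

# -asxml asks tidy for well-formed XHTML, -o names the output file, -q keeps it quiet.
# tidy also exits non-zero on mere warnings, so check that the output file was written.
tidy -q -asxml -o "$out" "$in"

# Query the converted file instead of the original HTML.
java net.sf.saxon.Query -s "$out" findFormElements.xq
```

Note that tidy's XHTML output will probably place the elements in the XHTML namespace, in which case a path such as //form may need adjusting to match them.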
> -----Original Message-----
> From: saxon-help-admin@...
> [mailto:saxon-help-admin@...] On Behalf Of
> Phil Williams
> Sent: 18 September 2003 16:38
> To: saxon-help@...
> Subject: [saxon] using XQuery to transform HTML
> I am new to Saxon and XQuery.
> I have been trying to transform HTML documents using the
> XQuery feature of Saxon. The first line of my HTML
> (myhtml.html) references the loose.dtd from W3C, i.e.
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> I have written a basic query (in findFormElements.xq) to pull
> out the info inside a form tag as:
> for $elem in input()//form
> return $elem
> The Saxon command I use is:
> java net.sf.saxon.Query -s myhtml.html findFormElements.xq
> I then get the following error:
> Error on line 31 of http://www.w3.org/TR/html4/loose.dtd:
> Error reported by XML parser: Next character must be ">"
> terminating <!ENTITY ...> declaration "%HTML.Version".
> Query processing failed
> From what I can find on the net, this is due to the fact that
> the DTD is an SGML DTD and not an XML DTD.
> The questions I have for you all are:
> 1) Is that correct?
> 2) If so, is there some way to work around this problem
> without having to manually convert the loose.dtd to an XML DTD?
> 3) If I have to convert it, has anyone else done this task already?
> Thanks in advance for your assistance.