From: Stephen G. W. <sg...@no...> - 2008-07-28 20:46:48
Is there some incompatibility between NekoHTML and XPath as implemented by Xalan? I have tried several different methods of getting XPath expressions to work on NekoHTML-produced documents and am having no luck. I can traverse the generated DOM tree, but XPath expressions never produce any results.

I have tried both a compiled XPathExpression and an XPathEvaluator, with no luck.

I am using NekoHTML 1.9.8 and Xalan-J 2.7.1.

Thank you,
-----------------------------------------------------------
- stephen.g.walizer - http://node777.net - sg...@no...
-----------------------------------------------------------
From: Jacob K. <ho...@vi...> - 2008-07-28 23:13:49
It would help if you provide an example document, an XPath expression, and the node you expect it to match.

Jake
From: Stephen G. W. <sg...@no...> - 2008-07-29 00:05:39
I'll even include the code I'm testing with.

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class Test4 {
    public static void main(String[] args) {
        try {
            XPathFactory xpFactory = XPathFactory.newInstance();
            XPath xpath = xpFactory.newXPath();
            String expression = "//title";
            XPathExpression xpathExpression = xpath.compile(expression);
            DOMParser parser = new DOMParser();
            parser.setFeature("http://xml.org/sax/features/namespaces", false);
            parser.parse("./test2.html");
            Document doc = parser.getDocument();
            Object result = xpathExpression.evaluate(doc, XPathConstants.NODESET);
            NodeList nodes = (NodeList) result;
            for (int i = 0; i < nodes.getLength(); i++) {
                System.out.println(nodes.item(i).getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Test HTML

<html>
<head>
<title>Test Page</title>
</head>
</body>
<p>Foo</p>
</body>
</html>

Returns an empty node set regardless of what I use for expression. I was originally using a more complex HTML but figured I'd simplify things until I got something working.

I've also tried the method of using XPath included in the sample application ApplyXPathDOM in the Xalan package. However, the compiled expression method is more ideal for my application.

Thanks
From: Jacob K. <ho...@vi...> - 2008-07-29 13:21:28
Try:

String expression = "//TITLE";

The HTML DOM is UPPER-case, per specification. You might be able to force this lower-case by setting the "http://cyberneko.org/html/properties/names/elems" [1] property to "lower", though I haven't actually tested this. I wouldn't count on this working, because if you use the HTML DOM API to create elements, they will end up UPPER-case elements. Ultimately, if you want lower-case elements, use XHTML along with a standard XML parser, not HTML with NekoHTML as the parser....

Actually, I take that back. You can probably continue to use NekoHTML as the parser and specify a DOM, other than the HTML DOM, using the Xerces "http://apache.org/xml/properties/dom/document-class-name" [2] property. Of course, then you don't get to use the convenience of the HTML DOM. But if you are only using the standard DOM API and XPath already, then this shouldn't be an issue.

[1] http://nekohtml.sourceforge.net/settings.html#elem-names
[2] http://xerces.apache.org/xerces2-j/properties.html#dom.document-class-name

Jake
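For anyone who wants to see the case sensitivity in isolation, here is a minimal sketch that uses only the JDK (no NekoHTML involved). It assumes a hand-built DOM with upper-case element names as a stand-in for what the HTML DOM produces; the class name XPathCaseDemo, its helper methods, and the ANY_CASE_TITLE constant are made up for illustration. XPath name tests are case-sensitive, so "//title" finds nothing while "//TITLE" matches. The translate() expression is one possible way to match regardless of case in XPath 1.0. Note also that getNodeValue() on an element node is null by the DOM specification, so the sketch prints getTextContent() instead.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XPathCaseDemo {

    // XPath 1.0 has no lower-case() function, so translate() is used to fold case.
    static final String ANY_CASE_TITLE =
            "//*[translate(local-name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
            + "'abcdefghijklmnopqrstuvwxyz') = 'title']";

    // Builds <HTML><HEAD><TITLE>Test Page</TITLE></HEAD></HTML> by hand.
    // This is only a stand-in for a parsed HTML page, not NekoHTML output.
    static Document buildUpperCaseDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElementNS(null, "HTML");
            Element head = doc.createElementNS(null, "HEAD");
            Element title = doc.createElementNS(null, "TITLE");
            title.setTextContent("Test Page");
            head.appendChild(title);
            html.appendChild(head);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an XPath expression against the document and returns the node set.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildUpperCaseDom();

        // Lower-case name test: does not match the upper-case element.
        System.out.println("//title  -> " + select(doc, "//title").getLength());

        // Upper-case name test: matches.
        NodeList upper = select(doc, "//TITLE");
        System.out.println("//TITLE  -> " + upper.getLength());

        // Case-folding expression: matches whatever case the element name has.
        System.out.println("any case -> " + select(doc, ANY_CASE_TITLE).getLength());

        // Element.getNodeValue() is null; getTextContent() gives the text.
        for (int i = 0; i < upper.getLength(); i++) {
            System.out.println(upper.item(i).getTextContent());
        }
    }
}
```

The same select() helper should work unchanged on a Document obtained from a parser, provided the expression uses the case the DOM actually reports for element names.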
From: Stephen G. W. <sg...@no...> - 2008-07-29 13:28:55
Thank you, that was the problem. I'm smacking myself in the forehead right now lol.
From: <ke...@us...> - 2008-07-29 12:46:21
At a wild guess: Are you sure NekoHTML is producing DOM 2.0 compatible nodes? XPath needs namespace-awareness, so I think Xalan is making the 2.0 DOM nodes a prerequisite...

______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
-- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (http://www.ovff.org/pegasus/songs/threes-rev-11.html)
From: Stephen G. W. <sg...@no...> - 2008-07-31 02:13:23
Turns out that wasn't the problem. I got everything working. Basically, I was being an idiot and forgot that the HTML DOM requires tag names to be upper-cased. Thanks.