From: Neera S. <nee...@gm...> - 2011-01-26 23:44:15
Hi All,

I am trying to use the DOM parser to parse the following URL:
http://www.helloneighbour.com/save/city-movers-and-transports-trucking-and-freight-mississauga

with the following parser settings:

parser.setFeature("http://cyberneko.org/html/features/augmentations", true);
parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");

I am running into the following namespace error:

org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
    at org.apache.xerces.dom.CoreDocumentImpl.checkNamespaceWF(Unknown Source)
    at org.apache.xerces.dom.ElementNSImpl.setName(Unknown Source)
    at org.apache.xerces.dom.ElementNSImpl.<init>(Unknown Source)
    at org.apache.xerces.dom.CoreDocumentImpl.createElementNS(Unknown Source)
    at org.dom4j.io.DOMWriter.appendDOMTree(DOMWriter.java:181)
    at org.dom4j.io.DOMWriter.appendDOMTree(DOMWriter.java:158)
-----
-----

I would really appreciate any help you can provide to resolve the issue.

Thanks,
Neera
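A minimal sketch of the DOM rule behind this error, using only the JDK's built-in DOM (no NekoHTML or dom4j involved). The tag names fb:like and a:b:c are hypothetical; nothing in the thread says which names the page actually contains. As far as I can tell from Xerces, createElementNS rejects a prefixed name whose namespace URI is null, and checkNamespaceWF (the top frame in the trace) rejects names with more than one colon or a leading/trailing colon, while the non-namespace createElement accepts both kinds of name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class NamespaceErrDemo {

    /** Creates an empty W3C DOM document with the JDK's built-in (Xerces-derived) implementation. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the DOMException code thrown by createElementNS, or -1 if the element was created. */
    static int tryCreateElementNS(String namespaceUri, String qualifiedName) {
        try {
            newDocument().createElementNS(namespaceUri, qualifiedName);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /** Same check for the DOM Level 1, non-namespace-aware createElement. */
    static int tryCreateElement(String name) {
        try {
            newDocument().createElement(name);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        // Prefix but no namespace URI: rejected by the namespace-aware factory method.
        System.out.println(tryCreateElementNS(null, "fb:like"));            // 14 (NAMESPACE_ERR)
        // Two colons: not a legal qualified name, rejected in checkNamespaceWF.
        System.out.println(tryCreateElementNS(null, "a:b:c"));              // 14 (NAMESPACE_ERR)
        // Colons are legal in plain XML Names, so createElement accepts both.
        System.out.println(tryCreateElement("fb:like"));                    // -1
        System.out.println(tryCreateElement("a:b:c"));                      // -1
        // Binding the prefix to a URI (hypothetical "urn:example:fb") makes createElementNS succeed.
        System.out.println(tryCreateElementNS("urn:example:fb", "fb:like")); // -1
    }
}
```

In other words, the same name can be fine in a non-namespace DOM and still be rejected the moment something rebuilds it through the namespace-aware API.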
From: Marc G. <mgu...@ya...> - 2011-01-27 10:12:06
Hi,

please reduce the HTML content to the minimum needed to reproduce the problem and open an issue.

Cheers,
Marc.
--
HtmlUnit support & consulting from the source
Blog: http://mguillem.wordpress.com
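One way to find which markup to keep in a reduced test case is to look for element names that contain a colon but carry no namespace URI, since those are the names createElementNS(null, name) refuses. The sketch below assumes only the JDK DOM API and that the document being checked (for example the result of parser.getDocument()) has no namespace URIs on such nodes; the hand-built sample and its fb:like tag are hypothetical stand-ins for the real page. It checks elements only, not attributes.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SuspectNameFinder {

    /** Returns, in document order, tag names containing a colon but having no namespace URI. */
    static List<String> findSuspects(Document doc) {
        List<String> found = new ArrayList<>();
        walk(doc.getDocumentElement(), found);
        return found;
    }

    private static void walk(Element element, List<String> found) {
        // Such a name would be rejected by createElementNS(null, name) with NAMESPACE_ERR.
        if (element.getTagName().indexOf(':') >= 0 && element.getNamespaceURI() == null) {
            found.add(element.getTagName());
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, found);
            }
        }
    }

    /** Hand-built, non-namespace stand-in for a parsed page; the fb:like tag is hypothetical. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            body.appendChild(doc.createElement("p"));
            body.appendChild(doc.createElement("fb:like"));
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findSuspects(sample())); // [fb:like]
    }
}
```

Any name this reports is a good starting point for the minimal HTML snippet to attach to the issue.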
From: Jacob K. <ho...@vi...> - 2011-01-27 15:52:01
It appears to me that the parser being used is DOM4j, not NekoHTML. DOM4j is an XML parser, not an HTML parser. Please provide a snippet of code showing how you are engaging the parser.

Jake
From: Neera S. <nee...@gm...> - 2011-01-27 19:36:19
Hi Jake,

Thanks for looking into it. Here is the code snippet:

import org.dom4j.io.DOMReader;
import org.dom4j.io.DOMWriter;
import org.w3c.dom.Document;

org.cyberneko.html.parsers.DOMParser parser = new org.cyberneko.html.parsers.DOMParser();
parser.setFeature("http://cyberneko.org/html/features/augmentations", true);
parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
parser.parse(url);
Document document = parser.getDocument();
DOMReader reader = new DOMReader();
org.dom4j.Document doc = reader.read(document);
DOMWriter writer = new DOMWriter();
return writer.write(doc);

Neera
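The stack trace shows dom4j's DOMWriter calling createElementNS while rebuilding the tree. The sketch below is a JDK-only reproducer of that shape that needs no network access: copyTree is a simplified, hypothetical stand-in for that rebuild step, not dom4j's actual code, and the fb:like tag is again a hypothetical example. With lenient set to false it recreates every element through createElementNS and fails on the prefixed name; with lenient set to true it falls back to the DOM Level 1 createElement for elements that have no namespace URI.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NamespaceCopyDemo {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Non-namespace DOM shaped like an HTML parse result; the fb:like tag is hypothetical. */
    static Document buildSource() {
        Document src = newDocument();
        Element html = src.createElement("html");
        Element body = src.createElement("body");
        body.appendChild(src.createElement("fb:like"));
        html.appendChild(body);
        src.appendChild(html);
        return src;
    }

    /** Recursively copies elements; lenient selects createElement for nodes without a namespace URI. */
    static void copyTree(Document target, Node parent, Element source, boolean lenient) {
        String uri = source.getNamespaceURI(); // null for nodes built with createElement
        Element copy = (lenient && uri == null)
                ? target.createElement(source.getTagName())
                : target.createElementNS(uri, source.getTagName()); // NAMESPACE_ERR for "fb:like"
        parent.appendChild(copy);
        NodeList children = source.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                copyTree(target, copy, (Element) child, lenient);
            }
        }
    }

    /** Returns the DOMException code raised while copying, or -1 if the copy succeeded. */
    static int copy(boolean lenient) {
        Document target = newDocument();
        try {
            copyTree(target, target, buildSource().getDocumentElement(), lenient);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        System.out.println(copy(false)); // 14 (NAMESPACE_ERR)
        System.out.println(copy(true));  // -1
    }
}
```

Note that the lenient copy only produces a non-namespace DOM again, so any later code that expects namespace-aware nodes may still trip over those names.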
From: Jacob K. <ho...@vi...> - 2011-01-27 21:42:52
You're parsing successfully with NekoHTML and it's returning an HTMLDocument. Why are you then converting this into a DOM4J document? Just because it provides a convenient DOMWriter? DOM4J presumes that your Document represents an XML-based DOM, which supports namespaces. The HTML DOM does not support namespaces by default.

That said, NekoHTML has a NamespaceBinderFilter [1]. You'd need to both set it as a filter on the parser and set the feature "http://xml.org/sax/features/namespaces". I haven't actually tried this, but it might solve your issue. You may also want to look at the XML purifier filter [2].

[1] http://nekohtml.sourceforge.net/filters.html#filters.namespaces
[2] http://nekohtml.sourceforge.net/filters.html#filters.well-formedness

Jake
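Jake's question suggests the dom4j round trip may not be needed. If the goal is only to turn the parsed page into markup, a minimal sketch, assuming nothing beyond the JDK, is to hand the W3C Document straight to an identity Transformer. The hand-built sample stands in for parser.getDocument() and its fb:like tag is hypothetical; I have not checked this against NekoHTML's HTMLDocument implementation specifically. As far as I know, the JDK serializer writes a non-namespace element such as fb:like by its tag name rather than rebuilding it through createElementNS.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DirectSerialize {

    /** Serializes a W3C DOM document to a string with the JDK's identity transformer. */
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) { // ParserConfigurationException / TransformerException
            throw new IllegalStateException(e);
        }
    }

    /** Hand-built, non-namespace stand-in for parser.getDocument(); fb:like is hypothetical. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            body.appendChild(doc.createElement("fb:like"));
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(sample()));
    }
}
```

This skips DOMReader/DOMWriter entirely; whether that is acceptable depends on what the caller of the original method does with the returned Document.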