From: Jacob K. <ho...@vi...> - 2011-01-27 21:42:52
|
You're parsing successfully with NekoHTML and it's returning an HTMLDocument. Why are you then converting this into a DOM4J document? Just because it provides a convenient DOMWriter?

DOM4J presumes that your Document represents an XML-based DOM, which supports namespaces. The HTML DOM does not support namespaces by default.

That said, NekoHTML has a NamespaceBinderFilter [1]. You'd need to both set this as a filter on the parser and set the feature "http://xml.org/sax/features/namespaces". I haven't actually tried this, but it might solve your issue. You may also want to look at the XML purifier filter [2].

[1] http://nekohtml.sourceforge.net/filters.html#filters.namespaces
[2] http://nekohtml.sourceforge.net/filters.html#filters.well-formedness

Jake

On Thu, 27 Jan 2011 11:36:10 -0800
Neera Sharma <nee...@gm...> wrote:
> Hi Jake,
>
> Thanks for looking into it. Here is the code snippet -
>
> org.cyberneko.html.parsers.DOMParser parser = new org.cyberneko.html.parsers.DOMParser();
> parser.setFeature("http://cyberneko.org/html/features/augmentations", true);
> parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
> parser.parse(url);
> Document document = parser.getDocument();
> DOMReader reader = new DOMReader();
> org.dom4j.Document doc = reader.read(document);
> DOMWriter writer = new DOMWriter();
> return writer.write(doc);
>
> Neera
>
>
>
> On Thu, Jan 27, 2011 at 6:51 AM, Jacob Kjome <ho...@vi...> wrote:
>>
>> It appears to me that the parser being used is DOM4j, not NekoHTML. DOM4j is
>> an XML parser, not an HTML parser. Please provide a snippet of code showing
>> how you are engaging the parser.
>>
>> Jake
>>
>> On Thu, 27 Jan 2011 11:11:57 +0100
>> Marc Guillemot <mgu...@ya...> wrote:
>>> Hi,
>>>
>>> please reduce the html content to the minimum allowing to reproduce the
>>> problem and open an issue.
>>>
>>> Cheers,
>>> Marc.
>>> --
>>> HtmlUnit support & consulting from the source
>>> Blog: http://mguillem.wordpress.com
>>>
>>>
>>> On 27/01/2011 00:44, Neera Sharma wrote:
>>>> Hi All,
>>>>
>>>> I am trying to use DOM parser to parse the following URL
>>>> http://www.helloneighbour.com/save/city-movers-and-transports-trucking-and-freight-mississauga
>>>>
>>>> with the following parser settings -
>>>>
>>>> parser.setFeature("http://cyberneko.org/html/features/augmentations", true);
>>>>
>>>> parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
>>>>
>>>> I am running into the following namespace error
>>>>
>>>> org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create
>>>> or change an object in a way which is incorrect with regard to
>>>> namespaces.
>>>>
>>>> at org.apache.xerces.dom.CoreDocumentImpl.checkNamespaceWF(Unknown Source)
>>>> at org.apache.xerces.dom.ElementNSImpl.setName(Unknown Source)
>>>> at org.apache.xerces.dom.ElementNSImpl.<init>(Unknown Source)
>>>> at org.apache.xerces.dom.CoreDocumentImpl.createElementNS(Unknown Source)
>>>> at org.dom4j.io.DOMWriter.appendDOMTree(DOMWriter.java:181)
>>>> at org.dom4j.io.DOMWriter.appendDOMTree(DOMWriter.java:158)
>>>> -----
>>>> -----
>>>>
>>>> I would really appreciate any help that you can provide to resolve the issue.
>>>>
>>>>
>>>> Thanks,
>>>> Neera
>>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
>>> Finally, a world-class log management solution at an even better price-free!
>>> Download using promo code Free_Logger_4_Dev2Dev. Offer expires
>>> February 28th, so secure your free ArcSight Logger TODAY!
>>> http://p.sf.net/sfu/arcsight-sfd2d
>>> _______________________________________________
>>> nekohtml-user mailing list
>>> nek...@li...
>>> https://lists.sourceforge.net/lists/listinfo/nekohtml-user
|
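
For anyone reading this thread later, the stack trace above ends in createElementNS, and that call can be reproduced with nothing but the JDK's built-in DOM. The following is a minimal sketch, not the poster's code: the class name NamespaceErrDemo, its helper methods, and the sample tag names "fb:like" and "a:b:c" are all made up for illustration (the actual offending tag on the page is not known from the thread). Per the DOM Level 2 Core specification, createElementNS raises NAMESPACE_ERR when the qualified name is malformed or carries a prefix while the namespace URI is null, whereas the older createElement performs no namespace check.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical demo class; uses only the JDK's bundled DOM implementation.
class NamespaceErrDemo {

    /** Creates an empty W3C DOM document with the JDK's default builder. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    /**
     * Returns true if createElementNS(null, qualifiedName) is rejected with
     * NAMESPACE_ERR, i.e. the same exception code seen in the thread.
     */
    static boolean rejectedByCreateElementNS(String qualifiedName) {
        try {
            newDocument().createElementNS(null, qualifiedName);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.NAMESPACE_ERR;
        }
    }

    /** Returns true if the namespace-unaware createElement accepts the name. */
    static boolean acceptedByCreateElement(String name) {
        try {
            newDocument().createElement(name);
            return true;
        } catch (DOMException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample names are assumptions chosen to exercise the namespace checks.
        for (String name : new String[] {"div", "fb:like", "a:b:c"}) {
            System.out.println(name
                    + " createElement accepted=" + acceptedByCreateElement(name)
                    + " createElementNS rejected=" + rejectedByCreateElementNS(name));
        }
    }
}
```

If this reading is right, any tag in the parsed HTML whose name contains a colon without a bound namespace would trip a namespace-aware writer, which is consistent with the suggestion above to bind namespaces in the parser or avoid the DOM4J round trip.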