From: SourceForge.net <no...@so...> - 2010-09-14 08:44:02
Bugs item #3065808, was opened at 2010-09-14 10:44
Message generated for change (Tracker Item Submitted) made by coseedirk
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=952178&aid=3065808&group_id=195122

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: coseedirk (coseedirk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Xpath expression fails depending on number of invalid tags

Initial Comment:
Hello!

I use NekoHTML version 1.9.14 to parse HTML documents from the web and run XPath expressions against the resulting DOM to get specific elements / content. Usually Neko works fine with invalid HTML documents, but I discovered a case of "double closed" tags (e.g. <a></a></a>) where the parsing fails. Whether the parsing works or fails depends on the number (!) of double closed tags and the complexity of the HTML structure.

The attached JUnit test contains 5 test cases. The first three show a minimal example where parsing fails with two (or more) double closed a-tags, but works with only one double closed a-tag. The last two show a more complex example using a table, where four or more double closed a-tags result in a parsing failure, but fewer than four work.

Currently the workaround is to take the parsed Neko Document (org.w3c.dom.Document), get its String representation, and parse that again with Neko. After parsing the invalid HTML page twice the XPath expression works, but this is of course no solution. The workaround is included in the JUnit test!

Due to the restriction of only 256k upload size, the following libs are missing in the lib directory:
- nekohtml.jar
- xercesImpl.jar
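
For readers without the attachment, the parse, serialize, reparse workaround described above might look roughly like the sketch below. This is not the attached JUnit test. Because nekohtml.jar is not part of the JDK, the first parse here uses the JDK's own XML parser as a stand-in, and the class and method names (`ReparseWorkaround`, `parse`, `serialize`, `countMatches`) are made up for illustration. In the real workaround both parse steps would go through NekoHTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReparseWorkaround {

    // Assumption: stand-in for the NekoHTML parse step. The JDK XML parser
    // only accepts well-formed markup, so it cannot reproduce the bug itself.
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Turn a DOM Document back into its String representation
    // using an identity transformation.
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Evaluate an XPath expression and return the number of matching nodes.
    static int countMatches(Document doc, String xpath) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<html><body><a href=\"x\">one</a><a href=\"y\">two</a></body></html>";
        Document first = parse(markup);             // first pass
        Document second = parse(serialize(first));  // second pass over the serialized DOM
        System.out.println(countMatches(second, "//a"));
    }
}
```

The second pass only ever sees markup that the first pass has already normalized, which is presumably why the XPath expression succeeds afterwards. It doubles the parsing cost and, as the report says, is no real solution.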