Scott Wilson
2021-09-24
- Group: v2.25 --> v2.26
XPather.java has a subtle bug that causes non-standard behavior when a predicate begins with //.
Given a document:
<div>
<span>Span Content</span>
<div>Div Content</div>
</div>
The following selector is valid, but flawed:
//div[//span]
Because the predicate [//span] begins with two slashes, its path is absolute: it is evaluated from the document root rather than from the <div> being tested, so it selects every <span> element in the document. Since the document contains a <span>, the predicate is true for every context node, and the selector matches both <div>s. I have corroborated this in several standards-compliant XPath implementations.
However, HtmlCleaner treats the selector as though it were a descendant-only predicate:
//div[.//span]
...and matches only the outer <div>.
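For reference, the standard behavior can be reproduced without HtmlCleaner. The following is a minimal sketch that uses only the JDK's built-in javax.xml.xpath API; the class name XPathPredicateDemo and the helper countMatches are hypothetical names chosen for illustration and are not part of HtmlCleaner or of the attached patch.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPredicateDemo {
    // The document from this report, written on a single line.
    static final String DOC =
        "<div><span>Span Content</span><div>Div Content</div></div>";

    // Hypothetical helper: evaluates the expression against the XML
    // string with the JDK's XPath engine and returns the match count.
    static int countMatches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Absolute predicate: true for every <div>, so both match.
        System.out.println(countMatches(DOC, "//div[//span]"));  // prints 2
        // Relative predicate: only the outer <div> has a <span> descendant.
        System.out.println(countMatches(DOC, "//div[.//span]")); // prints 1
    }
}
```

With the bug described here, HtmlCleaner would be expected to return a single match for both expressions.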
I am attaching a patch that adds two assertions to XPatherTest.java. The second assertion fails due to this bug.