#223 XPath bug with predicate beginning with //

Group: v2.30
Status: open
Owner: nobody
Labels: xpath
Priority: 5
Updated: 2023-06-19
Created: 2021-01-02
Private: No

XPather.java has a subtle bug that causes non-standard behavior when a predicate begins with //.

Given a document:

<div>
  <span>Span Content</span>
  <div>Div Content</div>
</div>

The following selector is valid, but flawed:

//div[//span]

Because the predicate [//span] begins with two slashes, it is an absolute path: it selects every <span> in the document, starting from the document root, regardless of which <div> is being tested. Since the document contains a <span>, the predicate is always true, so the selector matches both <div>s. I have confirmed this behavior in several standards-compliant XPath implementations.

However, HtmlCleaner treats the selector as though it were a descendant-only predicate:

//div[.//span]

...and matches only the outer <div>.
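
For comparison, here is a minimal sketch of the expected behavior using the JDK's built-in javax.xml.xpath engine rather than HtmlCleaner. The class name PredicateDemo and the helper count are hypothetical names chosen for this illustration; they are not part of HtmlCleaner or of the attached patch.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates both selectors with the JDK XPath engine.
public class PredicateDemo {
    // The sample document from this ticket.
    static final String DOC =
        "<div>\n"
        + "  <span>Span Content</span>\n"
        + "  <div>Div Content</div>\n"
        + "</div>";

    // Returns how many nodes the expression selects in DOC.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(DOC)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Absolute predicate: true for every <div>, so both are selected.
        System.out.println(count("//div[//span]"));   // prints 2
        // Relative predicate: only the outer <div> has a <span> descendant.
        System.out.println(count("//div[.//span]"));  // prints 1
    }
}
```

With the bug described here, HtmlCleaner would report one match for both selectors instead of two for the first.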

I am attaching a patch that adds two assertions to XPatherTest.java. The second assertion fails due to this bug.

1 Attachment

Discussion

  • Scott Wilson - 2021-09-24
    • Group: v2.25 --> v2.26

  • Scott Wilson - 2023-04-29
    • Group: v2.26 --> v2.29

  • Scott Wilson - 2023-06-19
    • Group: v2.29 --> v2.30
