#22 Support for child selectors

Unassigned
closed
None
2019-04-18
2010-11-18
No

Child selectors: http://www.w3.org/TR/CSS2/selector.html#child-selectors

Example:
$ php -r 'require_once "simple_html_dom.php"; $html = str_get_html("

"); var_dump(count($html->find("p > b")));'
int(2)

If child selectors were supported, the result should be int(1) and the node returned should be the first b element.
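For reference, the difference between the two combinators can be sketched with PHP's bundled DOM extension instead of simple_html_dom. The markup below is an assumption standing in for the ticket's lost test input (a b element nested inside another b, inside a p), and the XPath expressions are my own translation of the two selectors, not anything taken from the library.

```php
<?php
// Reference semantics only: uses PHP's DOM extension, not simple_html_dom.
// The markup is an assumed stand-in for the ticket's lost test input.
$doc = new DOMDocument();
$doc->loadXML('<p><b>outer <b>inner</b></b></p>');
$xpath = new DOMXPath($doc);

// "p b" (descendant combinator): every b anywhere below a p.
$descendants = $xpath->query('//p//b');

// "p > b" (child combinator): only b elements whose parent is a p.
$children = $xpath->query('//p/b');

var_dump($descendants->length); // int(2)
var_dump($children->length);    // int(1)
```

Loading the snippet as XML keeps the nesting exactly as written, which an HTML parser might not do for nested b elements.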

Discussion

  • Maxsim Varentsov

    Very important selector for me too...

     
  • cyberbeat

    cyberbeat - 2011-10-02

    please implement this!

     
  • Wolf01

    Wolf01 - 2011-11-11

    These might be good too:
    body>p:nth-of-type(1)
    h1+ * p:nth-of-type(1)

    I use them to style the very first p on the page and the very first p after the h1, for example:
    some content

    this is the very first p of the page


    Title



    some content

    this p should be styled


    this p should not be styled



    this p should not be styled
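As a rough illustration of what those two selectors are meant to pick out, here is a sketch using PHP's DOM extension rather than simple_html_dom. The page structure is an assumption reconstructed from the surviving text of the example above, and the XPath expressions are my own approximate translations of the selectors.

```php
<?php
// Assumed page structure; the original markup in this comment was lost.
$doc = new DOMDocument();
$doc->loadXML(
    '<body>'
    . '<p>this is the very first p of the page</p>'
    . '<h1>Title</h1>'
    . '<div>'
    . '<p>this p should be styled</p>'
    . '<p>this p should not be styled</p>'
    . '</div>'
    . '<p>this p should not be styled</p>'
    . '</body>'
);
$xpath = new DOMXPath($doc);

// body>p:nth-of-type(1): the first p among body's children.
$firstOnPage = $xpath->query('//body/p[1]');

// h1+ * p:nth-of-type(1): inside the element right after an h1,
// any p that is the first p among its siblings.
$firstAfterH1 = $xpath->query('//h1/following-sibling::*[1]//p[1]');

echo $firstOnPage->item(0)->textContent, "\n";
echo $firstAfterH1->item(0)->textContent, "\n";
```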

     
  • Cédric Tailly

    Cédric Tailly - 2012-06-24

    I have the same problem. Since this request has been open for a few months, I tried modifying the source code myself to add this behaviour.

    The first test case is problematic because 'b' tags can't be nested, so here is mine:

    require_once "simple_html_dom.php";
    $html = str_get_html("

    ");
    var_dump(count($html->find("p>div")));
    var_dump(count($html->find("p div")));

    ...it returns:
    1
    2

    Here is a patch against build 193, if you are interested...


    Index: simple_html_dom.php

    --- simple_html_dom.php (revision 194)
    +++ simple_html_dom.php (working copy)
    @@ -507,7 +507,7 @@
         {
             $n = ($k===-1) ? $this->dom->root : $this->dom->nodes[$k];
             //PaperG - Pass this optional parameter on to the seek function.
    -        $n->seek($selectors[$c][$l], $ret, $lowercase);
    +        $n->seek($selectors[$c][$l], $ret, $lowercase, $selectors[$c][$l][5]);
         }
         $head = $ret;
     }
    @@ -534,7 +534,7 @@

     // seek for given conditions
     // PaperG - added parameter to allow for case insensitive testing of the value of a selector.
    -protected function seek($selector, &$ret, $lowercase=false)
    +protected function seek($selector, &$ret, $lowercase=false, $direct=false)
     {
         global $debugObject;
         if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }
    @@ -570,6 +570,9 @@
     for ($i=$this->_[HDOM_INFO_BEGIN]+1; $i<$end; ++$i) {
         $node = $this->dom->nodes[$i];

    +    if ( $direct && $node->parent() != $this )
    +        continue;
    +
         $pass = true;

         if ($tag==='*' && !$key) {
    @@ -664,7 +667,7 @@
     // This implies that an html attribute specifier may start with an @ sign that is NOT captured by the expression.
     // farther study is required to determine of this should be documented or removed.
     // $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
    -$pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
    +$pattern = "/( > )?([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]*)/is";
     preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
     if (is_object($debugObject)) {$debugObject->debugLog(2, "Matches Array: ", $matches);}

    @@ -676,22 +679,22 @@
     $m[0] = trim($m[0]);
     if ($m[0]==='' || $m[0]==='/' || $m[0]==='//') continue;
     // for browser generated xpath
    -if ($m[1]==='tbody') continue;
    +if ($m[2]==='tbody') continue;

    -list($tag, $key, $val, $exp, $no_key) = array($m[1], null, null, '=', false);
    -if (!empty($m[2])) {$key='id'; $val=$m[2];}
    -if (!empty($m[3])) {$key='class'; $val=$m[3];}
    -if (!empty($m[4])) {$key=$m[4];}
    -if (!empty($m[5])) {$exp=$m[5];}
    -if (!empty($m[6])) {$val=$m[6];}
    +list($tag, $key, $val, $exp, $no_key) = array($m[2], null, null, '=', false);
    +if (!empty($m[3])) {$key='id'; $val=$m[3];}
    +if (!empty($m[4])) {$key='class'; $val=$m[4];}
    +if (!empty($m[5])) {$key=$m[5];}
    +if (!empty($m[6])) {$exp=$m[6];}
    +if (!empty($m[7])) {$val=$m[7];}

     // convert to lowercase
     if ($this->dom->lowercase) {$tag=strtolower($tag); $key=strtolower($key);}
     //elements that do NOT have the specified attribute
     if (isset($key[0]) && $key[0]==='!') {$key=substr($key, 1); $no_key=true;}

    -$result[] = array($tag, $key, $val, $exp, $no_key);
    -if (trim($m[7])===',') {
    +$result[] = array($tag, $key, $val, $exp, $no_key, strlen($m[1]) > 0);
    +if (trim($m[8])===',') {
         $selectors[] = $result;
         $result = array();
     }
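The core idea of the patch (walk all descendants, but skip any node whose parent is not the starting node when a "direct" flag is set) can be sketched in isolation. This is a minimal illustration only: the TreeNode class and its methods are hypothetical and are not simple_html_dom's real classes, and the nested-div tree is an assumed reconstruction of the lost test case. It needs PHP 8.0 or later.

```php
<?php
// Hypothetical minimal node type; not simple_html_dom's real classes.
final class TreeNode
{
    public ?TreeNode $parent = null;
    /** @var TreeNode[] */
    public array $children = [];

    public function __construct(public string $tag) {}

    public function add(TreeNode $child): TreeNode
    {
        $child->parent = $this;
        $this->children[] = $child;
        return $this;
    }

    /** All nodes below this one, in document order. */
    public function descendants(): array
    {
        $out = [];
        foreach ($this->children as $child) {
            $out[] = $child;
            array_push($out, ...$child->descendants());
        }
        return $out;
    }

    /** Find descendants by tag; with $direct, keep only immediate children. */
    public function seek(string $tag, bool $direct = false): array
    {
        $found = [];
        foreach ($this->descendants() as $node) {
            if ($direct && $node->parent !== $this) {
                continue; // same idea as the patch's parent check
            }
            if ($node->tag === $tag) {
                $found[] = $node;
            }
        }
        return $found;
    }
}

// Assumed tree: a p containing a div that contains another div.
$inner = new TreeNode('div');
$outer = (new TreeNode('div'))->add($inner);
$p = (new TreeNode('p'))->add($outer);

var_dump(count($p->seek('div')));       // descendant match: int(2)
var_dump(count($p->seek('div', true))); // child match: int(1)
```

Filtering during a full descendant walk mirrors how the patch slots into the existing loop; iterating only over the immediate children would give the same result with less work.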
     
  • John Schlick

    John Schlick - 2012-10-10

    Child selectors ARE supported. Not with the > syntax though. I'm not sure why you would use that syntax.

    find('div[class=a] span[class=b]', 0)

    will return the first span of class b that is in a div of class a in the dom.

    • status: open --> closed
    • milestone: --> Next_Release
     
  • cyberbeat

    cyberbeat - 2014-04-12

    @John Schlick: I think you got it wrong. The CSS selectors

    div span

    and

    div > span

    are not the same: the second one only finds DIRECT span children.

    An example where php-simple-html-dom is really annoying:

    ... ...
    some text
    some more text
    some text
    some more text

    Now I only want to fetch the rows of the first table with one command.

    That would be

    $doc->find("table > tr")

    but at the moment it is only possible to get all rows, including the one from the nested table via

    $doc->find("table tr")

    So this would really be a very important addition!

     
  • cyberbeat

    cyberbeat - 2014-04-12

    Aah, SourceForge destroyed my HTML. Here is the example without angle brackets:

    table
    tr td
    table tr td (sometext) /td /tr /table
    /td
    td somemoretext /td
    /tr
    ...
    ...
    /table
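One point worth checking against this structure: in standard CSS, a child combinator alone matches the direct rows of every table, including the nested one, so the query also has to be scoped to the first table. The sketch below uses PHP's DOM extension rather than simple_html_dom. The second outer row is an assumption filling in the elided "..." rows, and the XPath expressions are my own translations of the selectors.

```php
<?php
// Reference semantics only: uses PHP's DOM extension, not simple_html_dom.
$doc = new DOMDocument();
$doc->loadXML(
    '<table>'
    . '<tr><td><table><tr><td>sometext</td></tr></table></td>'
    . '<td>somemoretext</td></tr>'
    . '<tr><td>second row (assumed)</td></tr>'
    . '</table>'
);
$xpath = new DOMXPath($doc);

// "table tr": every row below any table, nested rows included.
$allRows = $xpath->query('//table//tr');

// "table > tr": rows whose parent is a table, which still includes
// the nested table's own row.
$childRows = $xpath->query('//table/tr');

// Direct rows of the first table only.
$firstTableRows = $xpath->query('(//table)[1]/tr');

var_dump($allRows->length);        // int(3)
var_dump($childRows->length);      // int(3)
var_dump($firstTableRows->length); // int(2)
```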

     
  • John Schlick

    John Schlick - 2014-04-12

    Thank you for clarifying the difference between the selectors.
    I'll have to look at the code provided above in order to incorporate it.

    It would be easier for me to look at if it were in diff format as opposed to patch format.

    I'll also state (without looking at the attached code) that this might still not do it completely, as some tags are "automatically closed" by the original author to prevent things from going haywire with broken HTML. That will have to be looked at to make sure this is actually doing what's expected.

     

    Last edit: John Schlick 2014-04-12
  • John Schlick

    John Schlick - 2014-04-12
    • status: closed --> open
     
  • Cédric Tailly

    Cédric Tailly - 2014-04-16

    Here is a zip file with the patch and a test case; it will be clearer if you open it with TortoiseSVN.

    Sorry, I had problems adding attachments when I posted my message, which is why I pasted the patch's content inline.

     
  • LeoTM

    LeoTM - 2016-10-24

    Any update?
    I've noticed this crop up a few times on Stack Overflow.

     

    Last edit: LeoTM 2016-10-24
  • LogMANOriginal

    LogMANOriginal - 2019-04-15
    • status: open --> closed
    • assigned_to: LogMANOriginal
     
  • LogMANOriginal

    LogMANOriginal - 2019-04-15

    Support for child combinators ('>') was added in 1.8.

     
