#22 Support for child selectors

Milestone: Next_Release
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-04-16
Created: 2010-11-18
Private: No

Child selectors: http://www.w3.org/TR/CSS2/selector.html#child-selectors

Example:
$ php -r 'require_once "simple_html_dom.php"; $html = str_get_html("

"); var_dump(count($html->find("p > b")));'
int(2)

If child selectors were supported, the result would be int(1), and the node returned would be the first b element.
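The difference between the descendant and child combinators can be reproduced without simple_html_dom. The following is a minimal sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). The markup is an assumption made for illustration, since the ticket's original test string was lost, and the XPath expressions are hand-written equivalents of the two CSS selectors.

```php
<?php
// Illustrative markup only (assumption): one b is a direct child of p,
// the other is nested one level deeper inside an i element.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p><b>direct</b> <i><b>nested</b></i></p></body></html>');
$xpath = new DOMXPath($doc);

// CSS "p b" (descendant combinator) corresponds to XPath //p//b
$descendants = $xpath->query('//p//b');

// CSS "p > b" (child combinator) corresponds to XPath //p/b
$children = $xpath->query('//p/b');

var_dump($descendants->length);             // int(2)
var_dump($children->length);                // int(1)
echo $children->item(0)->textContent, "\n"; // direct
```

The counts mirror the ticket: a descendant match finds both b elements, while a child match finds only the one whose parent is the p.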

Discussion

  • Very important selector for me too...

     
  • cyberbeat
    2011-10-02

    please implement this!

     
  • Wolf01
    2011-11-11

    These might be good too:
    body>p:nth-of-type(1)
    h1+ * p:nth-of-type(1)

    I use them to style the very first p on the page and the very first p after the h1, for example:
    some content

    this is the very first p of the page


    Title



    some content

    this p should be styled


    this p should not be styled



    this p should not be styled
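The two selectors above can be approximated with XPath in PHP's bundled DOM extension. This is only a sketch: the tags of the original example were stripped, so the markup below is an assumed reconstruction, and the XPath expressions are hand-written equivalents rather than anything simple_html_dom provides.

```php
<?php
// Assumed structure (the original example lost its tags).
$html = <<<HTML
<html><body>
<div>some content</div>
<p>this is the very first p of the page</p>
<h1>Title</h1>
<div>
  some content
  <p>this p should be styled</p>
  <p>this p should not be styled</p>
</div>
<p>this p should not be styled</p>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// body>p:nth-of-type(1): the first p among body's own children
$firstOnPage = $xpath->query('/html/body/p[1]');

// h1+ * p:nth-of-type(1): inside the element immediately following an h1,
// every p that is the first p child of its parent
$firstAfterH1 = $xpath->query('//h1/following-sibling::*[1]//p[1]');

foreach ($firstOnPage as $p) {
    echo trim($p->textContent), "\n";
}
foreach ($firstAfterH1 as $p) {
    echo trim($p->textContent), "\n";
}
```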

     
  • Cédric Tailly
    2012-06-24

    I have the same problem. Since this request has been open for several months, I tried to modify the source code myself to add this behaviour.

    The first test case is problematic because 'b' tags can't be nested, so mine is:

    require_once "simple_html_dom.php";
    $html = str_get_html("

    ");
    var_dump(count($html->find("p>div")));
    var_dump(count($html->find("p div")));

    ...it returns:
    1
    2

    Here is a patch against build 193, in case you are interested...


    Index: simple_html_dom.php
    ===================================================================
    --- simple_html_dom.php (revision 194)
    +++ simple_html_dom.php (working copy)
    @@ -507,7 +507,7 @@
     {
         $n = ($k===-1) ? $this->dom->root : $this->dom->nodes[$k];
         //PaperG - Pass this optional parameter on to the seek function.
    -    $n->seek($selectors[$c][$l], $ret, $lowercase);
    +    $n->seek($selectors[$c][$l], $ret, $lowercase, $selectors[$c][$l][5]);
     }
     $head = $ret;
     }
    @@ -534,7 +534,7 @@
     
     // seek for given conditions
     // PaperG - added parameter to allow for case insensitive testing of the value of a selector.
    -protected function seek($selector, &$ret, $lowercase=false)
    +protected function seek($selector, &$ret, $lowercase=false, $direct=false)
     {
         global $debugObject;
         if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }
    @@ -570,6 +570,9 @@
     for ($i=$this->_[HDOM_INFO_BEGIN]+1; $i<$end; ++$i) {
         $node = $this->dom->nodes[$i];
     
    +    if ( $direct && $node->parent() != $this )
    +        continue;
    +
         $pass = true;
     
         if ($tag==='*' && !$key) {
    @@ -664,7 +667,7 @@
     // This implies that an html attribute specifier may start with an @ sign that is NOT captured by the expression.
     // farther study is required to determine of this should be documented or removed.
     // $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
    -$pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
    +$pattern = "/( > )?([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]*)/is";
     preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
     if (is_object($debugObject)) {$debugObject->debugLog(2, "Matches Array: ", $matches);}
     
    @@ -676,22 +679,22 @@
     $m[0] = trim($m[0]);
     if ($m[0]==='' || $m[0]==='/' || $m[0]==='//') continue;
     // for browser generated xpath
    -if ($m[1]==='tbody') continue;
    +if ($m[2]==='tbody') continue;
     
    -list($tag, $key, $val, $exp, $no_key) = array($m[1], null, null, '=', false);
    -if (!empty($m[2])) {$key='id'; $val=$m[2];}
    -if (!empty($m[3])) {$key='class'; $val=$m[3];}
    -if (!empty($m[4])) {$key=$m[4];}
    -if (!empty($m[5])) {$exp=$m[5];}
    -if (!empty($m[6])) {$val=$m[6];}
    +list($tag, $key, $val, $exp, $no_key) = array($m[2], null, null, '=', false);
    +if (!empty($m[3])) {$key='id'; $val=$m[3];}
    +if (!empty($m[4])) {$key='class'; $val=$m[4];}
    +if (!empty($m[5])) {$key=$m[5];}
    +if (!empty($m[6])) {$exp=$m[6];}
    +if (!empty($m[7])) {$val=$m[7];}
     
     // convert to lowercase
     if ($this->dom->lowercase) {$tag=strtolower($tag); $key=strtolower($key);}
     //elements that do NOT have the specified attribute
     if (isset($key[0]) && $key[0]==='!') {$key=substr($key, 1); $no_key=true;}
     
    -$result[] = array($tag, $key, $val, $exp, $no_key);
    -if (trim($m[7])===',') {
    +$result[] = array($tag, $key, $val, $exp, $no_key, strlen($m[1]) > 0);
    +if (trim($m[8])===',') {
         $selectors[] = $result;
         $result = array();
     }
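The core idea of the patch is the extra guard in seek(): every descendant is still scanned, but when the selector was preceded by " > ", nodes whose parent is not the context node are skipped. A minimal sketch of that filtering idea follows, written against PHP's bundled DOM extension rather than simple_html_dom. The helper name seekByTag and the test markup are hypothetical, and span elements are used because an HTML parser may auto-close a p when a div starts.

```php
<?php
/**
 * Hypothetical helper (not part of simple_html_dom): collect elements
 * named $tag below $context. When $direct is true, keep only nodes whose
 * parent is $context itself, mirroring the patch's
 * "$direct && $node->parent() != $this" guard.
 */
function seekByTag(DOMElement $context, string $tag, bool $direct = false): array
{
    $found = [];
    foreach ($context->getElementsByTagName($tag) as $node) {
        if ($direct && !$node->parentNode->isSameNode($context)) {
            continue; // a deeper descendant, not a direct child
        }
        $found[] = $node;
    }
    return $found;
}

// Assumed markup for illustration: one span nested inside another.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><span>outer<span>inner</span></span></div></body></html>');
$div = $doc->getElementsByTagName('div')->item(0);

var_dump(count(seekByTag($div, 'span', true)));  // like "div>span":  int(1)
var_dump(count(seekByTag($div, 'span', false))); // like "div span":  int(2)
```

The two counts correspond to the 1 and 2 reported for the patched find() above.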
     
  • John Schlick
    2012-10-10

    Child selectors ARE supported. Not with the > syntax though. I'm not sure why you would use that syntax.

    find('div[class=a] span[class=b]', 0)

    will return the first span of class b that is in a div of class a in the dom.

    • status: open --> closed
    • milestone: --> Next_Release
     
  • cyberbeat
    2014-04-12

    @John Schlick: I think you got it wrong. The CSS selectors

    div span

    and

    div > span

    are not the same; the second one only finds DIRECT span children.

    An example where php-simple-html-dom is really annoying:

    ... ...
    some text
    some more text
    some text
    some more text

    Now I only want to fetch the rows of the first table with one command.

    That would be

    $doc->find("table > tr")

    but at the moment it is only possible to get all rows, including the ones from the nested table, via

    $doc->find("table tr")

    So this would really be a very important addition!

     
  • cyberbeat
    2014-04-12

    Aah, SourceForge destroyed my HTML. Here is the example without angle brackets:

    table
    tr td
    table tr td (sometext) /td /tr /table
    /td
    td somemoretext /td
    /tr
    ...
    ...
    /table
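The nested-table case can be sketched with PHP's bundled DOM extension. The markup below is an assumed rendering of the outline above, reduced to a single outer row and parsed as XML so that no parser normalisation gets in the way. One caveat worth hedging: a bare "table > tr" would also match a row that is a direct child of the nested table, so the sketch anchors the child query at the outer table element.

```php
<?php
// Assumed markup, based on the bracket-less outline (single outer row only).
$doc = new DOMDocument();
$doc->loadXML(
    '<table>'
    . '<tr>'
    . '<td><table><tr><td>sometext</td></tr></table></td>'
    . '<td>somemoretext</td>'
    . '</tr>'
    . '</table>'
);
$xpath = new DOMXPath($doc);
$outer = $doc->documentElement; // the outer table

// Like "table tr" evaluated from the outer table: all descendant rows
$allRows = $xpath->query('.//tr', $outer);

// Rows that are direct children of the outer table only
$ownRows = $xpath->query('./tr', $outer);

// Unanchored "table > tr": rows that are direct children of ANY table
$anyTableRows = $xpath->query('//table/tr');

var_dump($allRows->length);      // int(2)
var_dump($ownRows->length);      // int(1)
var_dump($anyTableRows->length); // int(2)
```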

     
  • John Schlick
    2014-04-12

    Thank you for clarifying the difference between the selectors.
    I'll have to look at the code provided above in order to incorporate it.

    It would be easier for me to look at if it were in diff format as opposed to patch format.

    I'll also state (without looking at the attached code) that this might still not do it completely, as some tags are "automatically closed" by the original author to prevent things from going haywire with broken HTML. That will have to be looked at to make sure this is actually doing what's expected.

     
    Last edit: John Schlick 2014-04-12
  • John Schlick
    2014-04-12

    • status: closed --> open
     
  • Cédric Tailly
    2014-04-16

    Here is a zip file with the patch and a test case. It will be clearer if you open it with TortoiseSVN.

    Sorry, I had some problems adding attachments when I posted my message, which is why I pasted the patch's content inline.

     
    Attachments