Child selectors: http://www.w3.org/TR/CSS2/selector.html#child-selectors
Example:
$ php -r 'require_once "simple_html_dom.php"; $html = str_get_html("
"); var_dump(count($html->find("p > b")));'
If child selectors were supported, the result should be int(1) and the returned node should be the first b.
Very important selector for me too...
please implement this!
These might be good too:
body>p:nth-of-type(1)
h1+ * p:nth-of-type(1)
I use them to style the very first p on the page and the very first p after the h1, for example.
I have the same problem. Since this request has been open for a few months, I tried modifying the source code myself to add this behaviour.
The first test case is problematic because the 'b' tag can't be nested, so mine is:
require_once "simple_html_dom.php";
$html = str_get_html("
var_dump(count($html->find("p>div")));
var_dump(count($html->find("p div")));
...it returns:
1
2
Here is a patch against build 193, if you are interested...
Index: simple_html_dom.php
--- simple_html_dom.php (revision 194)
+++ simple_html_dom.php (working copy)
@@ -507,7 +507,7 @@
{
$n = ($k===-1) ? $this->dom->root : $this->dom->nodes[$k];
//PaperG - Pass this optional parameter on to the seek function.
$n->seek($selectors[$c][$l], $ret, $lowercase, $selectors[$c][$l][5]);
}
$head = $ret;
}
@@ -534,7 +534,7 @@
// seek for given conditions
// PaperG - added parameter to allow for case insensitive testing of the value of a selector.
-protected function seek($selector, &$ret, $lowercase=false)
+protected function seek($selector, &$ret, $lowercase=false, $direct=false)
{
global $debugObject;
if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }
@@ -570,6 +570,9 @@
for ($i=$this->_[HDOM_INFO_BEGIN]+1; $i<$end; ++$i) {
$node = $this->dom->nodes[$i];
+if ( $direct && $node->parent() != $this )
+continue;
+
$pass = true;
@@ -664,7 +667,7 @@
// This implies that an html attribute specifier may start with an @ sign that is NOT captured by the expression.
// farther study is required to determine of this should be documented or removed.
// $pattern = "/([\w-:*])(?:#([\w-]+)|.([\w-]+))?(?:[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.?)[\"']?)?])?([\/, ]+)/is";
$pattern = "/([\w-:*])(?:#([\w-]+)|.([\w-]+))?(?:[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.?)[\"']?)?])?([\/, ]+)/is";
preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
if (is_object($debugObject)) {$debugObject->debugLog(2, "Matches Array: ", $matches);}
@@ -676,22 +679,22 @@
$m[0] = trim($m[0]);
if ($m[0]==='' || $m[0]==='/' || $m[0]==='//') continue;
// for browser generated xpath
if ($m[2]==='tbody') continue;
list($tag, $key, $val, $exp, $no_key) = array($m[1], null, null, '=', false);
if (!empty($m[7])) {$val=$m[7];}
$result[] = array($tag, $key, $val, $exp, $no_key);
$selectors[] = $result;
$result = array();
}
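The core idea of the patch appears to be the extra `$direct` flag: `seek()` still walks every descendant, but skips any node whose parent is not the node the search started from. Below is a minimal, self-contained sketch of that idea, assuming PHP 7.4+. `SketchNode` and its methods are hypothetical stand-ins written for illustration, not the library's classes, and the real `seek()` does far more matching work.

```php
<?php
// Hypothetical tree node for illustration only; NOT simple_html_dom's node class.
final class SketchNode
{
    public string $tag;
    public ?SketchNode $parent = null;
    /** @var SketchNode[] */
    public array $children = [];

    public function __construct(string $tag)
    {
        $this->tag = $tag;
    }

    public function append(SketchNode $child): SketchNode
    {
        $child->parent = $this;
        $this->children[] = $child;
        return $child;
    }

    // Collect descendants with the given tag, in document order.
    // With $direct = true, only nodes whose parent is $this are kept,
    // which mirrors the "$node->parent() != $this" check in the patch.
    public function seek(string $tag, bool $direct = false): array
    {
        $found = [];
        $stack = array_reverse($this->children);
        while ($stack) {
            $node = array_pop($stack);
            // Push children reversed so they are popped in document order.
            foreach (array_reverse($node->children) as $child) {
                $stack[] = $child;
            }
            if ($direct && $node->parent !== $this) {
                continue;
            }
            if ($node->tag === $tag) {
                $found[] = $node;
            }
        }
        return $found;
    }
}

// A p containing a div that itself contains a div.
$p = new SketchNode('p');
$outerDiv = $p->append(new SketchNode('div'));
$outerDiv->append(new SketchNode('div'));

$childCount = count($p->seek('div', true)); // like "p>div"
$descendantCount = count($p->seek('div'));  // like "p div"
var_dump($childCount);      // int(1)
var_dump($descendantCount); // int(2)
```

Filtering inside the existing descendant walk keeps the change small, at the cost of still visiting every descendant even when only direct children are wanted.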
Child selectors ARE supported. Not with the > syntax though. I'm not sure why you would use that syntax.
find('div[class=a] span[class=b]', 0)
will return the first span of class b that is in a div of class a in the dom.
@John Schlick: I think you got it wrong. The CSS selectors
div span
and
div > span
are not the same: the second one matches only DIRECT span children.
An example where php-simple-html-dom is really annoying:
...
Now I want to fetch only the rows of the first table with one command.
That would be
$doc->find("table > tr")
but at the moment it is only possible to get all rows, including the ones from the nested table, via
$doc->find("table tr")
So this would really be a very important addition!
Aah, SourceForge destroyed my HTML. Here is the example without the angle brackets:
table
  tr
    td
      table tr td (sometext) /td /tr /table
    /td
    td somemoretext /td
  /tr
  ...
  ...
/table
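To make the difference concrete, here is a small sketch that uses PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) instead of simple_html_dom, so it runs without the library. The markup follows the nested-table structure above; the second outer row is an assumption standing in for the "..." rows. One caveat worth noting: an unanchored child step such as `//table/tr` (roughly `table > tr`) still matches the nested row, because its parent is also a table, so the child step has to start from the outer table to get only its rows.

```php
<?php
// Well-formed markup mirroring the nested-table example.
// Loaded as XML so no HTML parser fix-ups interfere.
$markup = <<<XML
<table>
  <tr>
    <td><table><tr><td>sometext</td></tr></table></td>
    <td>somemoretext</td>
  </tr>
  <tr><td>assumed second row</td></tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);
$outerTable = $doc->documentElement;

// Descendant step from the outer table: every tr below it, nested ones included.
$descendantRows = $xpath->query('.//tr', $outerTable)->length;

// Child step from the outer table: only its own rows.
$childRows = $xpath->query('./tr', $outerTable)->length;

// Unanchored child step: any tr whose parent is a table, nested row included.
$unanchoredChildRows = $xpath->query('//table/tr')->length;

var_dump($descendantRows);      // int(3)
var_dump($childRows);           // int(2)
var_dump($unanchoredChildRows); // int(3)
```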
Thank you for clarifying the difference between the selectors.
I'll have to look at the code provided above in order to incorporate it.
It would be easier for me to look at if it were in diff format as opposed to patch format.
I'll also state (without looking at the attached code) that this might still not do it completely, as some tags are "automatically closed" by the original author to prevent things from going haywire with broken HTML. That will have to be looked at to make sure this is actually doing what's expected.
Last edit: John Schlick 2014-04-12
Here is a zip file with the patch and a test case, it would be clearer if you open it with TortoiseSVN.
Sorry, I had some problems adding attachments when I posted my message, which is why I pasted the patch's content inline.
Any update?
I've noticed this crop up a few times on Stack Overflow.
Last edit: LeoTM 2016-10-24
Support for child combinators ('>') was added in 1.8.