I was running tests and ran into problems using the tbody element as a selector in find(). Looking through the code, find() has a specific check for tbody. I can comment out that line, but I'm wondering why there is a specific check for that one element.
I can confirm this bug. It leads to unexpected behavior, for example when selecting TR tags that are in the TBODY section: TRs from THEAD also get included (which I didn't want in my case, and before finding this bug report I could not understand why it was happening). Commenting out the mentioned line fixed the problem for me (thank you, gagnon).
Please fix this bug.
Steps to reproduce the bug:
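<?php
include 'simple_html_dom.php';
// NOTE: the table markup (THEAD and TBODY rows) is missing from this string.
$html = str_get_html('');
echo $html->find('table tbody tr', 0)->innertext;
?>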
system$ php domtest.php
THIS IS THE WRONG ONE
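For anyone who wants to see the expected versus the reported behavior side by side, here is a rough sketch. It does not use simple_html_dom at all: it uses PHP's built-in DOM extension, and it assumes that skipping the tbody step effectively turns 'table tbody tr' into 'table tr'. The table markup is made up for illustration, since the markup from the report is not available.

```php
<?php
// Hypothetical markup: the HTML from the original report was lost, so this table is an assumption.
$markup = '<table>'
    . '<thead><tr><td>THIS IS THE WRONG ONE</td></tr></thead>'
    . '<tbody><tr><td>this is the right one</td></tr></tbody>'
    . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// XPath counterpart of the CSS selector "table tbody tr" (descendant steps, tbody kept).
$withTbody = $xpath->query('//table//tbody//tr');

// What the selector effectively becomes when the tbody step is skipped: "table tr".
$withoutTbody = $xpath->query('//table//tr');

// Text of the first matched row, i.e. the equivalent of find(..., 0).
$first = fn(DOMNodeList $rows): string => trim($rows->item(0)->textContent);

echo $first($withTbody), PHP_EOL;
echo $first($withoutTbody), PHP_EOL;
```

The first line printed should come from the TBODY row, the second from the THEAD row, which is the same symptom as in the output above.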
some remarks:
I can confirm it has something to do with line 651: if ($m[1]==='tbody') continue;
commenting this one out solves my problem.
This line of code first appears in r174, but I cannot find anything related in the changelog: http://simplehtmldom.svn.sourceforge.net/viewvc/simplehtmldom/trunk/change_log.txt?r1=173&r2=174
The comment on that line says: // for browser generated xpath
Last edit: Anonymous 2015-08-15
I can also confirm this bug:
It leads to unexpected behavior, for example when selecting TR tags that are in the TBODY section: TRs from THEAD also get included (which I didn't want in my case, and before finding this bug report I could not understand why it was happening).
bug still present in 2016
Thanks for reporting this issue!
You are right, this is incorrect behavior for CSS selectors, yet it was necessary for XPath selectors generated by browsers (although handling those is not the parser's responsibility). The reason is that browsers add a 'tbody' element to tables that don't have one, while the raw document (which is what you'd normally pass to the parser) stays untouched.
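To illustrate the mismatch, here is a minimal sketch using PHP's built-in DOM extension rather than simple_html_dom. The markup and both paths are made-up examples, and it assumes the DOMDocument parser leaves the table as it is in the source (no tbody inserted).

```php
<?php
// Hypothetical raw markup as a server might send it: no <tbody> in the source.
$raw = '<table><tr><td>cell</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($raw);
$xpath = new DOMXPath($doc);

// A path as it might be copied from a browser, whose DOM contains an inserted tbody.
$browserPath = '//table/tbody/tr/td';
// The same path with the tbody step removed, matching the raw markup.
$rawPath = '//table/tr/td';

$browserMatches = $xpath->query($browserPath)->length;
$rawMatches = $xpath->query($rawPath)->length;

printf("%s: %d match(es)\n", $browserPath, $browserMatches);
printf("%s: %d match(es)\n", $rawPath, $rawMatches);
```

Under that assumption the browser-style path should find nothing in the raw document, while the path without 'tbody' finds the cell.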
This is fixed in [f24dd8] by removing the offending line. Existing selectors that relied on the old behavior must be updated (remove 'tbody') to keep returning the same results. Otherwise results may change (i.e. an element is no longer found, or an index suddenly points to the wrong item).
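If many selectors need updating, something along these lines might help. This is only a sketch: dropTbodySteps() is a hypothetical helper, not part of simple_html_dom, and it only handles plain whitespace-separated descendant selectors. Selectors using child combinators ('>') or a tbody step with a class or id need to be reviewed by hand.

```php
<?php
/**
 * Hypothetical helper (not part of simple_html_dom): drops bare "tbody" steps
 * from a whitespace-separated descendant selector.
 */
function dropTbodySteps(string $selector): string
{
    // Split the selector into its steps, ignoring repeated whitespace.
    $steps = preg_split('/\s+/', trim($selector), -1, PREG_SPLIT_NO_EMPTY);

    // Keep every step except a bare "tbody" (compared case-insensitively).
    $kept = array_filter($steps, fn(string $step): bool => strcasecmp($step, 'tbody') !== 0);

    return implode(' ', $kept);
}

echo dropTbodySteps('table tbody tr'), PHP_EOL; // prints "table tr"
```

The rewritten selector can then be passed to find() as before.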
Related
Commit: [f24dd8]
This bug is still with us :-( in the Corona days of 2020.
Can you give me an example that fails for you?