This code is not working:
$get_dom_html = <<<'EOF'
<div class="question-title">
<h2>
<a href="http://dict.site" class="hyperlink">best chinese dictionary site {recommended! #1</a>
</h2>
<div class="tags">
<a href="http://dict.site/happy.html" class="post-tag">happ</a>
<a href="http://dict.site/birthday.html" class="post-tag">birthday</a>
</div>
</div>
<div class="post-text">
<p>This is a Serious bug I found in phpsimpledom. This code is not working unless removing "{" form "best chinese dictionary site {recommended! #1}</p>
</div>
EOF;
// Create the parser object before loading the document.
$dom_html = new simple_html_dom();
$dom_html->load($get_dom_html, true, false);
if ( $dom_html ) {
// test output 1
echo $dom_html->find('.question-title', 0)->plaintext;
$tag_blocks = $dom_html->find('.tags', 0);
foreach ( $tag_blocks->find('a.post-tag') AS $tag )
{
// test output 2
echo $tag->plaintext . '<br />';
}
}
First, $dom_html->find('.question-title', 0)->plaintext; outputs HTML, not plain text.
Second, $tag_blocks = $dom_html->find('.tags', 0); returns null, so the following find() call fails with "Fatal error: Call to a member function find() on null".
If I remove either "{" or "}" from the example, it works.
Thanks for reporting this issue! You are right, this is a serious bug.
Unfortunately, the parser currently removes Smarty scripts, which are non-HTML elements enclosed in curly braces (e.g. {* Smarty Comment *}). Here is an example of an HTML document containing Smarty script (I know this is not realistic, but it serves its purpose):
If you check this document using the Markup Validation Service, it'll tell you that this is not valid HTML. The parser, on the other hand, will try to make this a valid HTML document by stripping anything between "{" and "}", which is the root cause of your issue.
The only solution to this is to skip removing Smarty scripts, which might break the parser for some users who depend on this behavior. Anyway, since Smarty scripts are supposed to be run server-side and not be returned to the user, this should only break for pages that currently don't work in any browser anyway.
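To illustrate the root cause, here is a minimal sketch that shows how far a Smarty-style match reaches in markup like the one from the report. It assumes the parser's detection pattern is the same one quoted in the workaround in this reply; the constant SMARTY_PATTERN and the helper smarty_span() are made-up names for this illustration, not part of the library.

```php
<?php
// Assumption: this is the pattern the parser used to detect Smarty scripts;
// it is taken from the workaround quoted in this reply.
const SMARTY_PATTERN = "/(\{\w)(.*?)(\})/s";

// Hypothetical helper: returns the span that would be treated as a
// Smarty script, or null if nothing matches.
function smarty_span(string $html): ?string
{
    return preg_match(SMARTY_PATTERN, $html, $m) === 1 ? $m[0] : null;
}

// Shortened version of the markup from the report.
$html = <<<'EOF'
<a class="hyperlink">dictionary site {recommended! #1</a>
<div class="tags"><a class="post-tag">birthday</a></div>
<p>unless removing "{" from "site {recommended! #1}</p>
EOF;

$span = smarty_span($html);

// The match starts at "{recommended" and runs to the first "}",
// so the whole tags div sits inside the span that gets stripped.
var_dump($span !== null && strpos($span, 'class="tags"') !== false);
```

Because the "s" flag lets the match cross line breaks, a single unmatched "{" in the title is enough to swallow everything up to the next "}", which would explain why find('.tags', 0) comes back as null.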
If you need to remove Smarty scripts, you can do this in the calling function using
preg_replace("/(\{\w)(.*?)(\})/s", '', $doc);
Fixed via [eddccc]
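For anyone who relied on the old behavior, the preg_replace call above could be wrapped roughly like this before the document is handed to the parser. This is only a sketch: strip_smarty_tags() is a hypothetical helper name, not a library function, and the sample template string is invented for the example.

```php
<?php
// Hypothetical helper name; it is not part of the library.
function strip_smarty_tags(string $doc): string
{
    // "{" followed by a word character, then lazily everything
    // (newlines included, because of the "s" flag) up to the next "}".
    $clean = preg_replace("/(\{\w)(.*?)(\})/s", '', $doc);

    // preg_replace() returns null on a regex error; fall back to the input.
    return $clean ?? $doc;
}

$doc = '<p>{include file="header.tpl"}Hello</p>';
echo strip_smarty_tags($doc), "\n"; // <p>Hello</p>
```

The cleaned string can then be passed to load() as in the report. Note that this pattern only matches a "{" that is directly followed by a word character, so constructs such as {$name} or {* comment *} are left in place.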
Related
Commit: [eddccc]
Last edit: LogMANOriginal 2018-12-04