#154 Serious bug:

closed
None
2018-12-06
2016-02-22
peter
No

This code is not working

$get_dom_html = <<<'EOF'
<div class="question-title">
    <h2>
        <a href="http://dict.site" class="hyperlink">best chinese dictionary site {recommended! #1</a>
    </h2>
    <div class="tags">
        <a href="http://dict.site/happy.html" class="post-tag">happ</a>
        <a href="http://dict.site/birthday.html" class="post-tag">birthday</a>
    </div>
</div>
<div class="post-text">
    <p>This is a Serious bug I found in phpsimpledom. This code is not working unless removing "{" form "best chinese dictionary site {recommended! #1}</p>
</div>
EOF;

// Assumes simple_html_dom.php has already been included.
$dom_html = new simple_html_dom();
$dom_html->load($get_dom_html, true, false);

if ( $dom_html ) {

    // test output 1
    echo $dom_html->find('.question-title', 0)->plaintext;

    $tag_blocks = $dom_html->find('.tags', 0);

    foreach ( $tag_blocks->find('a.post-tag') as $tag )
    {
        // test output 2
        echo $tag->plaintext . '<br />';
    }
}

First, $dom_html->find('.question-title', 0)->plaintext; outputs HTML, not plain text.
Second, $tag_blocks = $dom_html->find('.tags', 0); returns null, so the following $tag_blocks->find('a.post-tag') call fails with: Fatal error: Call to a member function find() on null

If I remove either "{" or "}" (just one of them) from the example, it works.

Discussion

  • LogMANOriginal - 2018-12-04

    Thanks for reporting this issue! You are right, this is a serious bug.

    Unfortunately the parser currently removes Smarty scripts, which are non-HTML elements enclosed in curly braces (e.g. {* Smarty Comment *}). Here is an example of an HTML document containing Smarty script (I know this is not realistic, but it serves its purpose):

    <!DOCTYPE html>
    <title>Test</title>
    <p>
    {$p="<p></p>"}
    </p>
    

    If you check this document using the Markup Validation Service, it'll tell you that this is not valid HTML. The parser on the other hand will try to make this a valid HTML document by stripping anything between "{" and "}", which is the root cause for your issue.
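
    To illustrate the root cause, here is a minimal sketch that applies a brace-stripping pattern to a shortened copy of the reporter's markup. It assumes the parser's stripping behaves like the pattern quoted in this thread; the parser's internal handling may differ in detail. Because the match is non-greedy but spans newlines, it runs from the "{" in the title to the first "}" in the post text and takes the whole .tags block with it.

```php
<?php
// Shortened copy of the reporter's markup: "{" in the title, "}" in the post text.
$html = <<<'EOF'
<div class="question-title">
    <h2><a href="http://dict.site" class="hyperlink">best chinese dictionary site {recommended! #1</a></h2>
    <div class="tags">
        <a href="http://dict.site/happy.html" class="post-tag">happ</a>
    </div>
</div>
<div class="post-text">
    <p>... "best chinese dictionary site {recommended! #1}</p>
</div>
EOF;

// Assumption: the stripping is equivalent to this pattern (taken from this thread).
$stripped = preg_replace('/(\{\w)(.*?)(\})/s', '', $html);

// Everything between the first "{" and the first "}" is gone,
// including the closing </h2> and the entire .tags block.
var_dump(strpos($stripped, 'class="tags"') !== false); // bool(false)
var_dump(strpos($stripped, '</h2>') !== false);        // bool(false)
```

    With the .tags element removed before parsing, find('.tags', 0) has nothing to return, which matches the "find() on null" error in the report.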

    The only solution to this is to skip removing Smarty scripts, which might break the parser for some users who depend on this behavior. However, since Smarty scripts are supposed to be run server-side and not be returned to the user, this should only break for pages that currently don't work in any browser anyway.

    If you need to remove Smarty scripts, you can do this in the calling function using preg_replace("/(\{\w)(.*?)(\})/s", '', $doc);.
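
    As a rough sketch of that workaround, the snippet below strips Smarty-style tags from a string before it is handed to the parser. The sample document and the variable names are made up for illustration. Note that this pattern requires a word character directly after "{", so it removes tags such as {include ...} but leaves {$name} and {* comments *} untouched.

```php
<?php
// Hypothetical input containing Smarty-style template code.
$doc = <<<'EOF'
<!DOCTYPE html>
<title>Test</title>
{include file="header.tpl"}
<p>Hello {$name}</p>
EOF;

// Pattern from this thread: "{" followed by a word character, up to the next "}".
$clean = preg_replace('/(\{\w)(.*?)(\})/s', '', $doc);

echo $clean;

// $clean can then be passed to the parser, e.g. str_get_html($clean),
// assuming simple_html_dom.php has been included.
```

    If variables or comments need to be removed as well, the pattern would have to be widened, at the cost of matching more plain text that happens to contain braces.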

    Fixed via [eddccc]

     

    Related

    Commit: [eddccc]


    Last edit: LogMANOriginal 2018-12-04
  • LogMANOriginal - 2018-12-04
    • status: open --> closed
    • assigned_to: LogMANOriginal
     
