Menu

#175 find('text') does not work anymore in new versions

closed
find (1) text (1)
2019-10-20
2019-10-19
cyberbeat
No

Example:
aaaa
bbbb
cccc

$example->find('text',0) should find 'aaaa'
$example->find('text',1) should find 'bbbb'
$example->find('text',2) should find 'cccc'

like in previous versions.

Discussion

  • cyberbeat

    cyberbeat - 2019-10-19

    I tried to fix this, but that is really hard. I have to switch to the really old version again, although I started get used to the new selector types.

    What I found so far:
    1) in function seek text nodes should not be skipped:
    // Skip if node isn't a child node (i.e. text nodes)
    if($pass && $tag!='text' && !in_array($node, $node->parent->children, true)) {
    unset($node);
    continue;
    }

    2) In function "parse" and "as_text_node" the begin-cursor should be set:
    $node = new simple_html_dom_node($this);
    $node->_[HDOM_INFO_BEGIN] = $this->cursor++;

     
  • cyberbeat

    cyberbeat - 2019-10-19

    Please also care to stay backwards compatible:

    The default for $stripRN should be false for that

    Also all my code crashed because of this, so I had to comment it out:

    // if (empty($str) || strlen($str) > MAX_FILE_SIZE) {
    // $dom->clear();
    // return false;
    // }

     
  • LogMANOriginal

    LogMANOriginal - 2019-10-19

    Thanks for spotting this issue. I'll make sure to add some tests for this.
    Please check if this patch works for you:

    diff --git a/simple_html_dom.php b/simple_html_dom.php
    index 8a2bb71..a235a71 100644
    --- a/simple_html_dom.php
    +++ b/simple_html_dom.php
    @@ -609,6 +609,12 @@ class simple_html_dom_node
                    $pass = false;
                }
    
    
    +           if($pass && $tag === 'text' && $node->tag === 'text') {
    +               $ret[array_search($node, $this->dom->nodes, true)] = 1;
    +               unset($node);
    +               continue;
    +           }
    +
                // Skip if node isn't a child node (i.e. text nodes)
                if($pass && !in_array($node, $node->parent->children, true)) {
                    $pass = false;
    

    Please also care to stay backwards compatible:

    The default for $stripRN should be false for that

    What version did you use before? That flag has been true since its introduction in version 1.5 about 7 years ago. Am I missing something?

    Also all my code crashed because of this, so I had to comment it out:

    // if (empty($str) || strlen($str) > MAX_FILE_SIZE) {
    // $dom->clear();
    // return false;
    // }

    Do you get any error message?
    Which version of PHP do you use?
    Does it work if you only comment out $dom->clear();?

    This was reported multiple times, but I'm still unable to reproduce this error. Please don't hesitate to open another bug report and provide a test script with which it consistently fails for you. I'll be happy to look into it.

     
  • cyberbeat

    cyberbeat - 2019-10-19

    Thanks for the fix, I'll try that.

    My previous version is "Version: 1.11 ($Rev: 175 $)", 2008 :-).

    The problem with "return false" is: without that statement it always returns a (empty) dom. When my code does something like "dom->find" on false, it crashes (false->find). Return false makes sense, but it is not backward compatible to my version.

     
    • LogMANOriginal

      LogMANOriginal - 2019-10-19

      Version 1.11 (think of it as version 1.1.1) is much older than 1.5

      The problem with "return false" is: without that statement it always returns a (empty) dom. When my code does something like "dom->find" on false, it crashes (false->find). Return false makes sense, but it is not backward compatible to my version.

      I see your point. From what I can tell, str_get_html and file_get_html are supposed to work similar to file_get_contents, which also returns false if it can't load a file. That said, an empty string is valid HTML and should not result in false.

      Maybe this can be changed in the future. I'm actually hoping to also get rid of most of the arguments as well, but that's a different topic.

       
  • cyberbeat

    cyberbeat - 2019-10-19

    Your fix seems to work for me :-)

     
    • LogMANOriginal

      LogMANOriginal - 2019-10-19

      Thanks for the feedback. I'll update the code and release a fixed version (probably tomorrow).

       
  • LogMANOriginal

    LogMANOriginal - 2019-10-20
    • status: open --> closed
    • assigned_to: LogMANOriginal
     

Log in to post a comment.

MongoDB Logo MongoDB