I tried to fix this, but it is really hard. I have to switch back to the really old version, although I had started to get used to the new selector types.
What I found so far:
1) In function "seek", text nodes should not be skipped:
// Skip if node isn't a child node (i.e. text nodes)
if ($pass && $tag != 'text' && !in_array($node, $node->parent->children, true)) {
    unset($node);
    continue;
}
2) In the functions "parse" and "as_text_node", the begin cursor should be set:
$node = new simple_html_dom_node($this);
$node->_[HDOM_INFO_BEGIN] = $this->cursor++;
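To illustrate why the check in 1) drops text nodes without the extra tag test: with its third argument set to true, in_array() compares objects by identity, so a node only passes if that exact object is in the parent's children list. The sketch below uses a hypothetical DemoNode class in place of simple_html_dom_node, and it assumes that text nodes are linked into the parent's "nodes" list but not into "children". The $pass flag from the real function is left out.

```php
<?php
// Hypothetical stand-in for simple_html_dom_node; only the fields needed here.
class DemoNode
{
    public $tag;
    public $parent = null;
    public $nodes = array();    // every linked node, including text nodes
    public $children = array(); // element nodes only (assumption)

    public function __construct($tag)
    {
        $this->tag = $tag;
    }
}

$div  = new DemoNode('div');
$span = new DemoNode('span');
$text = new DemoNode('text');

// Assumed linking behaviour: every node goes into "nodes",
// only element nodes also go into "children".
foreach (array($span, $text) as $n) {
    $n->parent = $div;
    $div->nodes[] = $n;
}
$div->children[] = $span;

// Strict in_array() compares objects by identity.
$spanIsChild = in_array($span, $span->parent->children, true);
$textIsChild = in_array($text, $text->parent->children, true);

// Check without the tag test: the text node would be skipped.
$skipWithoutTagTest = !$textIsChild;
// Check with the tag test from 1): the text node is kept.
$skipWithTagTest = $text->tag != 'text' && !$textIsChild;

var_dump($spanIsChild, $textIsChild, $skipWithoutTagTest, $skipWithTagTest);
```

If the assumption about "children" holds, this is the reason a selector for text nodes finds nothing unless the tag is tested first.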
Please also take care to stay backwards compatible:
For that, the default for $stripRN should be false.
Also, all my code crashed because of the following check, so I had to comment it out:
// if (empty($str) || strlen($str) > MAX_FILE_SIZE) {
//     $dom->clear();
//     return false;
// }
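For context on what this check rejects: PHP's empty() is true for the empty string and also for the string "0", so both take the "return false" branch. Below is a minimal sketch of that behaviour. The function names are hypothetical, and DEMO_MAX_SIZE is an arbitrary value standing in for the library's MAX_FILE_SIZE.

```php
<?php
// Arbitrary limit for this sketch only; not the library's actual MAX_FILE_SIZE.
const DEMO_MAX_SIZE = 1000;

// Mirrors the shape of the check quoted above.
function rejected_by_check($str)
{
    return empty($str) || strlen($str) > DEMO_MAX_SIZE;
}

// Hypothetical alternative that only rejects oversized input.
function rejected_by_size_only($str)
{
    return strlen($str) > DEMO_MAX_SIZE;
}

$emptyRejected    = rejected_by_check('');
$zeroRejected     = rejected_by_check('0');
$markupRejected   = rejected_by_check('<p>hi</p>');
$emptySizeOnly    = rejected_by_size_only('');
$oversizeSizeOnly = rejected_by_size_only(str_repeat('a', DEMO_MAX_SIZE + 1));

var_dump($emptyRejected, $zeroRejected, $markupRejected, $emptySizeOnly, $oversizeSizeOnly);
```

A size-only test like the second function would keep the file size limit while letting an empty string through, which is the behaviour the old version had.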
Thanks for spotting this issue. I'll make sure to add some tests for this.
Please check if this patch works for you:
What version did you use before? That flag has been true since its introduction in version 1.5 about 7 years ago. Am I missing something?
Do you get any error message?
Which version of PHP do you use?
Does it work if you only comment out $dom->clear();?
This was reported multiple times, but I'm still unable to reproduce this error. Please don't hesitate to open another bug report and provide a test script with which it consistently fails for you. I'll be happy to look into it.
Thanks for the fix, I'll try that.
My previous version is "Version: 1.11 ($Rev: 175 $)", 2008 :-).
The problem with "return false" is this: without that statement, the function always returns a DOM, even if it is empty. When my code calls something like "dom->find" on false, it crashes (false->find). Returning false makes sense, but it is not backward compatible with my version.
Version 1.11 (think of it as version 1.1.1) is much older than 1.5.
I see your point. From what I can tell, str_get_html and file_get_html are supposed to work similarly to file_get_contents, which also returns false if it can't load a file. That said, an empty string is valid HTML and should not result in false.
Maybe this can be changed in the future. I'm actually hoping to also get rid of most of the arguments as well, but that's a different topic.
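Until the return value changes, calling code written against the old behaviour can guard against false before calling find(). This is a minimal sketch, not part of the library: find_or_empty is a hypothetical helper, and DemoDom is a stand-in class so the example runs without simple_html_dom.

```php
<?php
// Hypothetical stand-in for a parsed DOM; only find() is modelled.
class DemoDom
{
    public function find($selector)
    {
        return array("match for $selector");
    }
}

// Hypothetical helper, not part of simple_html_dom.
// $dom is whatever the loader returned: an object, or false on failure.
function find_or_empty($dom, $selector)
{
    if ($dom === false) {
        // Loader failed; behave like an empty document instead of crashing.
        return array();
    }
    return $dom->find($selector);
}

$fromFailedLoad = find_or_empty(false, 'a');
$fromGoodLoad   = find_or_empty(new DemoDom(), 'a');

var_dump(count($fromFailedLoad), count($fromGoodLoad));
```

With the real library, the first argument would be the result of str_get_html or file_get_html.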
Your fix seems to work for me :-)
Thanks for the feedback. I'll update the code and release a fixed version (probably tomorrow).
Version 1.9.1 includes the fix for this bug. Find it at https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.9.1/