PHP Simple HTML DOM Parser Activity

A PHP-based DOM parser.

Brought to you by: john_schlick, logmanoriginal, me578022

Activity for PHP Simple HTML DOM Parser

5 months ago
<REDACTED> posted a comment on ticket #193

DELETE THIS BUG REPORT
5 months ago
<REDACTED> posted a comment on ticket #199

DELETE THIS BUG REPORT
5 months ago
<REDACTED> posted a comment on ticket #201

DELETE THIS BUG REPORT
5 months ago
<REDACTED> posted a comment on ticket #203

DELETE THIS BUG REPORT
5 months ago
<REDACTED> posted a comment on ticket #204

DELETE THIS BUG REPORT
5 months ago
<REDACTED> posted a comment on ticket #46

DELETE THIS REQUEST
5 months ago
<REDACTED> posted a comment on ticket #63

DELETE THIS FEATURE REQUEST
5 months ago
<REDACTED> posted a comment on ticket #64

DELETE THIS FEATURE REQUEST
5 months ago
<REDACTED> posted a comment on ticket #65

DELETE THIS FEATURE REQUEST
5 months ago
<REDACTED> posted a comment on ticket #67

DELETE THIS FEATURE REQUEST
1 year ago
Hi Man O ManaO modified a comment on ticket #63

OK, wrong test. The method is called like $sections = $html->find('section')->firstChild(); but I get the same error because the result is an array. So it is not the same as the CSS pseudo-selector rule :first-child. How can I use it to get the same result as CSS? Thanks.
1 year ago
Hi Man O ManaO modified a comment on ticket #63

OK, wrong test. The method is called like $sections = $html->find('section')->firstChild(); but I get the same error because the result is an array. So it is not the same as the CSS pseudo-selector rule :first-child. How can I use it to get the same result? Thanks.
1 year ago
Hi Man O ManaO posted a comment on ticket #63

OK, wrong test. The method is called like $sections = $html->find('section')->firstChild(); but I get the same error because the result is an array. So it is not the same as the CSS pseudo-selector rule :first-child.
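One way to read the question in ticket #63: find() without an index returns an array of nodes, so firstChild() has to be called on each node rather than on the array. The following is a minimal sketch, not an answer from the maintainers. It assumes the 1.9-style API (str_get_html, find, firstChild, tag, innertext, outertext) and that simple_html_dom.php is on the include path; the sample markup is invented for illustration.

```php
<?php
// Assumption: simple_html_dom.php (1.9-style API) is available locally.
include_once 'simple_html_dom.php';

// Invented sample markup, only for illustration.
$html = str_get_html(
    '<div><section><p>a</p><p>b</p></section><section><h2>c</h2></section></div>'
);

// find() without an index returns an array, so iterate and ask each
// section for its own first child element.
foreach ($html->find('section') as $section) {
    $first = $section->firstChild();
    if ($first !== null) {
        echo $first->tag . ': ' . $first->innertext . PHP_EOL;
    }
}

// With an index, find() returns a single node, or null when nothing matches.
$section = $html->find('section', 0);
if ($section !== null && $section->firstChild() !== null) {
    echo $section->firstChild()->outertext . PHP_EOL;
}
```

This yields the first child of each section, which is only one interpretation of what a :first-child selector would match.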
1 year ago
Hi Man O ManaO posted a comment on ticket #210

Sorry, wrong goal. Close this. The correct answer is here: https://sourceforge.net/p/simplehtmldom/support-requests/63/
1 year ago
Hi Man O ManaO created ticket #63

Get the elements of the upper level like with CSS pseudo selector :first-child
1 year ago
Hi Man O ManaO created ticket #210

Find first child element like CSS does not respect order
1 year ago
Maxim Volobuev created ticket #209

Decoding HTML entities corrupts text in HTML
2 years ago
Igor Zhuravlov posted a comment on ticket #208

This bug persists even with well-formed HTML with single root element: <?php $s_htm = <<<EOT <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <body> <div class="c1"></div> <div class="c2"></div> </body> </html> EOT; ...
2 years ago
Igor Zhuravlov created ticket #208

$node->find() finds element next to $node
3 years ago
Kowsar Hossain posted a comment on ticket #207

I've also submitted a patch to address this issue here: https://sourceforge.net/p/simplehtmldom/feature-requests/68/
3 years ago
Kowsar Hossain created ticket #68

Added feature to enable/disable htmlentity operations
3 years ago
Kowsar Hossain created ticket #207

The output doesn't match the input even when the input hasn't been modified
3 years ago
DennisKmetz created ticket #62

PHP 8.x support
3 years ago
<REDACTED> created ticket #67

Get only text in leaf nodes (avoid duplication)
3 years ago
hkirsman created ticket #61

Is Github page safe to use for downloads?
3 years ago
<REDACTED> created ticket #205

End tags erroneously included in plaintext
3 years ago
Philip posted a comment on a blog post

There is a typo in the command: the "p" is missing in the second "simple". It should be composer require simplehtmldom/simplehtmldom dev-master
3 years ago
Coz posted a comment on ticket #185

"This is not a bug in simplehtmldom." Yes, it is. You're not setting a user agent in either the curl code or the stream_context code. Any properly configured server will reject the requests, which makes the project useless. You need to either add a generic user agent (I recommend google bot) or provide a way for the user to pass in their own user agent to the function. See lines 72 and 111 of the revised HtmlWeb.php file.
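The user agent suggestion above can be sketched with PHP's own stream context options. This is not the revised HtmlWeb.php from the ticket; the helper name build_context_with_user_agent and the agent string ExampleFetcher/1.0 are hypothetical, and the actual request is left commented out so the snippet runs offline.

```php
<?php
// Hypothetical helper, not part of simplehtmldom.
function build_context_with_user_agent(string $userAgent)
{
    // 'user_agent' and 'timeout' are standard HTTP context options in PHP.
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10,
        ],
    ]);
}

$context = build_context_with_user_agent('ExampleFetcher/1.0');

// Inspect the options to confirm the header will be sent.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'] . PHP_EOL;

// Assumption: the target URL is a placeholder. Uncomment to perform a request.
// $body = file_get_contents('https://example.com/', false, $context);
```

Any function that accepts a stream context can then receive $context from the caller, which matches the second option the comment proposes.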
3 years ago
<REDACTED> posted a comment on ticket #203

Actually it has to be done something like this, because this function can be called from inside the library, and we want to get the first call that is outside the library. PS! Is the maintainer active these days? Has been quiet for a while. diff --git a/HtmlNode.php b/HtmlNode.php index 9649d37..99dbda4 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -549,3 +554,12 @@ class HtmlNode { - return $this->find($selector, $idx, $lowercase) ?: null; + if(!$element = $this->find($selector, $idx, $lowercase))...
3 years ago
<REDACTED> posted a comment on ticket #204

diff --git a/HtmlNode.php b/HtmlNode.php index 9649d37..aef8b17 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -549,3 +549,12 @@ class HtmlNode + function first($selector, $idx = 0, $lowercase = false) + { + return $this->expect($selector, $idx, $lowercase); + } Missed semicolon and preformatting.
3 years ago
<REDACTED> created ticket #204

Convenience function for getting first element
3 years ago
<REDACTED> posted a comment on ticket #201

diff --git a/simple_html_dom.php b/simple_html_dom.php index bce4d9e..97d6e1d 100644 --- a/simple_html_dom.php +++ b/simple_html_dom.php @@ -117,3 +117,3 @@ function file_get_html( $dom->clear(); - return false; + $contents = ""; } @@ -144,5 +144,4 @@ function str_get_html( $dom->clear(); - return false; + $contents = ""; } - return $dom->load($str, $lowercase, $stripRN); Better version with tabs. PS! The inline editor and preview function on the site seems to hide the first line of content :|
3 years ago
<REDACTED> created ticket #203

Always tell user where he expected non-existing element
3 years ago
Cees van Wageningen created ticket #202

Preg_match error occurs after saving new contentblock
3 years ago
<REDACTED> created ticket #201

Never return false on documents
4 years ago
Jim Longo posted a comment on ticket #60

I see. One would use $e->innertext to get the text inside the tag.
4 years ago
Jim Longo created ticket #60

No anchor text returned
4 years ago
LogMANOriginal modified ticket #200

"Creation of dynamic property" warning in PHP 8.2 (version 1.9.1)
4 years ago
LogMANOriginal posted a comment on ticket #200

Thanks for your bug report. This is actually a typo. The variable should be called $optional_closing_tags. There is a recent commit in master that illustrates the fix. This should also work in PHP 8.2 and higher. [8dc21bcb714c4edcb4318bdc3f198f4f78762381]
4 years ago
Jeffrey Kastner modified a comment on ticket #159

disregard
4 years ago
Jeffrey Kastner modified a comment on ticket #159
4 years ago
Jeffrey Kastner modified a comment on ticket #159

If I understand attribute selectors correctly, this is not working again: *, ^ and $ all return 2. Example: echo count( str_get_html('<html><body><span class="first second">Hello!</span><span id="third">ME OH MI!</span></body></html>')->find('span[class^=second]') ); I have been trying to use attribute selectors to find a div whose id is a random number followed by -slideshow (e.g. 8099435804-slideshow), and I haven't been able to get it to work. ~in my case it returns all div's in the...
4 years ago
Jeffrey Kastner posted a comment on ticket #159

If I understand attribute selectors correctly, this is not working again: *, ^ and $ all return 2. Example: echo count( str_get_html('<html><body><span class="first second">Hello!</span><span id="third">ME OH MI!</span></body></html>')->find('span[class^=second]') ); I have been trying to use attribute selectors to find a div whose id is a random number followed by -slideshow (e.g. 8099435804-slideshow), and I haven't been able to get it to work. ~in my case it returns all div's in the...
4 years ago
Bjørn Rosell created ticket #200

"Creation of dynamic property" warning in PHP 8.2 (version 1.9.1)
4 years ago
LogMANOriginal modified ticket #199

Incorrect handling of <br> tags next to line breaks
4 years ago
LogMANOriginal posted a comment on ticket #199

Looks good now! However, you must set the Unicode flag, or else preg_replace() may return an invalid Unicode string, which may cause the second preg_replace() to return NULL, and a deprecation error for the third preg_replace(). Good catch. Fixed via [b8d048e46b7f1964c28ea041d39ccb1d05f9a0ed]. And about the manual: I see now that the navigation sidebar is aligned far down upon page load, so that only the documentation for the functions (isset etc.) is immediately visible, not the more useful "Quick...
4 years ago
LogMANOriginal committed [b8d048]

HtmlNode: Replace and collapse unicode whitespace in plaintext
4 years ago
<REDACTED> modified a comment on ticket #199

Looks good now! However, you must set the Unicode flag, or else preg_replace() may return an invalid Unicode string, which may cause the second preg_replace() to return NULL, and a deprecation error for the third preg_replace(). diff --git a/HtmlNode.php b/HtmlNode.php index 9bc6a1a..9649d37 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -351 +351 @@ class HtmlNode - $ret = preg_replace('/\s+/', ' ', $ret); + $ret = preg_replace('/\s+/u', ' ', $ret); And about the manual: I see now that the navigation...
4 years ago
<REDACTED> posted a comment on ticket #199

Looks good now! However, you must set the Unicode flag here or else preg_replace() returns NULL for certain strings, which causes (deprecation) errors further down. diff --git a/HtmlNode.php b/HtmlNode.php index 9bc6a1a..9649d37 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -351 +351 @@ class HtmlNode - $ret = preg_replace('/\s+/', ' ', $ret); + $ret = preg_replace('/\s+/u', ' ', $ret); And about the manual: I see now that the navigation sidebar is aligned far down upon page load, so that only...
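The failure mode described in this comment can be reproduced outside the library. A minimal sketch, assuming a hypothetical helper collapse_whitespace (not the HtmlNode code): with the /u flag, preg_replace() returns null when the subject is not valid UTF-8, so the result needs a null check before it is passed on to further calls.

```php
<?php
// Hypothetical helper illustrating the null check; not the library's code.
function collapse_whitespace(string $text): string
{
    $result = preg_replace('/\s+/u', ' ', $text);
    if ($result === null) {
        // preg_replace() signals an error (for example malformed UTF-8
        // under the /u flag) by returning null; fall back to the input.
        return $text;
    }
    return $result;
}

echo collapse_whitespace("Hello \n\t World") . PHP_EOL; // Hello World

// "\xC3" followed by a space is not a valid UTF-8 sequence.
var_dump(preg_replace('/\s+/u', ' ', "bad \xC3 bytes")); // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);     // bool(true)
```

Falling back to the unmodified input is only one possible policy; a caller could equally raise an error or re-encode the text first.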
4 years ago
LogMANOriginal posted a comment on ticket #199

PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support Turns out that page is managed by SF. There is no way to change the contents of that page 😔 I added a "Manual" tab instead.
4 years ago
LogMANOriginal posted a comment on ticket #199

PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support Good idea! I'll do that. The space thing works now, but the BR tag is still not handled well. Try the code in the original post and compare the output when (un)-commenting the commented line. I'm comparing the output of plaintext with what is displayed in the browser and it looks exactly the same. Please note that I have removed wordwrap()...
4 years ago
<REDACTED> posted a comment on ticket #199

The space thing works now, but the BR tag is still not handled well. Try the code in the original post and compare the output when (un)-commenting the commented line. PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support
4 years ago
LogMANOriginal posted a comment on ticket #199

[67c0f4e21091a9cc66151610a653724a0acb1b69] fixes the whitespace issue. Let me know if this works for you.
4 years ago
LogMANOriginal committed [67c0f4]

HtmlNode: Replace and collapse whitespace in plaintext
4 years ago
LogMANOriginal posted a comment on ticket #199

Shouldn't plaintext convert newlines to spaces? Did you change this recently? Surely this is a bug/regression, or am I missing something completely? The plaintext implementation is completely rewritten but it passes all tests. Your particular case probably isn't covered by any of the tests right now. I'll check this as well. At the very least <br> seems to work right. PS! Where on the SourceForge page is the link to the manual (the one with the clickable tabs with examples, etc.)? I hope you didn't...
4 years ago
<REDACTED> posted a comment on ticket #199

Shouldn't plaintext convert newlines to spaces? Did you change this recently? Surely this is a bug/regression, or am I missing something completely? $text = "<p>Hello" . "\n" . "World</p>"; $plain = str_get_html($text)->plaintext; echo "PLAINTEXT:\n" . $plain . "\n\n"; echo "WORDWRAP:\n" . wordwrap($plain, 80) . "\n"; PS! Where on the SourceForge page is the link to the manual (the one with the clickable tabs with examples, etc.)? I hope you didn't remove this, because I use it as a reference all...
4 years ago
LogMANOriginal posted a comment on ticket #199

Please try again with current master. From what I can tell, the output looks right: ***** ** ********,*** ****** ********. ******* **** *** ** ** ******* *** *** ****** ***. *** **** *** ****** ****. *.***. ** **** *** ***** **** ***** ********* *** ************ ** *** *** *** *** ** ****. *** ** ****, ******** ******* *** ******** ********** * ** ******* ****. ******** ** *** *** **** ***********, ** *** *** *** * *** *** ********* .*** ***** ******* *** ** **** ***.*** ****** *** ** **** *** ********...
4 years ago
LogMANOriginal modified ticket #198

iconv() detected an illegal character in input string
4 years ago
LogMANOriginal posted a comment on ticket #198

This is fixed via [c53a612e6fe61d5b1efc0c3270e20aa34e4e84ee]. Instead of using //IGNORE, it needs to be wrapped inside a try-catch block, so that the character set is detected properly. Eventually, this will be replaced by a better solution, but this works for now. Thanks again for reporting!
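The try-catch approach mentioned here can be sketched as follows. This is not the code from the commit: the helper name convert_or_null is hypothetical, and the temporary error handler that turns iconv() diagnostics into exceptions is an assumption about how a catch block could see them at all.

```php
<?php
// Hypothetical helper; names are illustrative, not the library's API.
function convert_or_null(string $text, string $from, string $to): ?string
{
    // iconv() reports illegal input through PHP's error mechanism, so a
    // temporary handler converts those diagnostics into exceptions.
    set_error_handler(function (int $severity, string $message) {
        throw new ErrorException($message, 0, $severity);
    });

    try {
        $converted = iconv($from, $to, $text);
        return $converted === false ? null : $converted;
    } catch (ErrorException $e) {
        // Conversion failed: the caller can try another character set.
        return null;
    } finally {
        // Always restore the previous handler, even on failure.
        restore_error_handler();
    }
}

// Latin-1 input converts cleanly to UTF-8.
var_dump(convert_or_null("caf\xE9", 'ISO-8859-1', 'UTF-8'));

// The same bytes are not valid UTF-8, so this conversion is expected to fail.
var_dump(convert_or_null("caf\xE9", 'UTF-8', 'ISO-8859-1'));
```

Returning null instead of silently dropping characters lets the caller distinguish a wrong character set guess from a successful conversion, which //IGNORE would hide.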
4 years ago
LogMANOriginal committed [c53a61]

HtmlDocument: Use try-catch block for iconv
4 years ago
LogMANOriginal committed [d553de]

HtmlNode: Fix empty if-statement
4 years ago
LogMANOriginal committed [7a5b98]

docs: Include recent changes
4 years ago
LogMANOriginal committed [025297]

HtmlDocument: Let the parser decode entities
4 years ago
LogMANOriginal committed [cc1063]

HtmlDocument: Inline token_equal, _slash, and _attr
4 years ago
LogMANOriginal committed [ad6686]

HtmlDocument: Don't use magic functions
4 years ago
LogMANOriginal committed [133547]

HtmlNode: Stop removing UTF-8 BOM from the end of a string
4 years ago
LogMANOriginal committed [8dc21b]

HtmlDocument: Fix broken $forceTagsClosed = false
4 years ago
LogMANOriginal committed [f658bc]

HtmlDocument: Use shortcuts for seek methods
4 years ago
LogMANOriginal committed [34743a]

HtmlDocument: Inline skip method
4 years ago
LogMANOriginal committed [718b90]

HtmlDocument: Add shortcuts for the parser
4 years ago
LogMANOriginal committed [2b4971]

HtmlDocument: Use native functions for tag names and attribute values
4 years ago
LogMANOriginal committed [101a85]

Fix memory parsing test
4 years ago
LogMANOriginal committed [d573cd]

HtmlDocument: Don't remove noise before parsing.
4 years ago
LogMANOriginal committed [88c67b]

HtmlDocument: Don't assign nodes by reference
4 years ago
Roland Heymanns posted a comment on ticket #198

Thanks for your good work! The error message first appeared after I upgraded PHP from 8.0 to 8.1 last week.
4 years ago
LogMANOriginal modified ticket #199

Incorrect handling of <br> tags next to line breaks
4 years ago
LogMANOriginal posted a comment on ticket #199

Thanks for reporting. I fixed your original message. You are right, the current implementation of <br> is wrong. I haven't tested this yet but it should give slightly better results if you define DEFAULT_BR_TEXT like this: define("DEFAULT_BR_TEXT", PHP_EOL) At the very least, this makes it platform independent. That said, there is additional work to do in the parser to handle all cases (like the <br> a case).
4 years ago
LogMANOriginal modified ticket #198

iconv() detected an illegal character in input string
4 years ago
LogMANOriginal posted a comment on ticket #198

Thanks for reporting. It took me a while to figure out what is going on. Am I right to assume that you are running on PHP 8.x? In previous versions that error would not have been reported because of the error suppression operator (@). (Un-)fortunately the behavior of this operator changed in PHP 8: https://php.watch/versions/8.0/fatal-error-suppression The behavior for //IGNORE depends on the specific implementation of iconv, some of which completely ignore this flag. Still, this is a good hack to...
4 years ago
<REDACTED> posted a comment on ticket #199

The text should say "The BR tag is not...". Evidently this editor interprets HTML tags as-is, and initial postings can't be edited :/
4 years ago
<REDACTED> created ticket #199

Incorrect handling of <br> tags next to line breaks
4 years ago
Roland Heymanns created ticket #198

iconv() detected an illegal character in input string
4 years ago
LogMANOriginal modified ticket #50

PHP 7.x compatibility
4 years ago
LogMANOriginal modified ticket #60

parsing stops after first multibyte character
4 years ago
LogMANOriginal modified ticket #64

Comments on MAX_FILE_SIZE
4 years ago
LogMANOriginal modified ticket #65

Notify when zero elements were found
4 years ago
LogMANOriginal modified ticket #66

Role attribute
4 years ago
LogMANOriginal modified ticket #59

Slashdot example updated
4 years ago
LogMANOriginal posted a comment on ticket #59

Thanks for the feedback! The example in 1.9 is probably not functional anymore, but there is an updated version in the current master that still works. Here is the link for future reference: https://sourceforge.net/p/simplehtmldom/repository/ci/master/tree/example/scraping/example_scraping_slashdot.php
4 years ago
LogMANOriginal posted a comment on ticket #58

That choice is entirely up to you.
4 years ago
LogMANOriginal modified ticket #58

Is this project active anymore?
4 years ago
LogMANOriginal modified ticket #56

How to avoid break on 404 errors?
4 years ago
LogMANOriginal posted a comment on ticket #56

Good to know you found a solution :)
4 years ago
LogMANOriginal modified ticket #54

Traversing the Dom within a series of columns
4 years ago
LogMANOriginal posted a comment on ticket #54

You probably figured it out in the meantime, but here is a complete example that will give you what you want. <?php include_once 'simple_html_dom.php'; $doc = <<<EOD <tr> <td></td> <td id="column2" class="style3">A</td> <td id="column2" class="style2">B</td> <td> <a href="#link">Description of Link</a> </td> </tr> EOD; $html = str_get_html($doc); $href = $html->find('a', 0)->href; $description = $html->find('a', 0)->innertext; echo $href . PHP_EOL . $description . PHP_EOL; // #link // Description...
4 years ago
LogMANOriginal posted a comment on ticket #53

This is probably no longer relevant but the for loop in your example indexes over the value of the first element instead of all script elements. foreach($items->find('script',0) as $e) { $e->outertext = ''; echo '$e: ' . $e . '<br/>'; } Notice the ,0 in ->find('script',0). This is why the error occurs. Here is the correct version: foreach($items->find('script') as $e) { $e->outertext = ''; echo '$e: ' . $e . '<br/>'; }
4 years ago
LogMANOriginal modified ticket #53

Removing tags does not work
4 years ago
LogMANOriginal modified ticket #52

Timezone change
