Activity for PHP Simple HTML DOM Parser

  • <REDACTED> posted a comment on ticket #193

    DELETE THIS BUG REPORT

  • <REDACTED> posted a comment on ticket #199

    DELETE THIS BUG REPORT

  • <REDACTED> posted a comment on ticket #201

    DELETE THIS BUG REPORT

  • <REDACTED> posted a comment on ticket #203

    DELETE THIS BUG REPORT

  • <REDACTED> posted a comment on ticket #204

    DELETE THIS BUG REPORT

  • <REDACTED> posted a comment on ticket #46

    DELETE THIS REQUEST

  • <REDACTED> posted a comment on ticket #63

    DELETE THIS FEATURE REQUEST

  • <REDACTED> posted a comment on ticket #64

    DELETE THIS FEATURE REQUEST

  • <REDACTED> posted a comment on ticket #65

    DELETE THIS FEATURE REQUEST

  • <REDACTED> posted a comment on ticket #67

    DELETE THIS FEATURE REQUEST

  • Hi Man O ManaO modified a comment on ticket #63

    OK, wrong test. The method is called like $sections = $html->find('section')->firstChild(); but I got the same error because the result is an array. So it is not the same as the CSS pseudo-selector :first-child. How can I use it to get the same result as CSS? Thanks.

  • Hi Man O ManaO modified a comment on ticket #63

    OK, wrong test. The method is called like $sections = $html->find('section')->firstChild(); but I got the same error because the result is an array. So it is not the same as the CSS pseudo-selector :first-child. How can I use it to get the same result? Thanks.

  • Hi Man O ManaO posted a comment on ticket #63

    OK, wrong test. The method is called like $sections = $html->find('section')->firstChild(); but I got the same error because the result is an array. So it is not the same as the CSS pseudo-selector :first-child.

  • Hi Man O ManaO posted a comment on ticket #210

    Sorry, wrong goal. Close this. The correct answer is here: https://sourceforge.net/p/simplehtmldom/support-requests/63/

  • Hi Man O ManaO created ticket #63

    Get the elements of the upper level like with CSS pseudo selector :first-child

  • Hi Man O ManaO created ticket #210

    Find first child element like CSS does not respect order

  • Maxim Volobuev created ticket #209

    Decoding HTML entities corrupts text in HTML

  • Igor Zhuravlov posted a comment on ticket #208

    This bug persists even with well-formed HTML with single root element: <?php $s_htm = <<<EOT <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <body> <div class="c1"></div> <div class="c2"></div> </body> </html> EOT; ...

  • Igor Zhuravlov created ticket #208

    $node->find() finds element next to $node

  • Kowsar Hossain posted a comment on ticket #207

    I've also submitted a patch to address this issue here: https://sourceforge.net/p/simplehtmldom/feature-requests/68/

  • Kowsar Hossain created ticket #68

    Added feature to enable/disable htmlentity operations

  • Kowsar Hossain created ticket #207

    The output doesn't match the input even when the input hasn't been modified

  • DennisKmetz created ticket #62

    PHP 8.x support

  • <REDACTED> created ticket #67

    Get only text in leaf nodes (avoid duplication)

  • hkirsman created ticket #61

    Is Github page safe to use for downloads?

  • <REDACTED> created ticket #205

    End tags erroneously included in plaintext

  • Philip posted a comment on a blog post

    There is a typo in the command: the "p" is missing in the second "simple". It should be: composer require simplehtmldom/simplehtmldom dev-master

  • Coz posted a comment on ticket #185

    "This is not a bug in simplehtmldom" Yes, it is. You're not setting a user agent in either the curl code or the stream_context code. Any properly configured server will reject the requests, which makes the project useless. You need to either add a generic user agent (I recommend Googlebot) or provide a way for the user to pass in their own user agent to the function. See lines 72 and 111 of the revised HtmlWeb.php file.
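
The user-agent point in the comment above can be illustrated without the library. This is a minimal sketch that assumes PHP's built-in HTTP stream wrapper; the helper name `build_http_context` and the agent string are hypothetical and are not taken from HtmlWeb.php.

```php
<?php
// Hypothetical helper (not part of simplehtmldom): builds a stream context
// that sends a caller-supplied User-Agent header with HTTP requests.
function build_http_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent'      => $userAgent,
            'follow_location' => 1, // follow redirects
        ],
    ]);
}

$context = build_http_context('ExampleBot/1.0 (+https://example.com/bot)');

// The context would then be passed to a fetch call, for example:
// $html = file_get_contents('https://example.com/', false, $context);

// Inspect the options to confirm the header value is set.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], PHP_EOL;
```

Letting the caller pass the agent string, rather than hard-coding one, matches the second option suggested in the comment.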

  • <REDACTED> posted a comment on ticket #203

    Actually it has to be done something like this, because this function can be called from inside the library, and we want to get the first call that is outside the library. PS! Is the maintainer active these days? Has been quiet for a while. diff --git a/HtmlNode.php b/HtmlNode.php index 9649d37..99dbda4 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -549,3 +554,12 @@ class HtmlNode { - return $this->find($selector, $idx, $lowercase) ?: null; + if(!$element = $this->find($selector, $idx, $lowercase))...

  • <REDACTED> posted a comment on ticket #204

    diff --git a/HtmlNode.php b/HtmlNode.php index 9649d37..aef8b17 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -549,3 +549,12 @@ class HtmlNode + function first($selector, $idx = 0, $lowercase = false) + { + return $this->expect($selector, $idx, $lowercase); + } Missed semicolon and preformatting.

  • <REDACTED> created ticket #204

    Convenience function for getting first element

  • <REDACTED> posted a comment on ticket #201

    diff --git a/simple_html_dom.php b/simple_html_dom.php index bce4d9e..97d6e1d 100644 --- a/simple_html_dom.php +++ b/simple_html_dom.php @@ -117,3 +117,3 @@ function file_get_html( $dom->clear(); - return false; + $contents = ""; } @@ -144,5 +144,4 @@ function str_get_html( $dom->clear(); - return false; + $contents = ""; } - return $dom->load($str, $lowercase, $stripRN); Better version with tabs. PS! The inline editor and preview function on the site seems to hide the first line of content :|

  • <REDACTED> created ticket #203

    Always tell user where he expected non-existing element

  • Cees van Wageningen created ticket #202

    Preg_match error occurs after saving new contentblock

  • <REDACTED> created ticket #201

    Never return false on documents

  • Jim Longo posted a comment on ticket #60

    I see. One would use $e->innertext to get the text inside the tag.

  • Jim Longo created ticket #60

    No anchor text returned

  • LogMANOriginal modified ticket #200

    "Creation of dynamic property" warning in PHP 8.2 (version 1.9.1)

  • LogMANOriginal posted a comment on ticket #200

    Thanks for your bug report. This is actually a typo. The variable should be called $optional_closing_tags. There is a recent commit in master that illustrates the fix. This should also work in PHP 8.2 and higher. [8dc21bcb714c4edcb4318bdc3f198f4f78762381]

  • Jeffrey Kastner modified a comment on ticket #159

    disregard

  • Jeffrey Kastner modified a comment on ticket #159

  • Jeffrey Kastner modified a comment on ticket #159

    If I am understanding attribute selectors, this is not working again.. * ^ and $ all return 2 Example; echo count( str_get_html('<html><body><span class="first second">Hello!</span><span id="third">ME OH MI!</span></body></html>')->find('span[class^=second]') ); I have been trying to use attribute selectors to try and 'find' a div with an id with random numbers and -slideshow for the value (ex. 8099435804-slideshow) and I haven't been able to get it to work. ~in my case it returns all div's in the...

  • Jeffrey Kastner posted a comment on ticket #159

    If I am understanding attribute selectors, this is not working again.. * ^ and $ all return 2 Example; echo count( str_get_html('<html><body><span class="first second">Hello!</span><span id="third">ME OH MI!</span></body></html>')->find('span[class^=second]') ); I have been trying to use attribute selectors to try and 'find' a div with an id with random numbers and -slideshow for the value (ex. 8099435804-slideshow) and I haven't been able to get it to work. ~in my case it returns all div's in the...
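
For reference, the behavior the comment above expects follows the CSS attribute-selector operators. The following is a minimal sketch of those semantics in plain PHP 8; `attribute_matches` is a hypothetical helper for illustration and is not the library's selector code.

```php
<?php
// Hypothetical helper (not part of simplehtmldom): evaluates one CSS
// attribute-selector operator against an attribute value. Requires PHP 8.0+.
function attribute_matches(string $value, string $operator, string $needle): bool
{
    switch ($operator) {
        case '=':  // exact match
            return $value === $needle;
        case '^=': // value begins with needle
            return $needle !== '' && str_starts_with($value, $needle);
        case '$=': // value ends with needle
            return $needle !== '' && str_ends_with($value, $needle);
        case '*=': // value contains needle
            return $needle !== '' && str_contains($value, $needle);
        case '~=': // needle is one of the whitespace-separated words
            return $needle !== ''
                && in_array($needle, preg_split('/\s+/', trim($value)), true);
        default:
            return false;
    }
}

var_dump(attribute_matches('first second', '^=', 'second'));             // bool(false)
var_dump(attribute_matches('first second', '$=', 'second'));             // bool(true)
var_dump(attribute_matches('first second', '*=', 'second'));             // bool(true)
var_dump(attribute_matches('8099435804-slideshow', '$=', '-slideshow')); // bool(true)
```

Under these rules, span[class^=second] would match neither span in the reported example, and the $= and *= variants would match only the first span, since the second span has no class attribute.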

  • Bjørn Rosell created ticket #200

    "Creation of dynamic property" warning in PHP 8.2 (version 1.9.1)

  • LogMANOriginal modified ticket #199

    Incorrect handling of <br> tags next to line breaks

  • LogMANOriginal posted a comment on ticket #199

    Looks good now! However, you must set the Unicode flag, or else preg_replace() may return an invalid Unicode string, which may cause the second preg_replace() to return NULL, and a deprecation error for the third preg_replace(). Good catch. Fixed via [b8d048e46b7f1964c28ea041d39ccb1d05f9a0ed]. And about the manual: I see now that the navigation sidebar is aligned far down upon page load, so that only the documentation for the functions (isset etc.) is immediately visible, not the more useful "Quick...

  • LogMANOriginal committed [b8d048]

    HtmlNode: Replace and collapse unicode whitespace in plaintext

  • <REDACTED> modified a comment on ticket #199

    Looks good now! However, you must set the Unicode flag, or else preg_replace() may return an invalid Unicode string, which may cause the second preg_replace() to return NULL, and a deprecation error for the third preg_replace(). diff --git a/HtmlNode.php b/HtmlNode.php index 9bc6a1a..9649d37 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -351 +351 @@ class HtmlNode - $ret = preg_replace('/\s+/', ' ', $ret); + $ret = preg_replace('/\s+/u', ' ', $ret); And about the manual: I see now that the navigation...

  • <REDACTED> posted a comment on ticket #199

    Looks good now! However, you must set the Unicode flag here or else preg_replace() returns NULL for certain strings, which causes (deprecation) errors further down. diff --git a/HtmlNode.php b/HtmlNode.php index 9bc6a1a..9649d37 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -351 +351 @@ class HtmlNode - $ret = preg_replace('/\s+/', ' ', $ret); + $ret = preg_replace('/\s+/u', ' ', $ret); And about the manual: I see now that the navigation sidebar is aligned far down upon page load, so that only...
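
The effect of the Unicode flag discussed in this thread can be checked in isolation. This is a minimal sketch using only preg_replace(); `collapse_whitespace` is a hypothetical name, and the snippet does not reproduce the code in HtmlNode.php.

```php
<?php
// Hypothetical helper (not part of simplehtmldom): collapses every run of
// whitespace into a single space. The /u modifier makes PCRE treat both the
// pattern and the subject as UTF-8.
function collapse_whitespace(string $text): ?string
{
    return preg_replace('/\s+/u', ' ', $text);
}

// A no-break space (U+00A0), a newline, and two spaces form one whitespace run.
$clean = collapse_whitespace("Hello\u{00A0}\n  World");
var_dump($clean === 'Hello World');

// With /u, a subject that is not valid UTF-8 makes preg_replace() return NULL.
$broken = collapse_whitespace("bad \xFF byte");
var_dump($broken);
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);
```

The NULL return in the second case is why a chain of preg_replace() calls should check each result before passing it on, as the deprecation error mentioned in the comment suggests.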

  • LogMANOriginal posted a comment on ticket #199

    PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support Turns out that page is managed by SF. There is no way to change the contents of that page 😔 I added a "Manual" tab instead.

  • LogMANOriginal posted a comment on ticket #199

    PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support Good idea! I'll do that. The space thing works now, but the BR tag is still not handled well. Try the code in the original post and compare the output when (un)-commenting the commented line. I'm comparing the output of plaintext with what is displayed in the browser and it looks exactly the same. Please note that I have removed wordwrap()...

  • <REDACTED> posted a comment on ticket #199

    The space thing works now, but the BR tag is still not handled well. Try the code in the original post and compare the output when (un)-commenting the commented line. PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support

  • LogMANOriginal posted a comment on ticket #199

    [67c0f4e21091a9cc66151610a653724a0acb1b69] fixes the whitespace issue. Let me know if this works for you.

  • LogMANOriginal committed [67c0f4]

    HtmlNode: Replace and collapse whitespace in plaintext

  • LogMANOriginal posted a comment on ticket #199

    Shouldn't plaintext convert newlines to spaces? Did you change this recently? Surely this is a bug/regression, or am I missing something completely? The plaintext implementation is completely rewritten but it passes all tests. Your particular case probably isn't covered by any of the tests right now. I'll check this as well. At the very least <br> seems to work right. PS! Where on the SourceForge page is the link to the manual (the one with the clickable tabs with examples, etc.)? I hope you didn't...

  • <REDACTED> posted a comment on ticket #199

    Shouldn't plaintext convert newlines to spaces? Did you change this recently? Surely this is a bug/regression, or am I missing something completely? $text = "<p>Hello" . "\n" . "World</p>"; $plain = str_get_html($text)->plaintext; echo "PLAINTEXT:\n" . $plain . "\n\n"; echo "WORDWRAP:\n" . wordwrap($plain, 80) . "\n"; PS! Where on the SourceForge page is the link to the manual (the one with the clickable tabs with examples, etc.)? I hope you didn't remove this, because I use it as a reference all...

  • LogMANOriginal posted a comment on ticket #199

    Please try again with current master. From what I can tell, the output looks right: ***** ** ********,*** ****** ********. ******* **** *** ** ** ******* *** *** ****** ***. *** **** *** ****** ****. *.***. ** **** *** ***** **** ***** ********* *** ************ ** *** *** *** *** ** ****. *** ** ****, ******** ******* *** ******** ********** * ** ******* ****. ******** ** *** *** **** ***********, ** *** *** *** * *** *** ********* .*** ***** ******* *** ** **** ***.*** ****** *** ** **** *** ********...

  • LogMANOriginal modified ticket #198

    iconv() detected an illegal character in input string

  • LogMANOriginal posted a comment on ticket #198

    This is fixed via [c53a612e6fe61d5b1efc0c3270e20aa34e4e84ee]. Instead of using //IGNORE, it needs to be wrapped inside a try-catch block, so that the character set is detected properly. Eventually, this will be replaced by a better solution, but this works for now. Thanks again for reporting!
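
The commit itself is not reproduced in this feed, so the following is only a sketch of the general pattern. PHP's iconv() reports an illegal character as a notice, not as an exception, so the sketch assumes a temporary error handler that converts the notice into an ErrorException. The function name `convert_or_keep` and the fallback of returning the input unchanged are hypothetical.

```php
<?php
// Hypothetical helper (not the library's code): tries an iconv() conversion
// and falls back to the original text when the input contains characters
// that are illegal in the source character set.
function convert_or_keep(string $text, string $from, string $to): string
{
    // Turn the notice raised by iconv() into an exception for this call only.
    set_error_handler(static function (int $severity, string $message): bool {
        throw new ErrorException($message, 0, $severity);
    });

    try {
        $converted = iconv($from, $to, $text);
        return $converted === false ? $text : $converted;
    } catch (ErrorException $e) {
        return $text; // conversion failed, keep the input as-is
    } finally {
        restore_error_handler();
    }
}

// Valid UTF-8 input is converted ("é" becomes the single byte 0xE9).
var_dump(bin2hex(convert_or_keep("caf\xC3\xA9", 'UTF-8', 'ISO-8859-1')));

// Invalid UTF-8 input is returned unchanged instead of raising a notice.
var_dump(bin2hex(convert_or_keep("\xFF", 'UTF-8', 'ISO-8859-1')));
```

Restoring the handler in the finally block keeps the conversion from affecting error handling elsewhere in the calling code.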

  • LogMANOriginal committed [c53a61]

    HtmlDocument: Use try-catch block for iconv

  • LogMANOriginal committed [d553de]

    HtmlNode: Fix empty if-statement

  • LogMANOriginal committed [7a5b98]

    docs: Include recent changes

  • LogMANOriginal committed [025297]

    HtmlDocument: Let the parser decode entities

  • LogMANOriginal committed [cc1063]

    HtmlDocument: Inline token_equal, _slash, and _attr

  • LogMANOriginal committed [ad6686]

    HtmlDocument: Don't use magic functions

  • LogMANOriginal committed [133547]

    HtmlNode: Stop removing UTF-8 BOM from the end of a string

  • LogMANOriginal committed [8dc21b]

    HtmlDocument: Fix broken $forceTagsClosed = false

  • LogMANOriginal committed [f658bc]

    HtmlDocument: Use shortcuts for seek methods

  • LogMANOriginal committed [34743a]

    HtmlDocument: Inline skip method

  • LogMANOriginal committed [718b90]

    HtmlDocument: Add shortcuts for the parser

  • LogMANOriginal committed [2b4971]

    HtmlDocument: Use native functions for tag names and attribute values

  • LogMANOriginal committed [101a85]

    Fix memory parsing test

  • LogMANOriginal committed [d573cd]

    HtmlDocument: Don't remove noise before parsing.

  • LogMANOriginal committed [88c67b]

    HtmlDocument: Don't assign nodes by reference

  • Roland Heymanns posted a comment on ticket #198

    Thanks for your good work! The error message first appeared after I upgraded PHP from 8.0 to 8.1 last week.

  • LogMANOriginal modified ticket #199

    Incorrect handling of <br> tags next to line breaks

  • LogMANOriginal posted a comment on ticket #199

    Thanks for reporting. I fixed your original message. You are right, the current implementation of <br> is wrong. I haven't tested this yet but it should give slightly better results if you define DEFAULT_BR_TEXT like this: define("DEFAULT_BR_TEXT", PHP_EOL) At the very least, this makes it platform independent. That said, there is additional work to do in the parser to handle all cases (like the <br> a case).

  • LogMANOriginal modified ticket #198

    iconv() detected an illegal character in input string

  • LogMANOriginal posted a comment on ticket #198

    Thanks for reporting. It took me a while to figure out what is going on. Am I right to assume that you are running on PHP 8.x? In previous versions that error would not have been reported because of the error suppression operator (@). (Un-)fortunately the behavior of this operator changed in PHP 8: https://php.watch/versions/8.0/fatal-error-suppression The behavior for //IGNORE depends on the specific implementation of iconv, some of which completely ignore this flag. Still, this is a good hack to...

  • <REDACTED> posted a comment on ticket #199

    The text should say "The BR tag is not...". Evidently this editor interprets HTML tags as-is, and initial postings can't be edited :/

  • <REDACTED> created ticket #199

    Incorrect handling of <br> tags next to line breaks

  • Roland Heymanns created ticket #198

    iconv() detected an illegal character in input string

  • LogMANOriginal modified ticket #50

    PHP 7.x compatibility

  • LogMANOriginal modified ticket #60

    parsing stops after first multibyte character

  • LogMANOriginal modified ticket #64

    Comments on MAX_FILE_SIZE

  • LogMANOriginal modified ticket #65

    Notify when zero elements were found

  • LogMANOriginal modified ticket #66

    Role attribute

  • LogMANOriginal modified ticket #59

    Slashdot example updated

  • LogMANOriginal posted a comment on ticket #59

    Thanks for the feedback! The example in 1.9 is probably not functional anymore, but there is an updated version in the current master that still works. Here is the link for future reference: https://sourceforge.net/p/simplehtmldom/repository/ci/master/tree/example/scraping/example_scraping_slashdot.php

  • LogMANOriginal posted a comment on ticket #58

    That choice is entirely up to you.

  • LogMANOriginal modified ticket #58

    Is this project active anymore?

  • LogMANOriginal modified ticket #56

    How to avoid break on 404 errors?

  • LogMANOriginal posted a comment on ticket #56

    Good to know you found a solution :)

  • LogMANOriginal modified ticket #54

    Traversing the Dom within a series of columns

  • LogMANOriginal posted a comment on ticket #54

    You probably figured it out in the mean time, but here is a complete example that will give you what you want. <?php include_once 'simple_html_dom.php'; $doc = <<<EOD <tr> <td></td> <td id="column2" class="style3">A</td> <td id="column2" class="style2">B</td> <td> <a href="#link")>Description of Link</a> </td> </tr> EOD; $html = str_get_html($doc); $href = $html->find('a', 0)->href; $description = $html->find('a', 0)->innertext; echo $href . PHP_EOL . $description . PHP_EOL; // #link // Description...

  • LogMANOriginal posted a comment on ticket #53

    This is probably no longer relevant but the for loop in your example indexes over the value of the first element instead of all script elements. foreach($items->find('script',0) as $e) { $e->outertext = ''; echo '$e: ' . $e . '<br/>'; } Notice the ,0 in ->find('script',0). This is why the error occurs. Here is the correct version: foreach($items->find('script') as $e) { $e->outertext = ''; echo '$e: ' . $e . '<br/>'; }

  • LogMANOriginal modified ticket #53

    Removing tags does not work

  • LogMANOriginal modified ticket #52

    Timezone change
