logmanoriginal Activity

Activity for LogMANOriginal

2 years ago
LogMANOriginal modified ticket #200

"Creation of dynamic property" warning in PHP 8.2 (version 1.9.1)
2 years ago
LogMANOriginal posted a comment on ticket #200

Thanks for your bug report. This is actually a typo. The variable should be called $optional_closing_tags. There is a recent commit in master that illustrates the fix. This should also work in PHP 8.2 and higher. [8dc21bcb714c4edcb4318bdc3f198f4f78762381]
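A minimal sketch of the kind of typo described above, assuming a hypothetical class named ExampleParser (this is not the library's actual code). Assigning to a property name that was never declared creates a dynamic property, which PHP 8.2 reports as deprecated; using the declared name avoids the warning.

```php
<?php
// Hypothetical class for illustration only; not the library's actual code.
class ExampleParser
{
    // Declared property with the correct name.
    protected $optional_closing_tags = array();

    public function configure(array $tags)
    {
        // A misspelled name here (for example $this->optional_closing_tag)
        // would silently create a dynamic property and trigger the
        // "Creation of dynamic property" deprecation in PHP 8.2.
        $this->optional_closing_tags = $tags;
    }

    public function getOptionalClosingTags()
    {
        return $this->optional_closing_tags;
    }
}

$parser = new ExampleParser();
$parser->configure(array('p' => array('p' => 1)));
```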
2 years ago
LogMANOriginal modified ticket #199

Incorrect handling of tags next to line breaks
2 years ago
LogMANOriginal posted a comment on ticket #199

Looks good now! However, you must set the Unicode flag, or else preg_replace() may return an invalid Unicode string, which may cause the second preg_replace() to return NULL, and a deprecation error for the third preg_replace(). Good catch. Fixed via [b8d048e46b7f1964c28ea041d39ccb1d05f9a0ed]. And about the manual: I see now that the navigation sidebar is aligned far down upon page load, so that only the documentation for the functions (isset etc.) is immediately visible, not the more useful "Quick...
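The point about the Unicode flag can be sketched as follows. The helper name collapse_whitespace and the exact pattern are assumptions made for illustration, not the code from the commit. With the u modifier, PCRE treats the pattern and the subject as UTF-8, so a multi-byte character is never cut in half by a byte-level match.

```php
<?php
// Hypothetical helper; the pattern is an illustration, not the library's.
function collapse_whitespace($text)
{
    // "u" = UTF-8 mode. \x{00A0} is the no-break space; in UTF-8 it is the
    // byte pair C2 A0, and the A0 byte also appears inside other characters
    // (for example "à" is C3 A0), so a byte-level match could corrupt them.
    $result = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);

    // preg_replace() returns null on error, e.g. for invalid UTF-8 input
    // in UTF-8 mode. Fall back to the original text in that case.
    return $result === null ? $text : trim($result);
}

$clean = collapse_whitespace("Hello\u{00A0}\n  world");
```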
2 years ago
LogMANOriginal committed [b8d048]

HtmlNode: Replace and collapse unicode whitespace in plaintext
2 years ago
LogMANOriginal posted a comment on ticket #199

PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support Turns out that page is managed by SF. There is no way to change the contents of that page 😔 I added a "Manual" tab instead.
2 years ago
LogMANOriginal posted a comment on ticket #199

PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support Good idea! I'll do that. The space thing works now, but the BR tag is still not handled well. Try the code in the original post and compare the output when (un)-commenting the commented line. I'm comparing the output of plaintext with what is displayed in the browser and it looks exactly the same. Please note that I have removed wordwrap()...
2 years ago
LogMANOriginal posted a comment on ticket #199

[67c0f4e21091a9cc66151610a653724a0acb1b69] fixes the whitespace issue. Let me know if this works for you.
2 years ago
LogMANOriginal committed [67c0f4]

HtmlNode: Replace and collapse whitespace in plaintext
2 years ago
LogMANOriginal posted a comment on ticket #199

Shouldn't plaintext convert newlines to spaces? Did you change this recently? Surely this is a bug/regression, or am I missing something completely? The plaintext implementation is completely rewritten but it passes all tests. Your particular case probably isn't covered by any of the tests right now. I'll check this as well. At the very least seems to work right. PS! Where on the SourceForge page is the link to the manual (the one with the clickable tabs with examples, etc.)? I hope you didn't...
2 years ago
LogMANOriginal posted a comment on ticket #199

Please try again with current master. From what I can tell, the output looks right: ***** ** ********,*** ****** ********. ******* **** *** ** ** ******* *** *** ****** ***. *** **** *** ****** ****. *.***. ** **** *** ***** **** ***** ********* *** ************ ** *** *** *** *** ** ****. *** ** ****, ******** ******* *** ******** ********** * ** ******* ****. ******** ** *** *** **** ***********, ** *** *** *** * *** *** ********* .*** ***** ******* *** ** **** ***.*** ****** *** ** **** *** ********...
2 years ago
LogMANOriginal modified ticket #198

iconv() detected an illegal character in input string
2 years ago
LogMANOriginal posted a comment on ticket #198

This is fixed via [c53a612e6fe61d5b1efc0c3270e20aa34e4e84ee]. Instead of using //IGNORE, it needs to be wrapped inside a try-catch block, so that the character set is detected properly. Eventually, this will be replaced by a better solution, but this works for now. Thanks again for reporting!
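One way a try-catch around iconv() could look is sketched below. This is an assumption about the general approach, not the code from the commit: the helper name convert_or_keep is hypothetical, and because iconv() reports illegal characters as a notice rather than an exception, the sketch temporarily installs an error handler that throws ErrorException.

```php
<?php
// Hypothetical helper; the library's actual fix may differ.
function convert_or_keep($text, $from, $to)
{
    // iconv() signals problems via notices/warnings, so promote them to
    // exceptions for the duration of this call.
    set_error_handler(function ($severity, $message, $file, $line) {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    try {
        $converted = iconv($from, $to, $text);
    } catch (ErrorException $e) {
        $converted = false;
    } finally {
        // Always restore the previous handler, even on failure.
        restore_error_handler();
    }

    // Keep the unconverted text when the conversion fails.
    return $converted === false ? $text : $converted;
}

$latin1 = convert_or_keep("caf\xC3\xA9", 'UTF-8', 'ISO-8859-1');
```

Unlike //IGNORE, this keeps the caller in control of what happens when the input does not match the assumed character set.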
2 years ago
LogMANOriginal committed [025297]

HtmlDocument: Let the parser decode entities
2 years ago
LogMANOriginal committed [d553de]

HtmlNode: Fix empty if-statement
2 years ago
LogMANOriginal committed [cc1063]

HtmlDocument: Inline token_equal, _slash, and _attr
2 years ago
LogMANOriginal committed [c53a61]

HtmlDocument: Use try-catch block for iconv
2 years ago
LogMANOriginal committed [7a5b98]

docs: Include recent changes
2 years ago
LogMANOriginal committed [ad6686]

HtmlDocument: Don't use magic functions
2 years ago
LogMANOriginal committed [8dc21b]

HtmlDocument: Fix broken $forceTagsClosed = false
2 years ago
LogMANOriginal committed [2b4971]

HtmlDocument: Use native functions for tag names and attribute values
2 years ago
LogMANOriginal committed [133547]

HtmlNode: Stop removing UTF-8 BOM from the end of a string
2 years ago
LogMANOriginal committed [718b90]

HtmlDocument: Add shortcuts for the parser
2 years ago
LogMANOriginal committed [34743a]

HtmlDocument: Inline skip method
2 years ago
LogMANOriginal committed [f658bc]

HtmlDocument: Use shortcuts for seek methods
2 years ago
LogMANOriginal committed [101a85]

Fix memory parsing test
2 years ago
LogMANOriginal committed [88c67b]

HtmlDocument: Don't assign nodes by reference
2 years ago
LogMANOriginal committed [d573cd]

HtmlDocument: Don't remove noise before parsing.
2 years ago
LogMANOriginal modified ticket #199

Incorrect handling of tags next to line breaks
2 years ago
LogMANOriginal posted a comment on ticket #199

Thanks for reporting. I fixed your original message. You are right, the current implementation of is wrong. I haven't tested this yet but it should give slightly better results if you define DEFAULT_BR_TEXT like this: define("DEFAULT_BR_TEXT", PHP_EOL) At the very least, this makes it platform independent. That said, there is additional work to do in the parser to handle all cases (like the a case).
2 years ago
LogMANOriginal modified ticket #198

iconv() detected an illegal character in input string
2 years ago
LogMANOriginal posted a comment on ticket #198

Thanks for reporting. It took me a while to figure out what is going on. Am I right to assume that you are running on PHP 8.x? In previous versions that error would not have been reported because of the error suppression operator (@). (Un-)fortunately the behavior of this operator changed in PHP 8: https://php.watch/versions/8.0/fatal-error-suppression The behavior for //IGNORE depends on the specific implementation of iconv, some of which completely ignore this flag. Still, this is a good hack to...
2 years ago
LogMANOriginal modified ticket #50

PHP 7.x compatibility
2 years ago
LogMANOriginal modified ticket #60

parsing stops after first multibyte character
2 years ago
LogMANOriginal modified ticket #64

Comments on MAX_FILE_SIZE
2 years ago
LogMANOriginal modified ticket #65

Notify when zero elements were found
2 years ago
LogMANOriginal modified ticket #66

Role attribute
2 years ago
LogMANOriginal modified ticket #59

Slashdot example updated
2 years ago
LogMANOriginal posted a comment on ticket #59

Thanks for the feedback! The example in 1.9 is probably not functional anymore, but there is an updated version in the current master that still works. Here is the link for future reference: https://sourceforge.net/p/simplehtmldom/repository/ci/master/tree/example/scraping/example_scraping_slashdot.php
2 years ago
LogMANOriginal modified ticket #58

Is this project active anymore?
2 years ago
LogMANOriginal posted a comment on ticket #58

That choice is entirely up to you.
2 years ago
LogMANOriginal modified ticket #56

How to avoid break on 404 errors?
2 years ago
LogMANOriginal posted a comment on ticket #56

Good to know you found a solution :)
2 years ago
LogMANOriginal modified ticket #54

Traversing the Dom within a series of columns
2 years ago
LogMANOriginal posted a comment on ticket #54

You probably figured it out in the meantime, but here is a complete example that will give you what you want.
<?php
include_once 'simple_html_dom.php';
$doc = <<<EOD
<tr>
<td></td>
<td id="column2" class="style3">A</td>
<td id="column2" class="style2">B</td>
<td>
<a href="#link">Description of Link</a>
</td>
</tr>
EOD;
$html = str_get_html($doc);
$href = $html->find('a', 0)->href;
$description = $html->find('a', 0)->innertext;
echo $href . PHP_EOL . $description . PHP_EOL;
// #link
// Description...
2 years ago
LogMANOriginal modified ticket #53

Removing tags does not work
2 years ago
LogMANOriginal posted a comment on ticket #53

This is probably no longer relevant, but the foreach loop in your example iterates over the first script element instead of over all script elements:
foreach($items->find('script',0) as $e) {
    $e->outertext = '';
    echo '$e: ' . $e . ' ';
}
Notice the ,0 in ->find('script',0). This is why the error occurs. Here is the correct version:
foreach($items->find('script') as $e) {
    $e->outertext = '';
    echo '$e: ' . $e . ' ';
}
2 years ago
LogMANOriginal modified ticket #52

Timezone change
2 years ago
LogMANOriginal modified ticket #45

Uncaught Error: Call to a member function find() on string in ... Stack trace: #0 {main} thrown in
2 years ago
LogMANOriginal modified ticket #35

HTTP Request failed
2 years ago
LogMANOriginal modified ticket #43

Scraping a JSON script
2 years ago
LogMANOriginal modified ticket #163

Missing whitespace in plaintext property
2 years ago
LogMANOriginal posted a comment on ticket #163

966c5e39493eff7dc1eb77e0004bdc0015037b34 fixes various issues related to spaces and line breaks when generating plain text. For all the examples provided here, it produces the correct output. It also properly collapses superfluous spaces and line breaks, so that the output should be much more readable, especially for awkwardly formatted HTML documents.
2 years ago
LogMANOriginal committed [966c5e]

HtmlNode: Improve plain text output of text()
2 years ago
LogMANOriginal modified a comment on ticket #148

I completely misunderstood the original report and finally figured it out. Thanks for all your feedback! This issue is resolved via [d6dcf50d6b03eb1d0c575abb7011abb658fefcf1] [4ad20901f0e63356cb3eb15a1cf4d9bf3a9837cc] by comparing the string length with PHP_MAXPATHLEN before calling is_file(). Edit: Had to fix the fix because the original fix was broken :)
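The length check described above can be sketched like this. The helper name looks_like_existing_file is hypothetical and the exact comparison used in the commits is not shown here, so treat the boundary condition as an assumption.

```php
<?php
// Hypothetical helper; not the library's actual code.
function looks_like_existing_file($candidate)
{
    // is_file() emits a warning for names that exceed the platform's
    // maximum path length, so reject over-long strings (for example a
    // whole HTML document passed in by mistake) before calling it.
    if (strlen($candidate) >= PHP_MAXPATHLEN) {
        return false;
    }

    return is_file($candidate);
}

$isFile = looks_like_existing_file(str_repeat('a', PHP_MAXPATHLEN + 1));
```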
2 years ago
LogMANOriginal committed [4ad209]

HtmlDocument: Fixing the fix :)
2 years ago
LogMANOriginal posted a comment on ticket #148

I completely misunderstood the original report and finally figured it out. Thanks for all your feedback! This issue is resolved via [d6dcf50d6b03eb1d0c575abb7011abb658fefcf1] by comparing the string length with PHP_MAXPATHLEN before calling is_file().
2 years ago
LogMANOriginal modified ticket #148

is_file(): File name is longer than the maximum allowed path length on this platform (4096)
2 years ago
LogMANOriginal committed [d6dcf5]

HtmlDocument: Check PHP_MAXPATHLEN before is_file()
2 years ago
LogMANOriginal committed [8a9a59]

Fix large file parsing test
2 years ago
LogMANOriginal committed [a17ec8]

Fix spelling mistakes.
2 years ago
LogMANOriginal committed [157ca6]

HtmlNode: Optimize control flow for seek().
2 years ago
LogMANOriginal committed [c6a811]

HtmlNode: Simplify charset checks before calling iconv.
2 years ago
LogMANOriginal committed [d0b9fd]

HtmlNode: Breakup complex if-else-statement into more readable chunks.
2 years ago
LogMANOriginal committed [7244df]

Remove unnecessary curly braces syntax.
2 years ago
LogMANOriginal committed [183127]

Cleanup duplicate branches in switch statements.
2 years ago
LogMANOriginal committed [bbfca8]

HtmlNode: Verify that constructor argument is instance of HtmlDocument
2 years ago
LogMANOriginal committed [ee9039]

HtmlDocument: Simplify regex expressions
2 years ago
LogMANOriginal committed [316d6c]

docs: Fix broken page links
2 years ago
LogMANOriginal committed [d5ead3]

phpunit: Remove unnecessary default value assignment
2 years ago
LogMANOriginal committed [c2e1e7]

examples: Initialize $data variable
2 years ago
LogMANOriginal committed [890186]

HtmlNode: Reduce complexity of CSS selector regex
2 years ago
LogMANOriginal committed [bcb145]

docs: Fix table formatting in markdown files
2 years ago
LogMANOriginal committed [3ba7e3]

docs: Move page titles to mkdocs.yml
2 years ago
LogMANOriginal committed [c858a3]

docs: Update Google Analytics to G4 and display prev/next buttons
2 years ago
LogMANOriginal committed [1b013d]

Reorganize docs
2 years ago
LogMANOriginal committed [7cbc63]

HtmlNode: Use only HtmlElement to determine block-level elements
2 years ago
LogMANOriginal committed [a862f0]

Use HtmlElement::isRawTextElement() in HtmlDocument and HtmlNode
2 years ago
LogMANOriginal committed [58aad2]

Add new class to handle HTML elements
2 years ago
LogMANOriginal modified ticket #197

Fixed character translation error in iconv()
2 years ago
LogMANOriginal posted a comment on ticket #197

I can see how this is annoying. Unfortunately, UTF-8//IGNORE silently discards characters that cannot be represented in the target charset, which may result in incorrect output. As you already know, UTF-8//TRANSLIT also doesn't always work and heavily depends on the actual implementation of iconv and system settings (some of which completely ignore //TRANSLIT). This unreliability of iconv is why it is better to have a notice reported here and leave the choice to the caller. You can actually override...
2 years ago
LogMANOriginal posted a comment on ticket #196

Pseudo-classes are currently not supported. Refer to the documentation for the find method for a list of supported selectors. In this case, I suggest using the lastChild method, which will give you the same result as :last-child.
2 years ago
LogMANOriginal modified ticket #196

:last-child selector doesn't work
2 years ago
LogMANOriginal modified ticket #194

Wrong variable name at str_get_html
2 years ago
LogMANOriginal modified ticket #193

Patch for PHP 8
2 years ago
LogMANOriginal modified ticket #193

Patch for PHP 8
2 years ago
LogMANOriginal posted a comment on ticket #193

Great, glad to hear it works for you :) The reason for adding the condition over changing the default values is to make sure it works even when a caller passes null as an argument.
2 years ago
LogMANOriginal committed [de6e37]

.github: Add PHP compatibility check to workflow
2 years ago
LogMANOriginal committed [1f9213]

phpcompatibility: Update and clarify compatibility standards
2 years ago
LogMANOriginal committed [b3ce6b]

composer: Downgrade phpcs to version 2.x
2 years ago
LogMANOriginal posted a comment on ticket #193

I see, that makes sense. Thanks for clarifying. This is actually a deprecation warning and not an error. It occurs when calling trim(null) (in the case that $str = null) because trim() expects a non-nullable string. This warning was added in PHP 8.1: https://www.php.net/releases/8.1/en.php#deprecations_and_bc_breaks Passing null to non-nullable internal function parameters is deprecated. There are likely other places that are affected by this. This particular case, however, is fixed in [1765ac4494a05d5c84408398127e6539f6bc1238]....
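A minimal sketch of guarding against the deprecation, assuming a hypothetical helper named safe_trim (not the code from the commit). Checking for null inside the function, rather than only changing a default parameter value, also covers callers that pass null explicitly.

```php
<?php
// Hypothetical helper; not the library's actual code.
function safe_trim($str)
{
    // Passing null to trim() is deprecated as of PHP 8.1, so handle null
    // here instead of relying on a default parameter value.
    if ($str === null) {
        return '';
    }

    return trim($str);
}

$trimmed = safe_trim(null);
```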
2 years ago
LogMANOriginal committed [1765ac]

HtmlDocument: Don't pass null to trim()
2 years ago
LogMANOriginal modified ticket #195

Possible XSS vulnerability
2 years ago
LogMANOriginal posted a comment on ticket #195

Thanks for reporting this issue. While I agree that this is a bug in the attribute handler, it is not an XSS vulnerability, at least not for this project. This issue is fixed in [a706de9bcb3b74ad10e04cc0b2de0d1b35007ab4].
2 years ago
LogMANOriginal committed [a706de]

HtmlNode: Add quotes to unquoted attribute value depending on content
2 years ago
LogMANOriginal committed [981b97]

README: Replace Travis-CI badge by GitHub Workflow badge
2 years ago
LogMANOriginal modified ticket #194

Wrong variable name at str_get_html
2 years ago
LogMANOriginal posted a comment on ticket #194

The parameter name, $str, looks correct to me. What version of the library are you using?
function str_get_html(
    $str,
    $lowercase = true,
    $forceTagsClosed = true,
    $target_charset = DEFAULT_TARGET_CHARSET,
    $stripRN = true,
    $defaultBRText = DEFAULT_BR_TEXT,
    $defaultSpanText = DEFAULT_SPAN_TEXT)
{
    $dom = new simple_html_dom(
        null,
        $lowercase,
        $forceTagsClosed,
        $target_charset,
        $stripRN,
        $defaultBRText,
        $defaultSpanText
    );
    if (empty($str) || strlen($str) > MAX_FILE_SIZE) {
        $dom->clear();
        return false;...
