Activity for LogMANOriginal

  • LogMANOriginal LogMANOriginal modified ticket #200

    "Creation of dynamic property" warning in PHP 8.2 (version 1.9.1)

  • LogMANOriginal LogMANOriginal posted a comment on ticket #200

    Thanks for your bug report. This is actually a typo. The variable should be called $optional_closing_tags. There is a recent commit in master that illustrates the fix. This should also work in PHP 8.2 and higher. [8dc21bcb714c4edcb4318bdc3f198f4f78762381]
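The warning in the ticket title can be reproduced in isolation. The following is a minimal sketch, not the library's code: the class `ParserSketch`, its methods, and the misspelled name `optional_closing_tag` are hypothetical and only illustrate how a typo in a property name creates a dynamic property, which PHP 8.2 reports as deprecated.

```php
<?php
error_reporting(E_ALL);

// Hypothetical class for illustration; not the library's actual code.
class ParserSketch
{
    // Declared property with the correct name.
    public $optional_closing_tags = array('p' => true, 'li' => true);

    public function resetWithTypo()
    {
        // Misspelled name (hypothetical): PHP creates a new dynamic property
        // here, which is deprecated as of PHP 8.2.
        $this->optional_closing_tag = array();
    }

    public function reset()
    {
        // Correct name: assigns to the declared property, no deprecation.
        $this->optional_closing_tags = array();
    }
}

// Collect deprecation messages instead of printing them.
$deprecations = array();
set_error_handler(function ($errno, $errstr) use (&$deprecations) {
    $deprecations[] = $errstr;
    return true;
}, E_DEPRECATED);

$parser = new ParserSketch();
$parser->reset();         // declared property, nothing is reported
$parser->resetWithTypo(); // reports "Creation of dynamic property ..." on PHP 8.2 and later

restore_error_handler();
```

On PHP versions before 8.2 the misspelled assignment is silent, which is presumably why such a typo can go unnoticed for a long time.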

  • LogMANOriginal LogMANOriginal modified ticket #199

    Incorrect handling of <br> tags next to line breaks

  • LogMANOriginal LogMANOriginal posted a comment on ticket #199

    Looks good now! However, you must set the Unicode flag, or else preg_replace() may return an invalid Unicode string, which may cause the second preg_replace() to return NULL, and a deprecation error for the third preg_replace(). Good catch. Fixed via [b8d048e46b7f1964c28ea041d39ccb1d05f9a0ed]. And about the manual: I see now that the navigation sidebar is aligned far down upon page load, so that only the documentation for the functions (isset etc.) is immediately visible, not the more useful "Quick...
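The failure chain described in this comment can be sketched with plain `preg_replace()` calls. The patterns below are illustrative assumptions, not the ones used in the commit; they only show how a byte-oriented pattern without the `u` flag can split a multibyte character, and how a later call with the `u` flag then returns NULL.

```php
<?php
// "à" is the two-byte UTF-8 sequence C3 A0. Its second byte equals the
// single-byte (Latin-1) encoding of a non-breaking space.
$text = "\xC3\xA0";

// Without /u the pattern matches raw bytes, so it replaces the second byte
// of the character and leaves a dangling lead byte behind.
$broken = preg_replace('/\xA0/', ' ', $text);

// A later call that does use /u rejects the invalid subject and returns NULL.
$collapsed = preg_replace('/\s+/u', ' ', $broken);
$error = preg_last_error();

// Passing that NULL on as the subject of another preg_replace() call is what
// triggers the deprecation notice on PHP 8.1 and later (not executed here).

// With /u on the first call, \x{A0} only matches a real U+00A0 code point,
// so "à" is left intact and the follow-up call succeeds.
$safe = preg_replace('/\x{A0}/u', ' ', $text);
$safeCollapsed = preg_replace('/\s+/u', ' ', $safe);
```

After the failing call, `preg_last_error()` holds `PREG_BAD_UTF8_ERROR`, which is a convenient way to confirm that invalid UTF-8 was the cause.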

  • LogMANOriginal LogMANOriginal committed [b8d048]

    HtmlNode: Replace and collapse unicode whitespace in plaintext

  • LogMANOriginal LogMANOriginal posted a comment on ticket #199

    > PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support

    Turns out that page is managed by SF. There is no way to change the contents of that page 😔 I added a "Manual" tab instead.

  • LogMANOriginal LogMANOriginal posted a comment on ticket #199

    > PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support

    Good idea! I'll do that.

    > The space thing works now, but the BR tag is still not handled well. Try the code in the original post and compare the output when (un)-commenting the commented line.

    I'm comparing the output of plaintext with what is displayed in the browser and it looks exactly the same. Please note that I have removed wordwrap()...

  • LogMANOriginal LogMANOriginal posted a comment on ticket #199

    [67c0f4e21091a9cc66151610a653724a0acb1b69] fixes the whitespace issue. Let me know if this works for you.

  • LogMANOriginal LogMANOriginal committed [67c0f4]

    HtmlNode: Replace and collapse whitespace in plaintext

  • LogMANOriginal LogMANOriginal posted a comment on ticket #199

    Shouldn't plaintext convert newlines to spaces? Did you change this recently? Surely this is a bug/regression, or am I missing something completely? The plaintext implementation is completely rewritten but it passes all tests. Your particular case probably isn't covered by any of the tests right now. I'll check this as well. At the very least <br> seems to work right. PS! Where on the SourceForge page is the link to the manual (the one with the clickable tabs with examples, etc.)? I hope you didn't...

  • LogMANOriginal LogMANOriginal posted a comment on ticket #199

    Please try again with current master. From what I can tell, the output looks right: ***** ** ********,*** ****** ********. ******* **** *** ** ** ******* *** *** ****** ***. *** **** *** ****** ****. *.***. ** **** *** ***** **** ***** ********* *** ************ ** *** *** *** *** ** ****. *** ** ****, ******** ******* *** ******** ********** * ** ******* ****. ******** ** *** *** **** ***********, ** *** *** *** * *** *** ********* .*** ***** ******* *** ** **** ***.*** ****** *** ** **** *** ********...

  • LogMANOriginal LogMANOriginal modified ticket #198

    iconv() detected an illegal character in input string

  • LogMANOriginal LogMANOriginal posted a comment on ticket #198

    This is fixed via [c53a612e6fe61d5b1efc0c3270e20aa34e4e84ee]. Instead of using //IGNORE, it needs to be wrapped inside a try-catch block, so that the character set is detected properly. Eventually, this will be replaced by a better solution, but this works for now. Thanks again for reporting!
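How the commit wires up the try-catch block is not shown in this feed. The sketch below is one possible shape, under the assumption that errors are first promoted to exceptions: `iconv()` itself reports illegal input through PHP's error mechanism and returns false, so a temporary error handler is needed before a catch block can see anything. The helper name `convert_or_null` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: convert $text from $from to $to, or return null
 * when iconv() complains about the input.
 */
function convert_or_null($text, $from, $to)
{
    // iconv() raises a notice or warning rather than throwing, so promote
    // it to an ErrorException for the duration of this call.
    set_error_handler(function ($errno, $errstr) {
        throw new ErrorException($errstr, 0, $errno);
    });

    try {
        $converted = iconv($from, $to, $text);
    } catch (ErrorException $e) {
        $converted = false;
    } finally {
        restore_error_handler();
    }

    return $converted === false ? null : $converted;
}

// Valid UTF-8 input converts cleanly.
$ok = convert_or_null("caf\xC3\xA9", 'UTF-8', 'ISO-8859-1');

// "\xE9" alone is not valid UTF-8. The outcome depends on the iconv
// implementation; many of them reject it, in which case null is returned.
$bad = convert_or_null("caf\xE9", 'UTF-8', 'ISO-8859-1');
```

A caller can treat a null result as a signal that the assumed source charset was wrong and try another one, which is harder to do when //IGNORE silently drops the offending bytes.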

  • LogMANOriginal LogMANOriginal committed [025297]

    HtmlDocument: Let the parser decode entities

  • LogMANOriginal LogMANOriginal committed [d553de]

    HtmlNode: Fix empty if-statement

  • LogMANOriginal LogMANOriginal committed [cc1063]

    HtmlDocument: Inline token_equal, _slash, and _attr

  • LogMANOriginal LogMANOriginal committed [c53a61]

    HtmlDocument: Use try-catch block for iconv

  • LogMANOriginal LogMANOriginal committed [7a5b98]

    docs: Include recent changes

  • LogMANOriginal LogMANOriginal committed [ad6686]

    HtmlDocument: Don't use magic functions

  • LogMANOriginal LogMANOriginal committed [8dc21b]

    HtmlDocument: Fix broken $forceTagsClosed = false

  • LogMANOriginal LogMANOriginal committed [2b4971]

    HtmlDocument: Use native functions for tag names and attribute values

  • LogMANOriginal LogMANOriginal committed [133547]

    HtmlNode: Stop removing UTF-8 BOM from the end of a string

  • LogMANOriginal LogMANOriginal committed [718b90]

    HtmlDocument: Add shortcuts for the parser

  • LogMANOriginal LogMANOriginal committed [34743a]

    HtmlDocument: Inline skip method

  • LogMANOriginal LogMANOriginal committed [f658bc]

    HtmlDocument: Use shortcuts for seek methods

  • LogMANOriginal LogMANOriginal committed [101a85]

    Fix memory parsing test

  • LogMANOriginal LogMANOriginal committed [88c67b]

    HtmlDocument: Don't assign nodes by reference

  • LogMANOriginal LogMANOriginal committed [d573cd]

    HtmlDocument: Don't remove noise before parsing.

  • LogMANOriginal LogMANOriginal modified ticket #199

    Incorrect handling of <br> tags next to line breaks

  • LogMANOriginal LogMANOriginal posted a comment on ticket #199

    Thanks for reporting. I fixed your original message. You are right, the current implementation of <br> is wrong. I haven't tested this yet but it should give slightly better results if you define DEFAULT_BR_TEXT like this: define("DEFAULT_BR_TEXT", PHP_EOL) At the very least, this makes it platform independent. That said, there is additional work to do in the parser to handle all cases (like the <br> a case).
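A short sketch of the suggested override, assuming the library only defines `DEFAULT_BR_TEXT` when it is not already defined. The guarded `define()` below is a stand-in for that assumed behavior so the snippet runs on its own; in real code that line would be the include of the library file.

```php
<?php
// Define the constant before the library is loaded.
define('DEFAULT_BR_TEXT', PHP_EOL);

// Stand-in for the library's own definition (assumed to be guarded like
// this); in real code this would be: include_once 'simple_html_dom.php';
defined('DEFAULT_BR_TEXT') || define('DEFAULT_BR_TEXT', "\r\n");

// The earlier definition wins, so the text used for <br> follows the
// platform's line ending.
$brText = DEFAULT_BR_TEXT;
```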

  • LogMANOriginal LogMANOriginal modified ticket #198

    iconv() detected an illegal character in input string

  • LogMANOriginal LogMANOriginal posted a comment on ticket #198

    Thanks for reporting. It took me a while to figure out what is going on. Am I right to assume that you are running on PHP 8.x? In previous versions that error would not have been reported because of the error suppression operator (@). (Un-)fortunately the behavior of this operator changed in PHP 8: https://php.watch/versions/8.0/fatal-error-suppression The behavior for //IGNORE depends on the specific implementation of iconv, some of which completely ignore this flag. Still, this is a good hack to...

  • LogMANOriginal LogMANOriginal modified ticket #50

    PHP 7 .x compatibility

  • LogMANOriginal LogMANOriginal modified ticket #60

    parsing stops after first multibyte character

  • LogMANOriginal LogMANOriginal modified ticket #64

    Comments on MAX_FILE_SIZE

  • LogMANOriginal LogMANOriginal modified ticket #65

    Notify when zero elements were found

  • LogMANOriginal LogMANOriginal modified ticket #66

    Role attribute

  • LogMANOriginal LogMANOriginal modified ticket #59

    Slashdot example updated

  • LogMANOriginal LogMANOriginal posted a comment on ticket #59

    Thanks for the feedback! The example in 1.9 is probably not functional anymore, but there is an updated version in the current master that still works. Here is the link for future reference: https://sourceforge.net/p/simplehtmldom/repository/ci/master/tree/example/scraping/example_scraping_slashdot.php

  • LogMANOriginal LogMANOriginal modified ticket #58

    Is this project active anymore?

  • LogMANOriginal LogMANOriginal posted a comment on ticket #58

    That choice is entirely up to you.

  • LogMANOriginal LogMANOriginal modified ticket #56

    How to avoid break on 404 errors?

  • LogMANOriginal LogMANOriginal posted a comment on ticket #56

    Good to know you found a solution :)

  • LogMANOriginal LogMANOriginal modified ticket #54

    Traversing the Dom within a series of columns

  • LogMANOriginal LogMANOriginal posted a comment on ticket #54

    You probably figured it out in the meantime, but here is a complete example that will give you what you want.

        <?php
        include_once 'simple_html_dom.php';

        $doc = <<<EOD
        <tr>
          <td></td>
          <td id="column2" class="style3">A</td>
          <td id="column2" class="style2">B</td>
          <td> <a href="#link")>Description of Link</a> </td>
        </tr>
        EOD;

        $html = str_get_html($doc);
        $href = $html->find('a', 0)->href;
        $description = $html->find('a', 0)->innertext;
        echo $href . PHP_EOL . $description . PHP_EOL;
        // #link
        // Description...

  • LogMANOriginal LogMANOriginal modified ticket #53

    Removing tags does not work

  • LogMANOriginal LogMANOriginal posted a comment on ticket #53

    This is probably no longer relevant, but the foreach loop in your example iterates over the first script element only, instead of over all script elements:

        foreach($items->find('script',0) as $e) {
            $e->outertext = '';
            echo '$e: ' . $e . '<br/>';
        }

    Notice the ,0 in ->find('script',0): with an index, find() returns a single element rather than a list of elements. This is why the error occurs. Here is the correct version:

        foreach($items->find('script') as $e) {
            $e->outertext = '';
            echo '$e: ' . $e . '<br/>';
        }

  • LogMANOriginal LogMANOriginal modified ticket #52

    Timezone change

  • LogMANOriginal LogMANOriginal modified ticket #45

    Uncaught Error: Call to a member function find() on string in ... Stack trace: #0 {main} thrown in

  • LogMANOriginal LogMANOriginal modified ticket #35

    HTTP Request failed

  • LogMANOriginal LogMANOriginal modified ticket #43

    raspado a un script de json

  • LogMANOriginal LogMANOriginal modified ticket #163

    Missing whitespace in plaintext property

  • LogMANOriginal LogMANOriginal posted a comment on ticket #163

    966c5e39493eff7dc1eb77e0004bdc0015037b34 fixes various issues related to spaces and line breaks when generating plain text. For all the examples provided here, it produces the correct output. It also properly collapses superfluous spaces and line breaks, so that the output should be much more readable, especially for awkwardly formatted HTML documents.

  • LogMANOriginal LogMANOriginal committed [966c5e]

    HtmlNode: Improve plain text output of text()

  • LogMANOriginal LogMANOriginal modified a comment on ticket #148

    I completely misunderstood the original report and finally figured it out. Thanks for all your feedback! This issue is resolved via [d6dcf50d6b03eb1d0c575abb7011abb658fefcf1] [4ad20901f0e63356cb3eb15a1cf4d9bf3a9837cc] by comparing the string length with PHP_MAXPATHLEN before calling is_file(). Edit: Had to fix the fix because the original fix was broken :)

  • LogMANOriginal LogMANOriginal committed [4ad209]

    HtmlDocument: Fixing the fix :)

  • LogMANOriginal LogMANOriginal posted a comment on ticket #148

    I completely misunderstood the original report and finally figured it out. Thanks for all your feedback! This issue is resolved via [d6dcf50d6b03eb1d0c575abb7011abb658fefcf1] by comparing the string length with PHP_MAXPATHLEN before calling is_file().
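The length check described here can be sketched as a small guard. The helper name `is_existing_file` is hypothetical, and the strict comparison is a conservative choice made for this sketch, not necessarily the exact condition used in the commit.

```php
<?php
/**
 * Hypothetical helper: true only if $candidate is short enough to be a path
 * and names an existing regular file.
 */
function is_existing_file($candidate)
{
    // Checking the length first keeps is_file() from being called with a
    // string that exceeds the platform's maximum path length.
    return strlen($candidate) < PHP_MAXPATHLEN && is_file($candidate);
}

// A large HTML string is far longer than any path, so is_file() is skipped
// and no "File name is longer than the maximum allowed path length" warning
// is raised.
$markup = str_repeat('<p>text</p>', PHP_MAXPATHLEN);
$isFile = is_existing_file($markup);
```

Because `&&` short-circuits, the file system is never touched for oversized input, which matters when the same argument may carry either a file name or raw markup.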

  • LogMANOriginal LogMANOriginal modified ticket #148

    is_file(): File name is longer than the maximum allowed path length on this platform (4096)

  • LogMANOriginal LogMANOriginal committed [d6dcf5]

    HtmlDocument: Check PHP_MAXPATHLEN before is_file()

  • LogMANOriginal LogMANOriginal committed [8a9a59]

    Fix large file parsing test

  • LogMANOriginal LogMANOriginal committed [a17ec8]

    Fix spelling mistakes.

  • LogMANOriginal LogMANOriginal committed [157ca6]

    HtmlNode: Optimize control flow for seek().

  • LogMANOriginal LogMANOriginal committed [c6a811]

    HtmlNode: Simplify charset checks before calling iconv.

  • LogMANOriginal LogMANOriginal committed [d0b9fd]

    HtmlNode: Breakup complex if-else-statement into more readable chunks.

  • LogMANOriginal LogMANOriginal committed [7244df]

    Remove unnecessary curly braces syntax.

  • LogMANOriginal LogMANOriginal committed [183127]

    Cleanup duplicate branches in switch statements.

  • LogMANOriginal LogMANOriginal committed [bbfca8]

    HtmlNode: Verify that constructor argument is instance of HtmlDocument

  • LogMANOriginal LogMANOriginal committed [ee9039]

    HtmlDocument: Simplify regex expressions

  • LogMANOriginal LogMANOriginal committed [316d6c]

    docs: Fix broken page links

  • LogMANOriginal LogMANOriginal committed [d5ead3]

    phpunit: Remove unnecessary default value assignment

  • LogMANOriginal LogMANOriginal committed [c2e1e7]

    examples: Initialize $data variable

  • LogMANOriginal LogMANOriginal committed [890186]

    HtmlNode: Reduce complexity of CSS selector regex

  • LogMANOriginal LogMANOriginal committed [bcb145]

    docs: Fix table formatting in markdown files

  • LogMANOriginal LogMANOriginal committed [3ba7e3]

    docs: Move page titles to mkdocs.yml

  • LogMANOriginal LogMANOriginal committed [c858a3]

    docs: Update Google Analytics to G4 and display prev/next buttons

  • LogMANOriginal LogMANOriginal committed [1b013d]

    Reorganize docs

  • LogMANOriginal LogMANOriginal committed [7cbc63]

    HtmlNode: Use only HtmlElement to determine block-level elements

  • LogMANOriginal LogMANOriginal committed [a862f0]

    Use HtmlElement::isRawTextElement() in HtmlDocument and HtmlNode

  • LogMANOriginal LogMANOriginal committed [58aad2]

    Add new class to handle HTML elements

  • LogMANOriginal LogMANOriginal modified ticket #197

    Fixed character translation error in iconv()

  • LogMANOriginal LogMANOriginal posted a comment on ticket #197

    I can see how this is annoying. Unfortunately, UTF-8//IGNORE silently discards characters that cannot be represented in the target charset, which may result in incorrect output. As you already know, UTF-8//TRANSLIT also doesn't always work and heavily depends on the actual implementation of iconv and system settings (some of which completely ignore //TRANSLIT). This unreliability of iconv is why it is better to have a notice reported here and leave the choice to the caller. You can actually override...

  • LogMANOriginal LogMANOriginal posted a comment on ticket #196

    Pseudo-classes are currently not supported. Refer to the documentation for the find method for a list of supported selectors. In this case, I suggest using the lastChild method, which will give you the same result as :last-child.

  • LogMANOriginal LogMANOriginal modified ticket #196

    :last-child selector doesn't work

  • LogMANOriginal LogMANOriginal modified ticket #194

    Wrong variable name at str_get_html

  • LogMANOriginal LogMANOriginal modified ticket #193

    Patch for PHP 8

  • LogMANOriginal LogMANOriginal modified ticket #193

    Patch for PHP 8

  • LogMANOriginal LogMANOriginal posted a comment on ticket #193

    Great, glad to hear it works for you :) The reason for adding the condition over changing the default values is to make sure it works even when a caller passes null as an argument.

  • LogMANOriginal LogMANOriginal committed [de6e37]

    .github: Add PHP compatibility check to workflow

  • LogMANOriginal LogMANOriginal committed [1f9213]

    phpcompatibility: Update and clarify compatibility standards

  • LogMANOriginal LogMANOriginal committed [b3ce6b]

    composer: Downgrade phpcs to version 2.x

  • LogMANOriginal LogMANOriginal posted a comment on ticket #193

    I see, that makes sense. Thanks for clarifying. This is actually a deprecation warning and not an error. It occurs when calling trim(null) (in the case that $str = null) because trim() expects a non-nullable string. This warning was added in PHP 8.1: https://www.php.net/releases/8.1/en.php#deprecations_and_bc_breaks Passing null to non-nullable internal function parameters is deprecated. There are likely other places that are affected by this. This particular case, however, is fixed in [1765ac4494a05d5c84408398127e6539f6bc1238]....
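The difference between changing a default value and adding a condition can be shown with a small sketch. The function `is_blank` is hypothetical and not part of the library; it only illustrates why a guard inside the function also covers callers that pass null explicitly.

```php
<?php
/**
 * Hypothetical helper: reports whether $str is empty after trimming.
 */
function is_blank($str = '')
{
    // A default of '' only helps when the argument is omitted. An explicit
    // null would still reach trim() and trigger the PHP 8.1 deprecation,
    // so it is handled before trim() is called.
    if ($str === null) {
        return true;
    }

    return trim($str) === '';
}

$omitted = is_blank();              // argument omitted, default '' is used
$explicitNull = is_blank(null);     // trim() is never called with null
$content = is_blank(' <p>hi</p> '); // non-empty after trimming
```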

  • LogMANOriginal LogMANOriginal committed [1765ac]

    HtmlDocument: Don't pass null to trim()

  • LogMANOriginal LogMANOriginal modified ticket #195

    Possibly XSS vulnerability

  • LogMANOriginal LogMANOriginal posted a comment on ticket #195

    Thanks for reporting this issue. While I agree that this is a bug in the attribute handler, it is not an XSS vulnerability, at least not for this project. This issue is fixed in [a706de9bcb3b74ad10e04cc0b2de0d1b35007ab4]

  • LogMANOriginal LogMANOriginal committed [a706de]

    HtmlNode: Add quotes to unquoted attribute value depending on content

  • LogMANOriginal LogMANOriginal committed [981b97]

    README: Replace Travis-CI bage by GitHub Workflow badge

  • LogMANOriginal LogMANOriginal modified ticket #194

    Wrong variable name at str_get_html

  • LogMANOriginal LogMANOriginal posted a comment on ticket #194

    The parameter name, $str, looks correct to me. What version of the library are you using?

        function str_get_html(
            $str,
            $lowercase = true,
            $forceTagsClosed = true,
            $target_charset = DEFAULT_TARGET_CHARSET,
            $stripRN = true,
            $defaultBRText = DEFAULT_BR_TEXT,
            $defaultSpanText = DEFAULT_SPAN_TEXT)
        {
            $dom = new simple_html_dom(
                null,
                $lowercase,
                $forceTagsClosed,
                $target_charset,
                $stripRN,
                $defaultBRText,
                $defaultSpanText
            );

            if (empty($str) || strlen($str) > MAX_FILE_SIZE) {
                $dom->clear();
                return false;...
