olalav Activity

Activity for olalav

2 years ago
olalav created ticket #67

Get only text in leaf nodes (avoid duplication)
2 years ago
olalav created ticket #205

End tags erroneously included in plaintext
2 years ago
olalav posted a comment on ticket #203

Actually it has to be done something like this, because this function can be called from inside the library, and we want to get the first call that is outside the library. PS! Is the maintainer active these days? Has been quiet for a while. diff --git a/HtmlNode.php b/HtmlNode.php index 9649d37..99dbda4 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -549,3 +554,12 @@ class HtmlNode { - return $this->find($selector, $idx, $lowercase) ?: null; + if(!$element = $this->find($selector, $idx, $lowercase))...
2 years ago
olalav posted a comment on ticket #204

diff --git a/HtmlNode.php b/HtmlNode.php index 9649d37..aef8b17 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -549,3 +549,12 @@ class HtmlNode + function first($selector, $idx = 0, $lowercase = false) + { + return $this->expect($selector, $idx, $lowercase); + } Missed semicolon and preformatting.
2 years ago
olalav created ticket #204

Convenience function for getting first element
2 years ago
olalav posted a comment on ticket #201

diff --git a/simple_html_dom.php b/simple_html_dom.php index bce4d9e..97d6e1d 100644 --- a/simple_html_dom.php +++ b/simple_html_dom.php @@ -117,3 +117,3 @@ function file_get_html( $dom->clear(); - return false; + $contents = ""; } @@ -144,5 +144,4 @@ function str_get_html( $dom->clear(); - return false; + $contents = ""; } - return $dom->load($str, $lowercase, $stripRN); Better version with tabs. PS! The inline editor and preview function on the site seem to hide the first line of content :|
2 years ago
olalav created ticket #203

Always tell user where he expected non-existing element
2 years ago
olalav created ticket #201

Never return false on documents
2 years ago
olalav modified a comment on ticket #199

Looks good now! However, you must set the Unicode flag, or else preg_replace() may return an invalid Unicode string, which may cause the second preg_replace() to return NULL, and a deprecation error for the third preg_replace(). diff --git a/HtmlNode.php b/HtmlNode.php index 9bc6a1a..9649d37 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -351 +351 @@ class HtmlNode - $ret = preg_replace('/\s+/', ' ', $ret); + $ret = preg_replace('/\s+/u', ' ', $ret); And about the manual: I see now that the navigation...
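The NULL return described in this comment can be guarded against explicitly. What follows is a minimal sketch, not code from the library: `collapse_whitespace()` is a hypothetical helper name, and the ASCII-only fallback is an assumption about what a caller might want when the input is not valid UTF-8.

```php
<?php
// Hypothetical helper (not part of the library): collapse runs of whitespace
// in text that is expected to be UTF-8.
function collapse_whitespace(?string $text): string
{
    $text = $text ?? '';

    // With the u flag, PCRE treats pattern and subject as UTF-8.
    $result = preg_replace('/\s+/u', ' ', $text);

    if ($result === null) {
        // preg_replace() returns null on error, for example
        // PREG_BAD_UTF8_ERROR when the subject is malformed UTF-8.
        // Assumed fallback: collapse ASCII whitespace bytes only, so
        // bytes >= 0x80 are never touched.
        $result = preg_replace('/[ \t\r\n\f\x0B]+/', ' ', $text);
    }

    return $result ?? $text;
}

var_dump(collapse_whitespace("a \n\t b"));
```

Checking for null at each call keeps one malformed string from turning into a deprecation warning in the next string function that receives it.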
2 years ago
olalav posted a comment on ticket #199

Looks good now! However, you must set the Unicode flag here or else preg_replace() returns NULL for certain strings, which causes (deprecation) errors further down. diff --git a/HtmlNode.php b/HtmlNode.php index 9bc6a1a..9649d37 100644 --- a/HtmlNode.php +++ b/HtmlNode.php @@ -351 +351 @@ class HtmlNode - $ret = preg_replace('/\s+/', ' ', $ret); + $ret = preg_replace('/\s+/u', ' ', $ret); And about the manual: I see now that the navigation sidebar is aligned far down upon page load, so that only...
2 years ago
olalav posted a comment on ticket #199

The space thing works now, but the BR tag is still not handled well. Try the code in the original post and compare the output when (un)-commenting the commented line. PS! Would be nice if you could link to the manual from the "Support" section, because it was hard to find. https://sourceforge.net/projects/simplehtmldom/support
2 years ago
olalav posted a comment on ticket #199

Shouldn't plaintext convert newlines to spaces? Did you change this recently? Surely this is a bug/regression, or am I missing something completely? $text = "Hello" . "\n" . "World"; $plain = str_get_html($text)->plaintext; echo "PLAINTEXT:\n" . $plain . "\n\n"; echo "WORDWRAP:\n" . wordwrap($plain, 80) . "\n"; PS! Where on the SourceForge page is the link to the manual (the one with the clickable tabs with examples, etc.)? I hope you didn't remove this, because I use it as a reference all...
2 years ago
olalav posted a comment on ticket #199

The text should say "The BR tag is not...". Evidently this editor interprets HTML tags as-is, and initial postings can't be edited :/
2 years ago
olalav created ticket #199

Incorrect handling of tags next to line breaks
2 years ago
olalav posted a comment on ticket #193

I know the difference :) I thought it would be better to use a blank string to begin with rather than checking for null later, but you know your own code better and you probably have your reasons. Anyway, no more warnings with the new version, so I'm happy!
2 years ago
olalav modified a comment on ticket #193

Without this patch I get an error message like: HtmlDocument.php(269):trim(): Passing null to parameter #1 ($string) of type string is deprecated $ php -v PHP 8.1.4 (cli) (built: Apr 4 2022 05:02:21) (NTS)
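One common way to avoid this PHP 8.1 deprecation is to substitute an empty string before calling trim(). This is only a sketch of the idea; `safe_trim()` is a hypothetical wrapper and is not claimed to be what the patch in ticket #193 does.

```php
<?php
// Hypothetical wrapper (not from the library): trim a value that may be null.
function safe_trim(?string $value): string
{
    // The null coalescing operator supplies '' so trim() always
    // receives a string, which avoids the PHP 8.1 deprecation notice.
    return trim($value ?? '');
}

var_dump(safe_trim(null));
var_dump(safe_trim("  text  "));
```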
2 years ago
olalav posted a comment on ticket #193

Without this patch I get an error message like: HtmlDocument.php(269):trim(): Passing null to parameter #1 ($string) of type string is deprecated $ php -v PHP 8.1.4 (cli) (built: Apr 4 2022 05:02:21) (NTS)
2 years ago
olalav posted a comment on ticket #186

Very happy that you're still maintaining this project. It's my go-to library for parsing HTML and I use it every day. See also my small contribution #193 for compatibility with PHP 8.
3 years ago
olalav created ticket #193

Patch for PHP 8
4 years ago
olalav created ticket #186

find("ul a") finds a outside ul
4 years ago
olalav posted a comment on ticket #63

How do I find p tags whose class contains neither foo nor bar? $html->find("p:not([class~=foo])"); # excludes foo, but includes bar
5 years ago
olalav posted a comment on ticket #65

Very nice solution!
5 years ago
olalav created ticket #65

Notify when zero elements were found
5 years ago
olalav created ticket #64

Comments on MAX_FILE_SIZE
5 years ago
olalav posted a comment on ticket #63

Very nice! This is most useful. Feel free to close this issue and I'll reopen it if I see any problems!
5 years ago
olalav posted a comment on ticket #61

Feel free to close this issue and I'll reopen it if I see any problems!
5 years ago
olalav posted a comment on ticket #62

No problem :) If /u recognises \xc2\xa0 as one unit, your patch should work. Feel free to close this issue and I'll reopen it if I see any problems!
5 years ago
olalav posted a comment on ticket #46

This example code is clearly wrong. Ignore it for the time being, and I'll update it as necessary. If not, you may close it in a week or so.
5 years ago
olalav created ticket #46

Access to array of matched elements
5 years ago
olalav created ticket #63

Match elements that don't contain a certain value
5 years ago
olalav posted a comment on ticket #62

Another example. The following code breaks if nbsp is not handled as a character sequence. $html = str_get_html("«Hello, World»"); echo $html->plaintext; The fix I'm using at the moment: diff --git a/simple_html_dom.php b/simple_html_dom.php index c909d18..8e747f3 100644 --- a/simple_html_dom.php +++ b/simple_html_dom.php @@ -502,6 +502,8 @@ class simple_html_dom_node // Reduce whitespace at start/end to a single (or none) space - $ret = preg_replace('/[ \t\n\r\0\x0B\xC2\xA0]+$/', ' ',...
5 years ago
olalav posted a comment on ticket #172

I can confirm that my enclosed example doesn't scream "ARGH!!" anymore. Sounds like you had a good understanding of the problem. I'll let you know if I run into similar issues.
5 years ago
olalav modified a comment on ticket #61

:) 9d94f71 has the same problem as Feature Request #62 (which is really a Bug Report). trim() is not multibyte safe, and so trim($foo, "\xc2\xa0") removes \xc2 and \xa0 individually. See PHP manual pages for trim() for a proper solution. Implementing your own trim function may be necessary. Consider something like: $pattern = "[\t\r\n ]|(\xc2\xa0)"; $foo = " \xc2\xa0\t\rfoo\xc2\xa0 "; $foo = preg_replace("/(^$pattern)|($pattern$)/", "", $foo); echo "[$foo]\n";
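The byte-wise behaviour of trim() can be shown on a two-byte character, next to a regex-based alternative. This is a sketch under stated assumptions: `utf8_trim()` is a hypothetical name, not a function of the library, and it expects valid UTF-8 input.

```php
<?php
// Hypothetical helper (not from the library): strip whitespace and
// no-break spaces from both ends of a UTF-8 string.
function utf8_trim(string $text): string
{
    // \x{00A0} is U+00A0 NO-BREAK SPACE. The u flag makes PCRE read the
    // pattern and the subject as UTF-8, so the two bytes C2 A0 are
    // matched as one character instead of individually.
    $result = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);

    // null signals an error such as malformed UTF-8; return the input as-is.
    return $result ?? $text;
}

// trim() works on single bytes: "à" is C3 A0, and A0 is in the character
// list, so the trailing byte is removed and a lone C3 remains.
var_dump(bin2hex(trim("\xc3\xa0", "\xc2\xa0")));   // string(2) "c3"

// The regex version removes the surrounding NBSP and spaces but keeps "à".
var_dump(bin2hex(utf8_trim("\xc2\xa0 foo \xc3\xa0\xc2\xa0")));
```

Anchoring each alternative separately (`^...+` and `...+$`) limits the replacement to the ends of the string, so a no-break space in the middle of the text is left alone.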
5 years ago
olalav posted a comment on ticket #61

:) 9d94f71 has the same problem as Feature Request #62 (which is really a Bug Report). trim() is not multibyte safe, and so trim($foo, "\xc2\xa0") removes \xc2 and \xa0 individually. See PHP manual pages for trim() for a proper solution.
5 years ago
olalav posted a comment on ticket #62

1) It fixes the problem. 2) Your changes are not necessary, at least not for this isolated case. It's simple: The \s destroys Unicode sequences if you don't apply the u flag. You may have to do this other places as well: There are six instances of preg_replace() using \s. Some of these may deal with ASCII-only strings, though. You should know what to do :)
5 years ago
olalav modified a comment on ticket #62

index a078078..708e993 100644 --- a/simple_html_dom.php +++ b/simple_html_dom.php @@ -2218,3 +2218,3 @@ class simple_html_dom // https://www.w3.org/TR/xml/#AVNormalize - $value = preg_replace("/[\r\n\t\s]+/", ' ', $value); + $value = preg_replace("/[\r\n\t\s]+/u", ' ', $value); $value = trim($value);
5 years ago
olalav modified a comment on ticket #62

index a078078..708e993 100644 --- a/simple_html_dom.php +++ b/simple_html_dom.php @@ -2219 +2219 @@ class simple_html_dom - $value = preg_replace("/[\r\n\t\s]+/", ' ', $value); + $value = preg_replace("/[\r\n\t\s]+/u", ' ', $value);
5 years ago
olalav modified a comment on ticket #62

I found the culprit :) $value = preg_replace("/[\r\n\t\s]+/", ' ', $value); // THE PROBLEM $value = preg_replace("/[\r\n\t\s]+/u", ' ', $value); // THE SOLUTION
5 years ago
olalav modified a comment on ticket #62

Same problem. My UNIX locale settings are below, but unsetting them made no difference. My php.ini is a standard one. The problem does not occur on a Debian machine with PHP 7.0.x. LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_CTYPE=UTF-8 When doing print_r($html) and piping to less, the error is already there, in the object tree (simplified): [root] => simple_html_dom_node Object [children] => Array [0] => simple_html_dom_node Object [attr] => Array [content] => <C2> <C3> <C3> á It's curious that the 2nd...
5 years ago
olalav posted a comment on ticket #172

Noted. So remove() must be used with caution, or not at all, until further notice.
5 years ago
olalav posted a comment on ticket #62

Same problem. My UNIX locale settings are below, but unsetting them made no difference. My php.ini is a standard one. The problem does not occur on a Debian machine with PHP 7.0.x. LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_CTYPE=UTF-8 Do you think the misrepresentation happens during the building of the object tree (str_get_html) or when fetching the value (->content)?
5 years ago
olalav created ticket #172

Problem with the remove function
5 years ago
olalav modified a comment on ticket #62

OK, I found the problem. It occurs when setting a locale. You should now be able to look into why the content property is extracted incorrectly. setlocale(LC_ALL, "fr_FR.UTF-8"); $latin = utf8_encode("\xa0\xc5\xe0\xe1"); for($i=0; $i<strlen($latin); $i+=2) printf("%02x%02x ", ord($latin[$i]), ord($latin[$i+1])); echo "\n"; $string = sprintf('<meta content="%s">', $latin); $html = str_get_html($string); $content = $html->find("meta", 0)->content; for($i=0; $i<strlen($content); $i+=2) printf("%02x%02x...
5 years ago
olalav posted a comment on ticket #62

OK, I found the problem: When setting a European locale, content is extracted incorrectly. setlocale(LC_ALL, "fr_FR.UTF-8"); $latin = utf8_encode("\xa0\xc5\xe0\xe1"); for($i=0; $i<strlen($latin); $i+=2) printf("%02x%02x ", ord($latin[$i]), ord($latin[$i+1])); echo "\n"; $string = sprintf('<meta content="%s">', $latin); $html = str_get_html($string); $content = $html->find("meta", 0)->content; for($i=0; $i<strlen($content); $i+=2) printf("%02x%02x ", ord($content[$i]), ord($content[$i+1])); echo "\n";...
5 years ago
olalav posted a comment on ticket #62

UTF-8 here too. $foo = $html->find("meta", 0)->content; for($i=0; $i<strlen($foo); $i+=2) printf("%02x%02x\n", ord($foo[$i]), ord($foo[$i+1])); c220 <-- c2a1 c2a2 c2a3 ... c2bd c2be c2bf c380 c381 c382 c383 c384 c320 <-- c386 c387 c388 c389 ... c39d c39e c39f c320 <-- c3a1 c3a2 c3a3 ... c3bd c3be c3bf
5 years ago
olalav posted a comment on ticket #62

I'm using the latest Git master (0e03308). Piping to less, I get the following. Notice C2 and C3 which indicate that they are single bytes (ie. an incomplete Unicode character sequence). <C2> ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄ<C3> ÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß<C3> áâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
5 years ago
olalav modified a comment on ticket #61

I'm using the latest Git master. $html = str_get_html("Hello, World\xc2\xa0"); foreach($html->find("p") as $p) printf("[%s]\n", $p->plaintext); The output I expect is: [Hello, World] What I get is: [Hello, World] [ ] <-- nbsp inside the brackets I don't want the empty element. I want the parser to consider nbsp as space and trim it, ultimately excluding it from the array of found elements.
5 years ago
olalav modified a comment on ticket #61

I'm using the latest Git master. $html = str_get_html("Hello, World\xc2\xa0"); foreach($html->find("p") as $p) printf("[%s]\n", $p->plaintext); The output I expect is: [Hello, World] What I get is: [Hello, World] [ ] <-- nbsp inside the brackets I don't want the empty element. I want the parser to consider nbsp as space and trim it, ultimately excluding it in the array of found elements.
5 years ago
olalav modified a comment on ticket #61

I'm using the latest Git master. $html = str_get_html("Hello, World\xc2\xa0"); foreach($html->find("p") as $p) printf("[%s]\n", $p->plaintext); The output I expect is: [Hello, World] What I get is: [Hello, World] [ ] <-- nbsp inside the brackets I don't want the empty element. I want the parser to consider nbsp as space and trim it, ultimately not including it in the array of found elements.
5 years ago
olalav modified a comment on ticket #61

I'm using the latest Git master. $html = str_get_html("Hello, World\xc2\xa0"); foreach($html->find("p") as $p) printf("[%s]\n", $p->plaintext); The output I expect is: [Hello, World] What I get is: [Hello, World] [ ] <-- nbsp inside the brackets I don't want the empty element. I want the parser to consider nbsp as space and trim it, ultimately not including it in the array of found elements.
5 years ago
olalav posted a comment on ticket #61

I'm using the latest Git master. $html = str_get_html("Hello, World\xc2\xa0"); foreach($html->find("p") as $p) printf("[%s]\n", $p->plaintext); The output I expect is: [Hello, World] What I get is: [Hello, World] [ ] I don't want the empty element. I want the parser to consider nbsp as space and trim it, ultimately not including it in the array of found elements.
5 years ago
olalav created ticket #62

Unicode characters not extracted correctly
5 years ago
olalav created ticket #61

Consider nbsp to be whitespace
5 years ago
olalav posted a comment on ticket #52

$html = file_get_html("https://www.rogerebert.com/reviews/dark-phoenix-2019"); foreach($html->find("div[itemprop=reviewBody] > p") as $p) printf("%s\n\n", wordwrap($p->plaintext)); I found an incident where whitespace is not removed (marked with underscore). Can you fix this? ...and “_X-Men: Apocalypse_,” Simon Kinberg_’s directorial debut... ...Jean Grey, Professor X, Raven (_Jennifer Lawrence_)... ...named Vuk (who takes the body of Jessica Chastain_) is encouraging...
5 years ago
olalav posted a comment on ticket #52

I'm so happy with these changes. The package is now like a dream to use, because you instantly get the content you want without struggling with manual trimming and decoding every single time.
5 years ago
olalav posted a comment on ticket #52

When will everything be merged?
5 years ago
olalav posted a comment on ticket #52

Just post a notice when you're all done merging everything to master. I'm so happy this project is active. I really thought it was abandonware when I first started using it, as bug reports were years old with no response, etc. For whatever reason things have gotten back on track, I'm grateful, and happy to help. Actually, parsing HTML is so fundamental, and this library is so user-friendly, that I think it should be an integral part of PHP. (PHP doesn't have anything out of the box for this.)
5 years ago
olalav posted a comment on ticket #52

Looks like it works! Let me know when you merge things, so I don't have to choose between decoding and trimming :)
5 years ago
olalav modified a comment on ticket #52

Thanks for a good follow-up. I'll try the latest commit. I actually know one of the main W3C guys, so if necessary I could ask for his opinion on the matter. Misplaced whitespace typically ends up in HTML because of sloppiness by non-tech people, e.g. a journalist copy-pasting an article title (in a proportional font) into a CMS, not noticing the minuscule leading/trailing whitespace. So you could say that this is where the trim function really belongs: At the origin of the fault. I doubt any developer...
5 years ago
olalav modified a comment on ticket #52

Thanks for a good follow-up. I'll try the latest commit. I actually know one of the main W3C guys, so if necessary I could ask for his opinion on the matter. Misplaced whitespace typically ends up in HTML because of sloppiness by non-tech people, e.g. a journalist copy-pasting an article title (in a proportional font) into a CMS, not noticing the minuscule leading/trailing whitespace. So you could say that this is where the trim function really belongs: At the origin of the fault. I doubt any developer...
5 years ago
olalav posted a comment on ticket #52

Thanks for a good follow-up. I'll try the latest commit. I actually know one of the main W3C guys, so if necessary I could ask for his opinion on the matter. Misplaced whitespace typically ends up in HTML because of sloppiness by non-tech people, e.g. a journalist copy-pasting an article title (in a proportional font) into a CMS, not noticing the minuscule leading/trailing whitespace. So you could say that this is where the trim function really belongs: At the origin of the fault. I doubt any developer...
5 years ago
olalav posted a comment on ticket #52

Thanks for the fix! Does the W3C HTML specification say that whitespace inside quotes is in fact part of the actual value? If so, I sort of concede, but not very happily, I must admit. Though whitespace inside quotes is no doubt due to sloppiness on the page author's part, in the real world you always want trimmed values to avoid messing up database fields, plain-text terminal output, markup, and other sources that would carry the whitespace with them. I appreciate you're trying to follow standards,...
5 years ago
olalav posted a comment on ticket #52

Any news on trimming?
5 years ago
olalav modified a comment on ticket #52

$html = " foo "; $html = str_get_html($html); $html->find("span", 0)->remove(); printf("(%s)\n", $html->find("p", 0)->plaintext); $html = str_get_html(' <meta name="description" content=" bar ">'); printf("(%s)\n", $html->find("meta[name=description]", 0)->content);
5 years ago
olalav modified a comment on ticket #52

$html = " <figure> </figure> foo "; $html = str_get_html($html); $html->find("figure", 0)->remove(); printf("(%s)\n", $html->find("p", 0)->plaintext); $html = str_get_html(' <meta name="description" content=" bar ">'); printf("(%s)\n", $html->find("meta[name=description]", 0)->content);
5 years ago
olalav posted a comment on ticket #52

$html = " < figure > < /figure > foo "; $html = str_get_html($html); $html->find("figure", 0)->remove(); printf("(%s)\n", $html->find("p", 0)->plaintext); $html = str_get_html(' < meta name = " description " content = " bar " > '); printf("(%s)\n", $html->find("meta[name=description]", 0)->content);
5 years ago
olalav posted a comment on ticket #52

Ever thought about migrating the whole thing to Github? Sourceforge feels kind of outdated, though there may be aspects of this I don't know about...
5 years ago
olalav posted a comment on ticket #167

I did a couple of quick tests and it seems to work as expected. Looks like it didn't require much coding either, so all good and everyone's happy! I'll let you know if anything breaks.
5 years ago
olalav modified a comment on ticket #52

This is just not my day: error: the requested upstream branch 'origin/EntityDecoding' does not exist. Starting over seemed to work better: $ git clone git://git.code.sf.net/p/simplehtmldom/repository $ cd repository $ git checkout EntityDecoding $ git fetch --all On a side note, it would be good if the basename of the URL was simple_html_dom or simplehtmldom (whichever is the official), rather than repository.
5 years ago
olalav modified a comment on ticket #52

This is just not my day: error: the requested upstream branch 'origin/EntityDecoding' does not exist. I started over, which seemed to work better: $ git clone git://git.code.sf.net/p/simplehtmldom/repository $ cd repository $ git checkout EntityDecoding $ git fetch --all On a side note, it would be good if the basename of the URL was simple_html_dom or simplehtmldom (whichever is the official), rather than repository.
5 years ago
olalav posted a comment on ticket #52

This is just not my day: error: the requested upstream branch 'origin/EntityDecoding' does not exist. I also tried starting over, to no avail: $ git clone git://git.code.sf.net/p/simplehtmldom/repository $ cd repository $ git fetch --all $ git branch * master On a side note, it would be good if the basename of the URL was simplehtmldom or simple_html_dom, rather than repository.
5 years ago
olalav posted a comment on ticket #52

I still had to do git pull origin EntityDecoding. Maybe this has something to do with .gitconfig and definition of remotes.
5 years ago
olalav posted a comment on ticket #52

Looks fine now. Post a notice when you have trimming in place.
5 years ago
olalav modified a comment on ticket #52

Trimming doesn't seem to take place. You wrote " it [trimming] needs to be applied before decoding". Does this mean it's not implemented yet?
5 years ago
olalav posted a comment on ticket #52

Found some things that are not decoded as expected: $html = str_get_html('<meta name="description" content="Häagen-Dazs">'); echo $html->find("meta[name=description]", 0)->content . "\n"; echo $html->find("meta[name=description]", 0)->getAttribute("content") . "\n"; Results in: Häagen-Dazs Häagen-Dazs
5 years ago
olalav modified a comment on ticket #52

Very exciting! Will try this. I wouldn't worry about edge cases like &amp; for normal use. The only relevant case seems to be markup that refers to entities verbatim.
5 years ago
olalav posted a comment on ticket #52

It doesn't seem to break any of my scripts. However, trimming doesn't occur. You wrote " it [trimming] needs to be applied before decoding". Does this mean it's not implemented yet?
5 years ago
olalav posted a comment on ticket #52

I had to do git checkout -b EntityDecoding and then git pull origin EntityDecoding. Let me know if there's an easier way to pull a branch.
5 years ago
olalav posted a comment on ticket #52

PS! Which Git commands do I use to get this branch/commit? A normal pull gets just the master branch, and there is no such commit there.
5 years ago
olalav modified a comment on ticket #52

Very exciting! Will try this. I wouldn't worry about edge cases like &amp; for normal use. Hope other people will test it too and shed light on problems that are likely, if any.
5 years ago
olalav modified a comment on ticket #52

Very exciting! Will try this. I wouldn't worry about edge cases like &amp; for normal use. Hope other people will test it too and shed light on problems that are likely, if any. PS! Which Git commands do I use to get this branch/commit? A normal pull gets just the master branch, and there is no such commit there.
5 years ago
olalav modified a comment on ticket #52

Very exciting! Will try this and let you know how it works for me. Stuff like &amp; are very unlikely edge cases I wouldn't worry about for normal use. Hope other people will test it too and shed light on problems that are likely, if any. PS! Which Git commands do I use to get this branch/commit? A normal pull gets just the master branch, and there is no such commit there.
5 years ago
olalav posted a comment on ticket #52

Very exciting! Will try this and let you know how it works for me. Stuff like &amp; are very unlikely edge cases I wouldn't worry about for normal use. Hope other people will test it too and shed light on problems that are likely, if any.
5 years ago
olalav posted a comment on ticket #52

I doubt a change in performance would have any significant real world impact. I think the correct thing is to always decode and trim. Not doing so relies too much on the HTML, which will break in other ways if the HTML changes. As for breaking code, decoding a string that is already decoded will practically always return the same string. So I think you could actually get away with just changing the code. Unless a lot of people really expect a lot of non-trimmed, non-decoded strings. I would shout...
5 years ago
olalav posted a comment on ticket #52

Actually, trim() would also be a desired default. I have never needed not to remove surrounding whitespace from an accessed value (unless when assuming/hoping there's never going to be any).
5 years ago
olalav created ticket #52

Always decode content values from the DOM tree
6 years ago
olalav posted a comment on ticket #168

Works now. Either it was the limit, or the source code changed. Will let you know if something similar happens.
6 years ago
olalav created ticket #168

Wikipedia breaks the parser
6 years ago
olalav created ticket #167

Removed elements aren't properly removed
6 years ago
olalav posted a comment on ticket #163

I think what I would mostly expect is for the plain text to look like the text as displayed in the browser, in other words, a single space no matter how many whitespace characters are in the source. Possibly, there may be instances where multiple whitespace characters are desired (like you hint at), but I can't think of any at the moment. Of course the replace will work around the problem.
6 years ago
olalav posted a comment on ticket #163

Seems to work! However, the example below creates multiple whitespace where there should be only one. I don't know if this is a different bug, but it's at least somewhat related. $str = 'I am saying  <a href=""> Hello World </a>  to you.'; $html = str_get_html($str); echo $html->find("p", 0)->plaintext . "\n"; I am saying Hello World to you.
6 years ago
olalav posted a comment on ticket #164

Under "Files", the latest version (1.8.1) of simplehtmldom is dated 2019-01-13 (three weeks old), which seemed outdated. So I did svn checkout https://svn.code.sf.net/p/simplehtmldom/code/trunk simplehtmldom-code, which I assumed would give me the latest version. I now see that this gave me an ancient version @version 1.5 ($Rev: 210 $), for unknown reasons. I also now noticed that there is a Git repository, which puzzles me, as I thought SourceForge had no affiliation with Git. When cloning this,...
6 years ago
olalav created ticket #164

Fatal error: Stream does not support seeking
6 years ago
olalav posted a comment on ticket #163

Similarly, the following produces "World.Hello". Should produce "World. Hello". $file = '<a href="">World. </a>Hello';
6 years ago
olalav created ticket #163

Missing whitespace in plaintext property
8 years ago
olalav posted a comment on ticket #118

Isn't there a profiling/kernel expert on the team or somewhere in the community?...
8 years ago
olalav created ticket #118

Accessible and fast CLI-mode
8 years ago
olalav created ticket #157

$html->find("*") does not find all tags
1 decade ago
olalav created ticket #556

Cursor starts running off / Unable to reset emulator
1 decade ago
olalav posted a comment on ticket #29

My mistake this time. I forgot ./configure --with-readline. Now sqsh works with cursor...
