I think the issue is finally clear to me.
At first I thought you were searching through HTML like this
<div class="main" class="article"></div>
but you are talking about multiple classes, not multiple class attributes, so the code probably looks like this (please correct me if I'm wrong)
<div class="main article"></div>
In this case, the selector 'div[class="main"]' is not what you really want. You want to match class attributes that contain the specified value, not only those that match it exactly. If it works in 1.7, that was a bug.
Generally speaking, for classes you should use 'div[class~="main"]', as it treats the attribute as a whitespace-separated list. Please note that the value itself must not contain whitespace, so a single "~=" selector cannot match several classes at once.
<?php
require_once 'simple_html_dom.php';

$html = str_get_html(<<<EOD
<body>
<div class="main header section"></div>
<div class="mainnot"></div>
</body>
EOD
);

// "=" matches the value **exactly**
echo 'Match "=": ';
echo count($html->find('div[class="main"]'));
echo PHP_EOL;

// "*=" matches if it **contains** the value
echo 'Match "*=": ';
echo count($html->find('div[class*="main"]'));
echo PHP_EOL;
// Note that this also matches <div class="mainnot"></div>

// "^=" matches if it **starts** with the value
echo 'Match "^=": ';
echo count($html->find('div[class^="main"]'));
echo PHP_EOL;
// Note that this also matches <div class="mainnot"></div>

// "~=" matches if it **contains** the value as whitespace separated list
echo 'Match "~=": ';
echo count($html->find('div[class~="main"]'));
echo PHP_EOL;
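For reference, the four operators can also be approximated in plain PHP without the library. This is only a sketch of the matching rules as the CSS specification describes them, not the parser's own code; the helper name attr_matches is made up for illustration.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): applies one CSS
// attribute operator to an attribute value.
function attr_matches(string $value, string $op, string $needle): bool
{
    switch ($op) {
        case '=':  // exact match
            return $value === $needle;
        case '*=': // substring match
            return $needle !== '' && strpos($value, $needle) !== false;
        case '^=': // prefix match
            return $needle !== '' && strncmp($value, $needle, strlen($needle)) === 0;
        case '~=': // match one item of a whitespace-separated list
            if ($needle === '' || preg_match('/\s/', $needle)) {
                return false; // a needle containing whitespace can never match
            }
            return in_array($needle, preg_split('/\s+/', trim($value)), true);
    }
    return false;
}

// Same two class values as in the test document above.
$classes = ['main header section', 'mainnot'];
foreach (['=', '*=', '^=', '~='] as $op) {
    $hits = 0;
    foreach ($classes as $class) {
        $hits += attr_matches($class, $op, 'main') ? 1 : 0;
    }
    echo "Match \"$op\": $hits", PHP_EOL;
}
```

With these two class values the sketch counts 0, 2, 2 and 1 matches for "=", "*=", "^=" and "~=" respectively. Only "~=" picks the div that has "main" as one of its classes without also picking "mainnot".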
Does that work for you?
Gotcha, I'm testing both master and 7.1. But one thing: this used to work on previous versions. In my case I had to upgrade the simplehtmldom library because of a PHP bump to 7.3.X.
Let me get back to you with the results.
1.8.1 As you mentioned, div[class~="main"] is the one that would resolve the problems I was having (going from 1.5 and prior versions up to master).
Also, I can confirm echo $html->find('div[class="main"]', 0)->innertext; works on 1.7, if that's a bug now, please confirm it, and then I guess we're good to close this case.
I really appreciate the time you spent checking this out. :)
Also, I can confirm echo $html->find('div[class="main"]', 0)->innertext; works on 1.7, if that's a bug now, please confirm it, and then I guess we're good to close this case.
It is a bug in version 1.7 and prior. The CSS specification is very clear about it.
I suppose the best way to confirm this is to load an HTML document with CSS styles. Here is one example.
<head>
<style>
div[class=main] { color: white; }
div[class~=main] { background-color: blue; }
</style>
</head>
<body>
<div class="main header section">PHP Simple HTML DOM Parser</div>
<div class="mainnot">Hello, World!</div>
</body>
As you can see, the first selector uses the original solution which worked in 1.7 and prior. It sets the text color to white. The second selector sets the background color to blue. On my machine only the second selector works.
It looks something like this (only renders on SF)
PHP Simple HTML DOM Parser
Hello, World!
Thanks for reporting this issue.
It could be related to #166, which was fixed recently. Does it still happen with master?
Does this still happen with master?
Hi, sorry, I was out on vacation. I'll test and reply back with the results.
Any news on this?
I can confirm that this still is an issue.
I've tried 1.8.1, 1.7, and the latest master, and in all cases I can't get the div when it has more than one class.
Hello, try 1.7.1, that one worked for me. And yes @LogMANOriginal, it's still buggy using master.
There is no 1.7.1 that I can find. There's 1.7 and I've tried it; doesn't work unfortunately.
It seems that 1.7 does work after all... I guess in my haste in testing I probably forgot to change the file path.
Thanks!
No longer relevant.
Last edit: DB1 2019-05-25
On this page: https://www.npostart.nl/andere-tijden/VPWON_1247337/episode
I can do the following in 1.7:
foreach($html->find('div[id=component-grid-episodes] div[class=npo-grid-asset] .npo-asset-tile-container') as $episode)
It correctly grabs all episode divs that way.
But when I change the last part to:
div[class~=npo-asset-tile-container]
it only grabs the first div and not all of them.
What am I doing wrong?
That is an actual bug in 1.8!
Here is some example code which shows versions that work and some that don't
The reason why the first two examples work and the third one doesn't is that the id ends in an 's', which is incorrectly detected as the case-sensitivity specifier. https://www.w3.org/TR/selectors-4/#attribute-case
This certainly needs fixing, thanks for reporting it!
Wow, I am sure glad it was a bug cause I was slowly getting convinced I was going crazy. :p
Glad to help! :D
This is fixed in master
[680b45]
Changing this line
https://sourceforge.net/p/simplehtmldom/repository/ci/1.8.1/tree/simple_html_dom.php#l1188
to
should fix it (notice the \s+?([iIsS]) instead of \s*?([iIsS]) at the end)
I'll push this fix to master later this week.
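The effect of that one-character change can be reproduced with a much simpler pattern. The sketch below is an illustration only: the two patterns and the parse_attr helper are invented for this example and are much shorter than the real pattern in simple_html_dom.php. Only the tail (\s*? versus \s+? before the flag) mirrors the change described above.

```php
<?php
// Simplified, hypothetical patterns for "[name=value flag]".
// $buggy allows zero whitespace before the flag, $fixed requires some.
$buggy = '/^\[([\w-]+)=([\w-]+?)(?:\s*?([iIsS]))?\]$/';
$fixed = '/^\[([\w-]+)=([\w-]+?)(?:\s+?([iIsS]))?\]$/';

// Hypothetical helper: splits a single attribute selector into its parts.
function parse_attr(string $pattern, string $selector): array
{
    if (!preg_match($pattern, $selector, $m)) {
        return [];
    }
    // The flag group is absent from $m when it did not take part in the match.
    return ['name' => $m[1], 'value' => $m[2], 'flag' => $m[3] ?? null];
}

$selector = '[id=component-grid-episodes]';

// Buggy: the trailing "s" of the value is consumed as the flag.
print_r(parse_attr($buggy, $selector));

// Fixed: the value stays whole and no flag is detected.
print_r(parse_attr($fixed, $selector));

// Fixed: a real flag, separated by whitespace, is still recognised.
print_r(parse_attr($fixed, '[id=foo i]'));
```

In this sketch the buggy pattern yields the value "component-grid-episode" with flag "s", so the selector no longer matches the intended id, while the fixed pattern keeps "component-grid-episodes" intact.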
Awesome explanation, now the issue is clear, we'll update stuff accordingly.
🙏Thanks again!!!
dup text
Last edit: Luis Franco 2019-05-28