The problem is caused by the combination of the '<-' and '/' characters in a text string.
Example HTML page:
<html><body>
<div>
<a>
<span> ---> Lorem ipsum <--- dolor sit amet / at volutpat </span>
<span> Lorem ipsum dolor sit amet </span>
<span class="foo_1">Bar 1</span>
</a>
<div class="foo_2">Bar 2</div>
</div>
<span class="foo_3">Bar 3</span>
</body></html>
Test script (example page stored in $data):
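include 'simple_html_dom.php';     // simplehtmldom library file
$dom = str_get_html($data);         // parse the example page
$tmp = $dom->find('div/span', 0);   // first element matching the selector
var_dump($tmp->plaintext);          // text content of that element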
Result:
string(80) " ---> Lorem ipsum Lorem ipsum dolor sit amet Bar 1 </a> Bar 2 "
Tested on:
simplehtmldom v.1.5 rev 196 & 210
PHP 5.6.17
My quick workaround is to replace '/' with '|'.
Last edit: Alex Kozlovsky 2017-02-09
Thanks for reporting this issue. I've added a test to check for this behavior in the future. Please note that a bare "<" in text is invalid according to https://validator.w3.org/#validate_by_input
It correctly suggests escaping "<" as "&lt;", which solves your issue. That being said, the parser now considers tags starting with "<-" invalid (as does the HTML specification), and such sequences are now correctly added as text. Let me know if you experience further issues.
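Escaping the text before it is written into the page avoids the problem regardless of parser version. A minimal PHP sketch using the built-in `htmlspecialchars()`; the `$text` value is copied from the example page, and the `ENT_QUOTES | ENT_HTML5` flags are one reasonable choice, not something specified in this report.

```php
<?php
// Literal text taken from the example page; it contains a bare "<".
$text = ' ---> Lorem ipsum <--- dolor sit amet / at volutpat ';

// htmlspecialchars() turns "<" into "&lt;" and ">" into "&gt;";
// "-" and "/" are left untouched.
$escaped = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Embed the escaped text in markup.
$html = '<span>' . $escaped . '</span>';
echo $html, PHP_EOL;
// prints: <span> ---&gt; Lorem ipsum &lt;--- dolor sit amet / at volutpat </span>
```

With the text escaped this way, the "<---" sequence can no longer be mistaken for the start of a tag.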
Fixed via [b1bade]
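The rule that "<-" does not open a tag can be illustrated with a small PHP sketch. `starts_markup()` is a hypothetical helper, not part of the simplehtmldom API and not the code from [b1bade]. It is a simplified form of the HTML tokenizer's "tag open" rule: a "<" opens markup only when followed by an ASCII letter, "/", "!" or "?". It does not model the specification's handling of "</" followed by a non-letter, and it assumes PHP 7 or later for the `??` operator and scalar type hints.

```php
<?php
/**
 * Hypothetical helper: returns true if the "<" at $pos opens markup
 * (start tag, end tag, comment/doctype, processing instruction),
 * false if it should be treated as literal text.
 */
function starts_markup(string $html, int $pos): bool
{
    // Not a "<" at all (or offset out of range).
    if (($html[$pos] ?? '') !== '<') {
        return false;
    }
    $next = $html[$pos + 1] ?? '';
    // "<!" (comment, doctype) and "<?" are markup constructs.
    if ($next === '!' || $next === '?') {
        return true;
    }
    // For an end tag, look at the character after "</".
    if ($next === '/') {
        $next = $html[$pos + 2] ?? '';
    }
    // A tag name must start with a letter, so "<-" or "< " is plain text.
    return $next !== '' && ctype_alpha($next);
}

// Classify every "<" in the problematic line from the example page.
$line = '<span> ---> Lorem ipsum <--- dolor sit amet / at volutpat </span>';
for ($i = 0, $n = strlen($line); $i < $n; $i++) {
    if ($line[$i] === '<') {
        printf("offset %2d: %s\n", $i, starts_markup($line, $i) ? 'markup' : 'text');
    }
}
// reports markup, text, markup for the three "<" characters
```

Treating the "<" in "<---" as text keeps the following "/" from being read as part of a tag, so the first span's plaintext is no longer merged with the rest of the document.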