"assertText()" with a whitespace symbol between Cyrillic symbols failed on Windows (but worked on Linux).
The error message shows a broken string with malformed UTF-8 characters.
Problem code (I downloaded the project trunk, because version 1.0.1 fails in the same way):
class SimplePage {
static function normalise($html) {
//Line 538:
$text = preg_replace('#\s+#', ' ', $text);
1) I added the u modifier, giving '#\s+#u', as a quick fix for my problem. But
class TestOfLiveBrowser extends UnitTestCase {
function testRelativeEncodedLinkFollowing() {
now fails. So the problem with whitespace in different charsets is deeper.
There are other places in the code where string operations are neither binary-safe nor charset-aware.
$text = preg_replace('#[\040\n\r\t]+#', ' ', $text);
is a quick fix that makes the tests pass.
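A minimal sketch of why the explicit character class is safe for UTF-8 input. The helper names collapse_ascii_whitespace() and is_valid_utf8() are hypothetical and not part of SimpleTest. The pattern only matches ASCII bytes, which never occur inside a multi-byte UTF-8 sequence. My assumption about the Windows failure is that a byte-oriented \s can treat byte 0xA0 as whitespace under some locales, and Cyrillic "Р" is encoded as D0 A0, but I have not verified that.

```php
<?php
// Hypothetical helper, not part of SimpleTest: collapse runs of ASCII
// whitespace (space, LF, CR, tab) into a single space.
function collapse_ascii_whitespace($text) {
    return preg_replace('#[\040\n\r\t]+#', ' ', $text);
}

// Hypothetical helper: true if $text is well-formed UTF-8.
// preg_match() with the u modifier returns false for a malformed subject.
function is_valid_utf8($text) {
    return preg_match('//u', $text) === 1;
}

// "Р" is encoded as the bytes D0 A0 in UTF-8 (this file must be saved as UTF-8).
$input  = "Раз \n\t два";
$output = collapse_ascii_whitespace($input);

var_dump($output === "Раз два");   // the whitespace run became one space
var_dump(is_valid_utf8($output));  // the Cyrillic bytes were left intact
```

This only covers ASCII whitespace, so it does not normalise a non-breaking space or other Unicode space characters.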
The problem may come from parser.php: the SimpleHtmlSaxParser::decodeHtml() method runs with ISO-8859-1.
http://de2.php.net/manual/en/function.html-entity-decode.php
"The ISO-8859-1 character set is used as default for the optional third charset. This defines the character set used in conversion."
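To illustrate why the target charset matters, here is a small sketch that decodes the same entity with the charset passed explicitly. It only uses html_entity_decode() and bin2hex(); it does not reproduce what decodeHtml() does internally.

```php
<?php
// The same entity decoded under two explicitly chosen target charsets.
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

echo bin2hex($latin1), "\n"; // a0   (single byte, not valid UTF-8 on its own)
echo bin2hex($utf8), "\n";   // c2a0 (two-byte UTF-8 sequence)
```

If the page is UTF-8 but the entities are decoded as ISO-8859-1, the lone A0 byte ends up inside an otherwise UTF-8 string, which would match the malformed characters in the error message.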
A solution may be to pass a character set into parser.php. But where should that come from? From reporter.php? From php.ini? The best solution would of course be to parse it from the page under test...
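A rough sketch of the "parse it from the page under test" idea. The functions detect_charset() and decode_html_with_charset() are hypothetical and not part of SimpleTest. I assume the charset is taken from the Content-Type header first, then from a meta tag, with ISO-8859-1 as the fallback to keep the current behaviour.

```php
<?php
// Hypothetical helper, not part of SimpleTest: guess the page charset from
// a Content-Type header value first, then from a <meta> tag, else a fallback.
function detect_charset($html, $contentType = '', $fallback = 'ISO-8859-1') {
    if (preg_match('#charset\s*=\s*["\']?([A-Za-z0-9_\-]+)#i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    if (preg_match('#<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_\-]+)#i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $fallback;
}

// Hypothetical helper: decode entities using the detected charset.
// html_entity_decode() supports only a fixed list of charsets, so a real
// implementation would need to handle unsupported names.
function decode_html_with_charset($html, $charset) {
    return html_entity_decode($html, ENT_QUOTES, $charset);
}

$page    = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$charset = detect_charset($page);
echo $charset, "\n";
echo decode_html_with_charset('&laquo;тест&raquo;', $charset), "\n";
```

The header is checked first because it normally takes precedence over the meta tag in browsers; whether SimpleTest should follow the same order is an open question.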