SimpleTest / Bugs / #191 &nbsp; and UTF-8 encoding

#191 &nbsp; and UTF-8 encoding

Status: open

Owner: Marcus Baker

Labels: Web tester (52)

Priority: 5

Updated: 2009-12-27

Created: 2009-12-27

Creator: Anonymous

Private: No

"assertText()" with &nbsp; between Cyrillic characters failed on Windows (but worked on Linux).
The error message shows a broken string with malformed UTF-8 chars.
Problem code (taken from the project trunk, which behaves just as badly as version 1.0.1):

class SimplePage {
    static function normalise($html) {
        // Line 538:
        $text = preg_replace('#\s+#', ' ', $text);

1) I added the u modifier ('#\s+#u') as a quick fix for my &nbsp; problem. But
class TestOfLiveBrowser extends UnitTestCase {
    function testRelativeEncodedLinkFollowing() {
now fails. So the problem with whitespace in different charsets runs deeper.
There are other places in the code where string operations are neither binary-safe nor charset-aware.
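
One possible reason the u modifier breaks other tests, offered as an assumption rather than a diagnosis: with /u, PCRE requires the subject to be valid UTF-8, and preg_replace() returns NULL otherwise. A page served in ISO-8859-1 would hit that path. A minimal sketch (the sample strings are made up and do not come from the SimpleTest test suite):

```php
<?php
// Hypothetical sample inputs, not taken from the SimpleTest test suite.
$utf8   = "\xD0\x9F\xD1\x80\xC2\xA0\xD0\xBC";  // valid UTF-8: Cyrillic letters around U+00A0
$latin1 = "caf\xE9 au lait";                   // ISO-8859-1 bytes, not valid UTF-8

// With the u modifier the subject must be valid UTF-8.
$ok  = preg_replace('#\s+#u', ' ', $utf8);
$bad = preg_replace('#\s+#u', ' ', $latin1);
$err = preg_last_error();

var_dump($ok !== null);                  // the UTF-8 subject is processed
var_dump($bad === null);                 // the ISO-8859-1 subject is rejected
var_dump($err === PREG_BAD_UTF8_ERROR);  // PCRE reports malformed UTF-8
```

If this is what happens in testRelativeEncodedLinkFollowing(), the u modifier is only safe once the charset of the page is known.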

Discussion

Nobody/Anonymous - 2009-12-27

$text = preg_replace('#[\040\n\r\t]+#', ' ', $text);
is a quick fix that makes the tests pass.
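
A small sketch of why this explicit character class is byte-safe. It assumes (this is a guess, not something confirmed in the report) that on the Windows setup \s also matched the byte 0xA0, which occurs inside some UTF-8 encoded Cyrillic letters:

```php
<?php
// U+0420 (Cyrillic capital Er) is encoded in UTF-8 as the bytes D0 A0.
// The second byte has the same value as an ISO-8859-1 non-breaking space.
$er   = "\xD0\xA0";
$text = $er . " \n\t " . $er;

// Only ASCII space, LF, CR and TAB are collapsed; bytes >= 0x80 are never touched.
$normalised = preg_replace('#[\040\n\r\t]+#', ' ', $text);

var_dump(bin2hex($normalised));  // string(10) "d0a020d0a0"
```

The trade-off is that a decoded &nbsp; is no longer collapsed with the other whitespace, so it stays in the text as a non-breaking space.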

truetype76 - 2010-04-28

The problem may come from parser.php: SimpleHtmlSaxParser::decodeHtml() runs with ISO-8859-1, the default charset of html_entity_decode().
http://de2.php.net/manual/en/function.html-entity-decode.php
"The ISO-8859-1 character set is used as default for the optional third charset. This defines the character set used in conversion."

A solution may be to pass a character set into parser.php. But where should it come from? From reporter.php? From php.ini? The best solution would of course be to parse it from the page under test...
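
A rough sketch of the "parse it from the page under test" idea. The helper guessCharset() is hypothetical and not part of SimpleTest; it only looks at a <meta> tag, while a real implementation would probably also consult the HTTP Content-Type header:

```php
<?php
// Hypothetical helper, not part of SimpleTest: guess the charset of a page
// from its <meta> tag, falling back to a default when none is declared.
function guessCharset($html, $default = 'ISO-8859-1') {
    if (preg_match('#<meta[^>]+charset\s*=\s*["\']?([a-z0-9_\-]+)#i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

$page = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '</head><body>a&nbsp;b</body></html>';

$charset = guessCharset($page);

// The third argument selects the encoding of the decoded characters.
$asUtf8   = html_entity_decode('a&nbsp;b', ENT_QUOTES, $charset);
$asLatin1 = html_entity_decode('a&nbsp;b', ENT_QUOTES, 'ISO-8859-1');

var_dump($charset);            // string(5) "UTF-8"
var_dump(bin2hex($asUtf8));    // string(8) "61c2a062"
var_dump(bin2hex($asLatin1));  // string(6) "61a062"
```

With ISO-8859-1, &nbsp; becomes the single byte 0xA0, which is not valid on its own inside a UTF-8 string. That would be consistent with the malformed characters in the error message.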
