#191 &nbsp; and UTF-8 encoding

Status: open
Labels: Web tester (52)
Priority: 5
Updated: 2009-12-27
Created: 2009-12-27
Creator: Anonymous
Private: No

"assertText()" with &nbsp; between Cyrillic characters failed on Windows (but worked on Linux).
The error message shows a broken string with malformed UTF-8 characters.
Problem code (I downloaded the project trunk, because version 1.0.1 behaves equally badly):

class SimplePage {
    static function normalise($html) {
        // ...
        // Line 538:
        $text = preg_replace('#\s+#', ' ', $text);
        // ...
    }
}

1) I added the u modifier ('#\s+#u') as a quick fix for my &nbsp; problem, but then
TestOfLiveBrowser::testRelativeEncodedLinkFollowing() (a UnitTestCase)
fails. So the problem with whitespace in different charsets goes deeper.
There are other places in the code where string operations are neither binary-safe nor charset-aware.
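A minimal sketch of the suspected failure mode, assuming the page under test is UTF-8. U+00A0 is encoded as the two bytes 0xC2 0xA0, so a byte-oriented pattern that treats 0xA0 as whitespace replaces only the second byte and leaves a malformed sequence behind. Whether \s really matches 0xA0 on a given Windows setup is an assumption here; the first helper simulates it with an explicit \xA0. All function names are hypothetical and are not part of SimpleTest.

```php
<?php
// Assumes this file is saved as UTF-8.

// Hypothetical helper: simulates a byte-oriented \s that also matches 0xA0
// (the assumed behaviour behind the Windows failure).
function collapseTreatingA0AsSpace(string $text): string {
    return preg_replace('#[\s\xA0]+#', ' ', $text);
}

// Hypothetical helper: collapses ASCII whitespace only, so no byte of a
// multi-byte UTF-8 sequence can ever match.
function collapseAsciiWhitespace(string $text): string {
    return preg_replace('#[ \n\r\t]+#', ' ', $text);
}

// True if $text is well-formed UTF-8: with the u modifier, preg_match()
// returns false instead of 1 when the subject is malformed.
function isValidUtf8(string $text): bool {
    return preg_match('//u', $text) === 1;
}

$nbsp  = "\xC2\xA0";                      // U+00A0 encoded as UTF-8
$input = "Привет" . $nbsp . "мир \n ok";

$broken = collapseTreatingA0AsSpace($input); // 0xA0 replaced, 0xC2 orphaned
$safe   = collapseAsciiWhitespace($input);   // non-breaking space left intact

var_dump(isValidUtf8($broken)); // bool(false)
var_dump(isValidUtf8($safe));   // bool(true)
```

The ASCII-only class keeps the non-breaking space as a distinct character, so an assertion that expects a plain space in its place would still need separate handling.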

Discussion

  • Nobody/Anonymous

    $text = preg_replace('#[\040\n\r\t]+#', ' ', $text);
    is a quick fix that makes the tests pass.

     
  • truetype76 - 2010-04-28

    The problem may come from parser.php: the SimpleHtmlSaxParser::decodeHtml() method runs with ISO-8859-1.
    http://de2.php.net/manual/en/function.html-entity-decode.php
    "The ISO-8859-1 character set is used as default for the optional third charset. This defines the character set used in conversion."

    A solution may be to pass a character set into parser.php. But where should it come from? From reporter.php? From php.ini? The best solution would of course be to parse it from the page under test...
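    A rough sketch of that idea, assuming the charset can be read from the response's Content-Type header. Both function names are hypothetical and this is not how SimpleTest's decodeHtml() is actually written; the ISO-8859-1 fallback is likewise an assumption. Note that PHP 5.4 and later default html_entity_decode() to UTF-8, so the default quoted above applies to older versions.

```php
<?php
// Hypothetical helper: extracts the charset from a Content-Type header value,
// falling back to $default when none is declared.
function charsetFromContentType(string $header, string $default = 'ISO-8859-1'): string {
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $header, $m) === 1) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical wrapper: decodes entities into the given charset instead of
// relying on the function's default.
function decodeHtmlWithCharset(string $html, string $charset): string {
    return html_entity_decode($html, ENT_QUOTES, $charset);
}

$charset = charsetFromContentType('text/html; charset=utf-8');
$decoded = decodeHtmlWithCharset('&#1055;&nbsp;&#1088;', $charset);

// In UTF-8, &nbsp; becomes the two bytes C2 A0; in ISO-8859-1 it is the
// single byte A0, which is malformed when mixed into a UTF-8 string.
var_dump($charset);                                                // string(5) "UTF-8"
var_dump(bin2hex($decoded));                                       // string(12) "d09fc2a0d180"
var_dump(bin2hex(decodeHtmlWithCharset('&nbsp;', 'ISO-8859-1')));  // string(2) "a0"
```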

     
