In the web tester, assertWantedText() matches text that appears in JavaScript source code. See the attached zip for an example test case, which should be unzipped into the root of your web server to run.
I would argue that anything inside <script></script> is not "browser visible text" and therefore should not be returned.
I would propose modifying SimpleHtmlSaxParser::normalise() to the following:
function normalise($html) {
    // Drop HTML comments, then script elements together with their contents.
    $text = preg_replace('|<!--.*?-->|s', '', $html);
    $text = preg_replace('|<script.*?>.*?</script>|s', '', $text);
    // Replace images with their alt text (double-quoted, single-quoted, unquoted).
    $text = preg_replace('|<img.*?alt\s*=\s*"(.*?)".*?>|s', ' \1 ', $text);
    $text = preg_replace('|<img.*?alt\s*=\s*\'(.*?)\'.*?>|s', ' \1 ', $text);
    $text = preg_replace('|<img.*?alt\s*=\s*([a-zA-Z_]+).*?>|s', ' \1 ', $text);
    // Strip all remaining tags, decode entities and collapse whitespace.
    $text = preg_replace('|<.*?>|s', '', $text);
    $text = SimpleHtmlSaxParser::decodeHtml($text);
    $text = preg_replace('|\s+|', ' ', $text);
    return trim($text);
}
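As a rough illustration of the effect, here is a standalone sketch that applies only the script-stripping, tag-stripping and whitespace steps to a made-up fragment. The sample markup and variable names are my own assumptions and are not taken from the attached testcase; entity decoding is left out so that it runs without SimpleTest.

```php
<?php
// Hypothetical page fragment: the word "Wanted" appears only inside a script.
$html = "<p>Hello</p>\n"
      . "<script type=\"text/javascript\">\nvar msg = 'Wanted';\n</script>\n"
      . "<p>World</p>";

// Remove script elements, contents included, before stripping other tags.
$text = preg_replace('|<script.*?>.*?</script>|s', '', $html);
// Strip the remaining tags and collapse runs of whitespace.
$text = preg_replace('|<.*?>|s', '', $text);
$text = trim(preg_replace('|\s+|', ' ', $text));

var_dump($text);                              // string(11) "Hello World"
var_dump(strpos($text, 'Wanted') === false);  // bool(true)
```

With the script contents gone, a text assertion for "Wanted" would no longer pass on this fragment.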
*NOTE*
As well as stripping the contents of <script> tags, I also suggest using the 's' modifier on these preg_replace calls. It makes the '.' in a regex match newlines as well, and so caters for cases such as:
i) img tags spanning multiple lines
ii) HTML comments spanning multiple lines
iii) stripping out other tags spanning multiple lines
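The difference the 's' modifier makes can be seen in a small sketch using case ii) above. The input string is a made-up example and the variable names are only for illustration.

```php
<?php
// Hypothetical input: an HTML comment that spans two lines.
$html_with_comment = "before <!-- hidden\ntext --> after";

// Without 's', '.' stops at the newline, so the comment is left in place.
$without_s = preg_replace('|<!--.*?-->|', '', $html_with_comment);

// With 's', '.' also matches the newline, so the whole comment is removed.
$with_s = preg_replace('|<!--.*?-->|s', '', $html_with_comment);

var_dump($without_s === $html_with_comment);  // bool(true)
var_dump($with_s);                            // string(13) "before  after"
```

The same reasoning applies to img tags and other tags that are broken across lines.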
I am using simpletest_1.0.1beta.tar.gz (parser.php rev 1.66).
Best
David Heath
Perrick Penet
Web tester
None
Public
Date: 2007-12-23 09:11:04 PST
Date: 2007-03-01 02:05:50 PST

| Filename | Description | Download |
|---|---|---|
| spelling_example.zip | This test unexpectedly passes on current simpletest version | Download |