Attached is a patch that enhances the HtmlElement.asText() function to
produce results that are both closer to what a real web browser produces
and easier to write tests against. The new code does two extra things:
- translates non-breaking spaces (character 160) to regular spaces
- removes extra whitespace (all strings of whitespace are reduced to a
single space)
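However, the change is not backward compatible: existing code that
relies on the old behavior, such as tests that compare against the
exact whitespace asText() used to return, will need to be updated.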
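
For illustration only, the two steps listed above could be sketched
roughly as follows. This is not the code from the patch: the class and
method names (WhitespaceNormalizer, normalize) are hypothetical, and the
sketch assumes the text has already been extracted from the element.

```java
// Hypothetical helper, not part of the patch or of HtmlUnit itself.
final class WhitespaceNormalizer {

    private WhitespaceNormalizer() {
        // static utility, no instances
    }

    /**
     * Converts non-breaking spaces (character 160, U+00A0) to regular
     * spaces, then reduces every run of whitespace to a single space.
     */
    static String normalize(String text) {
        // Step 1: java.util.regex's \s does not match U+00A0 by default,
        // so non-breaking spaces are converted explicitly first.
        String withoutNbsp = text.replace('\u00A0', ' ');
        // Step 2: collapse spaces, tabs and line breaks into one space.
        return withoutNbsp.replaceAll("\\s+", " ");
    }
}
```

With this sketch, normalize("Hello\u00A0\u00A0world \n\t foo") returns
"Hello world foo". Leading and trailing whitespace is collapsed to a
single space rather than trimmed, since trimming is not one of the two
steps described above.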
Mike Bresnahan