When given pathological input, ExtractorHTML sometimes
creates Strings from CharSequences that are
impractically long.
For example, this page:
http://blueliners.com.au/guestbook/guestbook.html
... essentially contained:
<a href="http://something[23MB of \0 characters]blahblah">
This 23MB CharSequence, passed as 'value' into
processLink(), then became a String instance
(~46MB in size, at two bytes per char) during that
method's TextUtils.replaceAll() amp-escaping.
Generally, every CharSequence.toString(),
Matcher.group(), and TextUtils.replaceAll() in
ExtractorHTML creates a String, and we should take care
not to create Strings from excessively long junk input.
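One way to apply this is to check the length before any
toString() call. The following is only a sketch: the class
name, the method name, and the 2083-character cap are all
hypothetical and do not come from the ExtractorHTML source.

```java
// Sketch only: BoundedStrings and MAX_LINK_CHARS are hypothetical names,
// not part of Heritrix. The cap value is an assumption for illustration.
final class BoundedStrings {
    /** Assumed upper bound on a plausible link value, in chars. */
    public static final int MAX_LINK_CHARS = 2083;

    private BoundedStrings() {}

    /**
     * Returns cs as a String, or null when cs is null or longer than
     * maxChars. The length check happens before toString(), so an
     * oversized CharSequence is never copied into a new String.
     */
    public static String toStringIfReasonable(CharSequence cs, int maxChars) {
        if (cs == null || cs.length() > maxChars) {
            return null;
        }
        return cs.toString();
    }
}
```

A caller would treat a null result as "skip this link" rather
than passing the oversized value on to further processing.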
The regexps can be tightened so that where they would
have taken '+' or '*' they instead take '{1,N}' or
'{0,N}', where N is an appropriate maximum value for
the context.
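As a rough illustration of the bounded-quantifier idea, the
sketch below compares '*' with '{0,N}' on a simplified href
pattern. The pattern is not the real ExtractorHTML regexp, and
the class name, method name, and N = 2083 are assumptions made
for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a simplified href pattern, not the one ExtractorHTML uses.
final class BoundedHrefPattern {
    /** Assumed maximum attribute length; choose N to fit the context. */
    public static final int MAX_ATTR_CHARS = 2083;

    /** Unbounded form: '*' lets the group capture a value of any length. */
    public static final Pattern UNBOUNDED =
            Pattern.compile("href=\"([^\"]*)\"");

    /** Bounded form: '{0,N}' refuses values longer than N chars. */
    public static final Pattern BOUNDED =
            Pattern.compile("href=\"([^\"]{0," + MAX_ATTR_CHARS + "})\"");

    private BoundedHrefPattern() {}

    /** Returns the first captured href value, or null if nothing matches. */
    public static String firstHref(Pattern p, CharSequence html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

With the bounded pattern, an over-long attribute value simply
fails to match, so Matcher.group() never builds a String from
it. Ordinary links match the same way under both patterns.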
Karl Thiessen
None
1.6.0
Public
Date: 2007-03-14 00:55
Date: 2005-06-15 22:42
Date: 2005-06-14 23:16
Date: 2005-06-14 20:39
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2005-12-02 17:14 | stack-sf |
| close_date | - | 2005-12-02 17:14 | stack-sf |
| artifact_group_id | None | 2005-09-23 18:29 | gojomo |
| assigned_to | gojomo | 2005-06-15 22:42 | gojomo |