ExtractorHTML.processScriptCode() converts a
CharSequence into a String before passing it to
ExtractorJS.considerStrings(), even though the latter
accepts a CharSequence directly.
This was noticed on the NARA-MIL test crawl, which
encountered a 1+MB obfuscated JavaScript segment; the
conversion triggered an OOM there. An OOM might
have been inevitable, but the attempted allocation of
a 2+MB (at 2 bytes per character) temporary String
didn't help.
The fix is simply not to convert to a String.
Treating this as a high-priority, low-risk fix for 1.0.x.
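
A minimal sketch of the change, for illustration only. The class name and method bodies below are hypothetical stand-ins, not the actual Heritrix source: considerStrings() here merely counts double-quoted literals, and the two processScriptCode variants show the "before" and "after" call patterns. The only point is that a method accepting CharSequence can be handed the original sequence without a toString() copy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceSketch {
    // Hypothetical stand-in for ExtractorJS.considerStrings():
    // it accepts any CharSequence, so callers need not pass a String.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    public static int considerStrings(CharSequence cs) {
        int count = 0;
        Matcher m = QUOTED.matcher(cs); // Pattern.matcher() takes a CharSequence
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Before: toString() on a non-String CharSequence allocates a full
    // temporary copy (2 bytes per char in the JVMs of that era).
    public static int processScriptCodeCopying(CharSequence cs) {
        return considerStrings(cs.toString());
    }

    // After: pass the sequence through unchanged, with no extra allocation.
    public static int processScriptCodeDirect(CharSequence cs) {
        return considerStrings(cs);
    }

    public static void main(String[] args) {
        CharSequence script =
            new StringBuilder("var a = \"x.html\"; var b = \"y.html\";");
        System.out.println(processScriptCodeCopying(script));
        System.out.println(processScriptCodeDirect(script));
    }
}
```

Both variants return the same result; the second simply skips the temporary String, which matters when the script segment runs to a megabyte or more.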
Gordon Mohr
Date: 2007-03-14 00:16
Date: 2004-10-13 02:18
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2004-10-13 02:18 | gojomo |
| resolution_id | None | 2004-10-13 02:18 | gojomo |
| close_date | - | 2004-10-13 02:18 | gojomo |