
Heritrix: Internet Archive Web Crawler

Tracker: Bugs

[Flash] OOMEs on a particular URL - ID: 1068370
Last Update: Comment added ( karl-ia )

The URL below causes us to OOME. We need a defense against
such beasties.

http://www.alcoholfreechildren.org/en/news/news_scroller6.swf

<<<
java.lang.OutOfMemoryError: Java heap space
#1
http://www.alcoholfreechildren.org/en/news/news_scroller6.swf
(2 attempts)

Current processor: ExtractorSWF
ACTIVE for 2s166ms
Where: ABOUT_TO_BEGIN_PROCESSOR for 1892ms

java.lang.OutOfMemoryError: Java heap space
>>>


Submitted By: Michael Stack ( stack-sf ) - 2004-11-18 00:18
Priority: 9
Status: Closed
Resolution: Fixed
Assigned To: Nobody/Anonymous
Category: Extraction
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:18
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-294 -- please add further
comments at that location.


Date: 2005-02-15 23:39
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

When parsing the 'bad' SWF file, we were trying to allocate a
byte array of hundreds of megabytes. I assume this SWF file
is corrupt. Regardless, I added protection against large
allocations. I also added an ugly override to stamp out the even
uglier System.out.println 'glyph count', etc. messages that
we've seen in heritrix_out.log.

Below is the commit.

Closing.


Fix for '[ 1068370 ] [Flash] OOMEs on a particular URL', and stopped the
'glyph count..' message from showing in heritrix_out.log.
* src/java/org/archive/crawler/extractor/CrawlUriSWFAction.java
* src/java/org/archive/crawler/extractor/ExtractorHTML.java
* src/java/org/archive/crawler/extractor/ExtractorJS.java
Formatting.
* src/java/org/archive/crawler/extractor/ExtractorSWF.java
Formatting. Added an override of SWFReader#readOneTag so we could add a test
for a length that's too big.
(getTagParser): Added. This method holds an override of
TagParser#parseDefineFont2 so we can remove an errant System.out.println.
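A minimal sketch of the kind of length guard the commit describes, written as standalone Java rather than against JavaSWF2's actual SWFReader API (whose method signatures are not given in this report). It assumes an uncompressed SWF tag stream and uses the SWF RECORDHEADER layout: a little-endian 16-bit word with the tag code in the upper 10 bits and a short length in the lower 6 bits, where a short length of 0x3f means a signed 32-bit length follows. The class name SwfTagGuard, the readOneTag helper here and the 1 MiB MAX_TAG_LENGTH cap are all hypothetical; the limit Heritrix actually uses is not stated above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not the Heritrix/JavaSWF2 code): read one SWF tag from
 * an uncompressed tag stream, refusing to allocate a body buffer when the
 * declared length is implausible.
 */
public class SwfTagGuard {

    /** Hypothetical cap; the real limit used in Heritrix is not stated. */
    static final int MAX_TAG_LENGTH = 1024 * 1024; // 1 MiB

    /** A parsed tag: its code and its body bytes. */
    static final class Tag {
        final int code;
        final byte[] body;
        Tag(int code, byte[] body) { this.code = code; this.body = body; }
    }

    /** Reads one unsigned byte, failing on end of stream. */
    private static int readU8(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) throw new EOFException("truncated SWF tag header");
        return b;
    }

    /** Reads an unsigned 16-bit little-endian value. */
    private static int readU16LE(InputStream in) throws IOException {
        int lo = readU8(in);
        int hi = readU8(in);
        return lo | (hi << 8);
    }

    /** Reads a signed 32-bit little-endian value. */
    private static int readS32LE(InputStream in) throws IOException {
        int b0 = readU8(in), b1 = readU8(in), b2 = readU8(in), b3 = readU8(in);
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }

    /** Reads one tag, checking the declared length before allocating. */
    static Tag readOneTag(InputStream in) throws IOException {
        int header = readU16LE(in);
        int code = header >>> 6;        // upper 10 bits: tag code
        long length = header & 0x3f;    // lower 6 bits: short length
        if (length == 0x3f) {
            length = readS32LE(in);     // long form: 32-bit length follows
        }
        // The guard: a corrupt file can declare a negative or huge length.
        if (length < 0 || length > MAX_TAG_LENGTH) {
            throw new IOException("SWF tag " + code
                    + " declares implausible length " + length);
        }
        byte[] body = new byte[(int) length];
        new DataInputStream(in).readFully(body);
        return new Tag(code, body);
    }

    public static void main(String[] args) throws IOException {
        // Well-formed short tag: code 9, length 3 (header word 0x0243).
        byte[] ok = {(byte) 0x43, (byte) 0x02, 1, 2, 3};
        Tag t = readOneTag(new ByteArrayInputStream(ok));
        System.out.println(t.code + " " + t.body.length); // prints "9 3"

        // Corrupt long-form tag claiming 0x20000000 bytes (512 MiB).
        byte[] bad = {(byte) 0xff, 0x00, 0x00, 0x00, 0x00, 0x20};
        try {
            readOneTag(new ByteArrayInputStream(bad));
        } catch (IOException e) {
            // prints "rejected: SWF tag 3 declares implausible length 536870912"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check sits between reading the declared length and the `new byte[...]` call, which is the allocation that exhausted the heap on the corrupt file. A production cap would have to be generous enough for legitimately large tags such as embedded bitmaps or sound streams.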

Attached File

No Files Currently Attached

Changes ( 4 )

Field          Old Value  Date              By
resolution_id  None       2005-02-15 23:39  stack-sf
close_date     -          2005-02-15 23:39  stack-sf
status_id      Open       2005-02-15 23:39  stack-sf
priority       5          2005-02-11 23:41  gojomo