Heritrix: Internet Archive Web Crawler

Tracker: Bugs

7 StackOverflowError shouldn't kill crawl - ID: 1093073
Last Update: Comment added ( karl-ia )

Seen in FR crawl:

<<<
java.lang.StackOverflowError
#8
http://www.gltrade.fr/gltcom/servlet/com.gltrade.gltcom.menu.GlTradeQuoteServlet?menu=investors
(0 attempts)
REL
Current processor: ExtractorHTML
ACTIVE for 9s498ms
Where: ABOUT_TO_BEGIN_PROCESSOR for 1460ms

java.lang.StackOverflowError
    at java.lang.Character.codePointAt(Character.java:2336)
    at java.util.regex.Pattern$BitClass.match(Pattern.java:2873)
    at java.util.regex.Pattern$Sub.match(Pattern.java:5246)
    at java.util.regex.Pattern$Sub.match(Pattern.java:5246)
    at java.util.regex.Pattern$Sub.match(Pattern.java:5246)
    at java.util.regex.Pattern$Sub.match(Pattern.java:5246)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4189)
    at java.util.regex.Pattern$Single.match(Pattern.java:3314)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
    at java.util.regex.Pattern$Single.match(Pattern.java:3314)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
    at java.util.regex.Pattern$Single.match(Pattern.java:3314)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
    at java.util.regex.Pattern$Single.match(Pattern.java:3314)
[plus lots more, but not all the way to see which
regexp was being applied]

Could NOT reproduce with the current version of the page
on the live web, even though it does have some atypically
long/intricate <param> tags.

The StackOverflowError caused the crawl to pause because
it was treated the same as an OutOfMemoryError (OOME). But
this error should usually be recoverable, harming only the
current thread/URI. So the crawl should shake it off the
way it does RuntimeExceptions.


Gordon Mohr ( gojomo ) - 2004-12-30 02:29

Priority: 7
Status: Closed
Resolution: Fixed
Assigned To: Gordon Mohr
Category: None
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:19
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-322 -- please add further
comments at that location.


Date: 2005-02-15 02:43
Sender: gojomo (Project Admin)

Now treating StackOverflowError more like a RuntimeException:
it will ruin processing of the current URI, but not kill the
crawl. Commit message (from a while ago):

Help for [ 1093073 ] StackOverflowError in ExtractorHTML
* ToeThread.java
Treat StackOverflowError more like a RuntimeException,
as it is usually recoverable. Also annotate URI with
error/exception that triggered a catchall handler.
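
For illustration only, here is a minimal sketch of the handling described
above. It is not the actual ToeThread.java change; the class WorkerSketch,
its methods, the annotation strings, and the example.com URIs are all
hypothetical names invented for this example. It assumes a worker that
processes one URI at a time, keeps OutOfMemoryError fatal, and lets a
StackOverflowError spoil only the URI that triggered it, recording which
error fired the catch-all handler.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a crawler worker thread; NOT Heritrix's ToeThread.
final class WorkerSketch {

    /** Deliberately unbounded recursion, used only to provoke a StackOverflowError. */
    static int recurseForever(int depth) {
        return recurseForever(depth + 1) + 1;
    }

    /**
     * Runs the processor on each URI in order. Returns a map from URI to an
     * annotation: "ok" on success, or "err=<class name>" if the catch-all
     * handler fired for that URI.
     */
    static Map<String, String> processAll(List<String> uris, Consumer<String> processor) {
        Map<String, String> annotations = new LinkedHashMap<>();
        for (String uri : uris) {
            try {
                processor.accept(uri);
                annotations.put(uri, "ok");
            } catch (OutOfMemoryError e) {
                // Still treated as serious: rethrow so the caller can pause the crawl.
                throw e;
            } catch (RuntimeException | StackOverflowError e) {
                // The stack has unwound by the time control reaches this handler,
                // so only the current URI is lost; note the cause and move on.
                annotations.put(uri, "err=" + e.getClass().getName());
            }
        }
        return annotations;
    }

    public static void main(String[] args) {
        Map<String, String> result = processAll(
                List.of("http://example.com/a", "http://example.com/deep", "http://example.com/b"),
                uri -> {
                    if (uri.endsWith("/deep")) {
                        recurseForever(0); // simulates a pathological extraction
                    }
                });
        result.forEach((uri, note) -> System.out.println(uri + " " + note));
    }
}
```

In this sketch the URI that overflows the stack is annotated with the error
class while the URIs before and after it are still processed, which is the
"ruin the current URI, but not the crawl" behavior the comment describes.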




Changes ( 4 )

Field          Old Value                            Date              By
status_id      Open                                 2005-02-16 02:16  gojomo
resolution_id  None                                 2005-02-16 02:16  gojomo
close_date     -                                    2005-02-16 02:16  gojomo
summary        StackOverflowError in ExtractorHTML  2005-02-15 02:43  gojomo