Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 Localize StackOverflowError in Extractors - ID: 1122836
Last Update: Comment added ( karl-ia )

The fix for [ 1093073 ] StackOverflowError kills crawl
made StackOverflowErrors recoverable, so that one only
spoils the current URL.

As they commonly occur on idiosyncratic content that
troubles our Extractors, the effect of a
StackOverflowError could be localized even further,
only hurting the current extractor, still allowing the
URL to be written to an ARC.


Submitted: Gordon Mohr ( gojomo ) - 2005-02-15 02:45
Priority: 5
Status: Closed
Resolution: Fixed
Assigned: Nobody/Anonymous
Category: None
Group: 1.6.0
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:21
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-360 -- please add further
comments at that location.


Date: 2005-09-22 23:03
Sender: gojomo (Project Admin)

Local handler put in place via new common superclass for
Extractors. Commit comment:

Fix for [ 1122836 ] Localize StackOverflowError in Extractors
* CrawlURI.java, LocalizedError.java, LocalErrorFormatter.java
    generalize LocalizedError handling to accommodate any
    Throwable (not just Exceptions)
* Extractor.java
    new common superclass for Extractor Processors;
    currently just wraps an extractor-specific extract() method
    in a StackOverflowError catch/log/proceed handler
* ExtractorCSS.java, ExtractorDOC.java, ExtractorHTML.java,
  ExtractorHTMLTest.java, ExtractorJS.java, ExtractorPDF.java,
  ExtractorSWF.java, ExtractorUniversal.java
    derive from new Extractor superclass; rename previous
    innerProcess() to extractor-specific extract() method

Using example URI from [ 1122839 ] StackOverflowError in
ExtractorHTML, I verified desired behavior: URI processing
continued, but local-errors.log and crawl.log highlighted
the local StackOverflowError which occurred.
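
For readers who want to see the shape of the change, here is a minimal sketch of the catch/log/proceed pattern described in the commit comment. It is not the committed Heritrix code: SketchUri, SketchExtractor, addLocalizedError(), RunawayExtractor and LinkExtractor are hypothetical stand-ins invented for illustration, and the real CrawlURI and Extractor classes may differ in names and signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for per-URI crawl state; NOT the real CrawlURI. */
class SketchUri {
    final String uri;
    final List<String> outLinks = new ArrayList<>();
    final List<Throwable> localErrors = new ArrayList<>();

    SketchUri(String uri) {
        this.uri = uri;
    }

    /** Accepts any Throwable, so Errors can be recorded as well as Exceptions. */
    void addLocalizedError(String processorName, Throwable t) {
        localErrors.add(t);
    }
}

/** Hypothetical common superclass: subclasses only implement extract(). */
abstract class SketchExtractor {
    /** Runs extract() and confines a stack overflow to this one extractor. */
    public final void innerProcess(SketchUri curi) {
        try {
            extract(curi);
        } catch (StackOverflowError soe) {
            // Catch, log against the URI, and proceed: later processors still run.
            curi.addLocalizedError(getClass().getSimpleName(), soe);
        }
    }

    protected abstract void extract(SketchUri curi);
}

/** Simulates an extractor that recurses without bound on troublesome content. */
class RunawayExtractor extends SketchExtractor {
    @Override
    protected void extract(SketchUri curi) {
        recurse(0);
    }

    private int recurse(int depth) {
        return recurse(depth + 1) + 1; // never terminates normally
    }
}

/** Simulates a well-behaved extractor that runs after the failing one. */
class LinkExtractor extends SketchExtractor {
    @Override
    protected void extract(SketchUri curi) {
        curi.outLinks.add(curi.uri + "#found");
    }
}

class ExtractorSketchDemo {
    public static void main(String[] args) {
        SketchUri curi = new SketchUri("http://example.com/");
        new RunawayExtractor().innerProcess(curi); // overflow is caught and recorded
        new LinkExtractor().innerProcess(curi);    // processing of the URI continues
        System.out.println("local errors: " + curi.localErrors.size()
                + ", out links: " + curi.outLinks.size());
    }
}
```

Making innerProcess() final in the sketch keeps the handler in one place, so each extractor only supplies extract(). Whether the real superclass does the same is an assumption here.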

Closing as fixed.


Changes ( 5 )

Field              Old Value                                              Date              By
artifact_group_id  None                                                   2005-09-23 18:01  gojomo
status_id          Open                                                   2005-09-22 23:03  gojomo
resolution_id      None                                                   2005-09-22 23:03  gojomo
close_date         -                                                      2005-09-22 23:03  gojomo
summary            Localize StackOverflowError in Extractor even further  2005-02-15 02:56  gojomo