
Heritrix: Internet Archive Web Crawler

Tracker: Bugs

NPE in ROS#record - ID: 1101831
Last Update: Comment added ( karl-ia )

Kris ran an overnight crawl and got two NPEs in
ROS#record (org.archive.io.RecordingOutputStream#record).
He was using a recent HEAD.

Indeed I shall. I ran a crawl with 2200 domains
overnight. Come this morning it was still progressing
normally, having downloaded over 1.7 million documents.
Basically, it went great. I'll definitely use larger
'chunks' during the next round of .is harvesting.

The only errors I encountered were 2 alerts like this one:

Problem java.lang.NullPointerException occured when
trying to process
'http://www.heildsoluapotek.is/ecweb/upplysingar/?cat_id=18621&ew_2_r_f=7&ew_2_r_t=12&news_category_id='
at step ABOUT_TO_BEGIN_PROCESSOR


Associated Throwable: java.lang.NullPointerException

Stacktrace:
java.lang.NullPointerException
    at org.archive.io.RecordingOutputStream.record(RecordingOutputStream.java(Compiled Code))
    at org.archive.io.RecordingOutputStream.write(RecordingOutputStream.java(Inlined Compiled Code))
    at org.archive.io.RecordingInputStream.read(RecordingInputStream.java(Compiled Code))
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java(Compiled Code))
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java(Compiled Code))
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java(Compiled Code))
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java(Inlined Compiled Code))
    at org.archive.httpclient.HttpRecorderGetMethod.execute(HttpRecorderGetMethod.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java(Compiled Code))
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java(Compiled Code))
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java(Compiled Code))
    at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java(Compiled Code))
    at org.archive.crawler.framework.Processor.process(Processor.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java(Compiled Code))

Both occurred fairly early in the crawl (nothing like
this happened during the 1000-domain test crawl).
Non-fatal, but new to me.
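The stack trace shows the recording path. HttpClient's HttpParser.readRawLine pulls the status line through RecordingInputStream.read. That method hands each byte to RecordingOutputStream.write and then to record, which is where the NPE is thrown. The minimal Java sketch below illustrates only this tee pattern. TeeInputStream and TeeDemo are hypothetical names, and this is not Heritrix's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for the RecordingInputStream -> RecordingOutputStream
// relationship seen in the stack trace; NOT Heritrix's actual code.
class TeeInputStream extends FilterInputStream {
    private final OutputStream recorder;

    TeeInputStream(InputStream in, OutputStream recorder) {
        super(in);
        this.recorder = recorder;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            recorder.write(b); // every byte the parser consumes is also recorded
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            recorder.write(buf, off, n); // record exactly the bytes that were read
        }
        return n;
    }
}

class TeeDemo {
    // Reads one line byte by byte, roughly the way a status-line parser consumes input.
    static String readRawLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            line.write(b);
            if (b == '\n') {
                break;
            }
        }
        return line.toString("US-ASCII");
    }

    public static void main(String[] args) throws IOException {
        byte[] response = "HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream recorded = new ByteArrayOutputStream();
        InputStream in = new TeeInputStream(new ByteArrayInputStream(response), recorded);
        String status = readRawLine(in);
        System.out.print(status);            // HTTP/1.0 200 OK
        System.out.println(recorded.size()); // 17
    }
}
```

Every byte of the status line passes through the recorder. A fault inside the recorder therefore surfaces in readStatusLine, as in the trace above, even though the HTTP exchange itself may be fine.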


Submitted: Michael Stack ( stack-sf ) - 2005-01-13 18:17
Priority: 5
Status: Closed
Resolution: Duplicate
Assigned: Nobody/Anonymous
Category: None
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:20
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-335 -- please add further
comments at that location.


Date: 2005-02-09 19:41
Sender: stack-sf (Project Admin)


Closing as a duplicate of

[ 1040212 ] ROS NPE/IOE null disk file (Keep an eye on this).

http://sourceforge.net/tracker/index.php?func=detail&aid=1040212&group_id=73833&atid=539099
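The duplicate's title, "ROS NPE/IOE null disk file", suggests that the recorder's backing disk stream was null when record() needed it. The hypothetical Java sketch below shows that failure mode. It assumes a recorder that buffers the first bytes in memory and spills the rest to a disk stream. SpillingRecorder is an illustrative name, not Heritrix's class. In the sketch, an explicit null check turns the bare NullPointerException into an IOException that names the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of a recorder that keeps the first bytes in memory and
// spills the rest to a disk stream. NOT Heritrix's RecordingOutputStream.
class SpillingRecorder extends OutputStream {
    private final byte[] buffer;
    private int position = 0;        // total bytes recorded so far
    private OutputStream diskStream; // may be null if never opened or already closed

    SpillingRecorder(int bufferSize, OutputStream diskStream) {
        this.buffer = new byte[bufferSize];
        this.diskStream = diskStream;
    }

    @Override
    public void write(int b) throws IOException {
        record(b);
    }

    private void record(int b) throws IOException {
        if (position < buffer.length) {
            buffer[position] = (byte) b;
        } else {
            // Without this guard, diskStream.write(b) would throw a NullPointerException
            // once the disk stream is gone.
            if (diskStream == null) {
                throw new IOException("Recorder has no open disk stream (position " + position + ")");
            }
            diskStream.write(b);
        }
        position++;
    }

    // Closes and forgets the disk stream, simulating a recorder that was reset or closed early.
    void closeDisk() throws IOException {
        if (diskStream != null) {
            diskStream.close();
            diskStream = null;
        }
    }

    int size() {
        return position;
    }
}

class SpillDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream fakeDisk = new ByteArrayOutputStream(); // stands in for a file
        SpillingRecorder rec = new SpillingRecorder(4, fakeDisk);
        rec.write(new byte[] {1, 2, 3, 4, 5, 6});
        System.out.println(fakeDisk.size()); // 2 bytes spilled past the 4-byte buffer
        rec.closeDisk();
        try {
            rec.write(7); // past the buffer, but the disk stream is gone
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With a 4-byte buffer, writing six bytes spills two of them to the stand-in disk stream. After closeDisk(), the next write raises the IOException instead of an NPE.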



Changes ( 3 )

Field          Old Value   Date              By
close_date     -           2005-02-09 19:41  stack-sf
status_id      Open        2005-02-09 19:41  stack-sf
resolution_id  None        2005-02-09 19:41  stack-sf