Kris ran an overnight crawl and got two NPEs in
RecordingOutputStream#record (ROS#record). He was using a recent HEAD.
Indeed I shall. I ran a crawl with 2200 domains
overnight. Come this morning it was still progressing
normally, having downloaded over 1.7 million documents.
Basically, it went great. I'll definitely use larger
'chunks' during the next round of .is harvesting.
The only errors encountered were 2 alerts like this one:
Problem java.lang.NullPointerException occured when trying to process
'http://www.heildsoluapotek.is/ecweb/upplysingar/?cat_id=18621&ew_2_r_f=7&ew_2_r_t=12&news_category_id='
at step ABOUT_TO_BEGIN_PROCESSOR
Associated Throwable: java.lang.NullPointerException
Stacktrace:
java.lang.NullPointerException
    at org.archive.io.RecordingOutputStream.record(RecordingOutputStream.java(Compiled Code))
    at org.archive.io.RecordingOutputStream.write(RecordingOutputStream.java(Inlined Compiled Code))
    at org.archive.io.RecordingInputStream.read(RecordingInputStream.java(Compiled Code))
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java(Compiled Code))
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java(Compiled Code))
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java(Compiled Code))
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java(Inlined Compiled Code))
    at org.archive.httpclient.HttpRecorderGetMethod.execute(HttpRecorderGetMethod.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java(Compiled Code))
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java(Compiled Code))
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java(Compiled Code))
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java(Compiled Code))
    at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java(Compiled Code))
    at org.archive.crawler.framework.Processor.process(Processor.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java(Compiled Code))
Both occurred fairly early on in the crawl (nothing like
this happened during the 1000 domain test crawl).
Non-fatal, but new to me.
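
The trace shows a read on the recording input stream forwarding bytes into the recording output stream's write and record methods while httpclient parses the status line. The report does not state a root cause. As a rough sketch only, the following shows one way a tee-style recorder could hit a null internal buffer (for example, if the recorder were closed while a read was still in flight) and how a guard would turn that into an IOException instead of an NPE. All class names here are hypothetical stand-ins, not the crawler's actual classes, and the close-during-read scenario is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RecordingSketch {

    /** Hypothetical recorder: keeps a copy of every byte written to it. */
    static class SketchRecordingOutputStream extends OutputStream {
        private ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        @Override
        public void write(int b) throws IOException {
            record(b);
        }

        private void record(int b) throws IOException {
            ByteArrayOutputStream target = buffer; // read the field once
            if (target == null) {
                // Without this guard, target.write(b) would throw an NPE.
                throw new IOException("recording stream already closed");
            }
            target.write(b);
        }

        @Override
        public void close() {
            buffer = null; // assumed: closing releases the buffer
        }
    }

    /** Hypothetical tee: every byte read is also handed to the recorder. */
    static class SketchRecordingInputStream extends InputStream {
        private final InputStream in;
        private final OutputStream recorder;

        SketchRecordingInputStream(InputStream in, OutputStream recorder) {
            this.in = in;
            this.recorder = recorder;
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b != -1) {
                recorder.write(b);
            }
            return b;
        }
    }

    /** Reads one byte, closes the recorder, then reads again. */
    static String demo() throws IOException {
        SketchRecordingOutputStream recorder = new SketchRecordingOutputStream();
        byte[] response = "HTTP/1.0 200 OK\r\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new SketchRecordingInputStream(
                new ByteArrayInputStream(response), recorder);
        in.read();        // recorded normally
        recorder.close(); // simulate the recorder going away mid-read
        try {
            in.read();
            return "no error";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

With a guard like this, a fetch processor could treat the failure as an ordinary I/O error for that one URI rather than raising an alert, though whether that matches the actual fix is not stated in this report.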
Date: 2007-03-14 00:20
Date: 2005-02-09 19:41 Logged In: YES

| Field | Old Value | Date | By |
|---|---|---|---|
| close_date | - | 2005-02-09 19:41 | stack-sf |
| status_id | Open | 2005-02-09 19:41 | stack-sf |
| resolution_id | None | 2005-02-09 19:41 | stack-sf |