After only 20-30 seconds of running a crawl, URLs with a
-2 error code and 10 tries (10t) show up in crawl.log.
Relevant crawl configuration:
max-tries = 10
retry-delay-seconds = 900
timeout-seconds = 1200
sotimeout-ms = 20000
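For scale, here is a rough estimate of how long 10 tries ought to take under these settings. It is only a sketch: it assumes (not confirmed for Heritrix) that every retry after the first attempt waits the full retry-delay-seconds, and it ignores fetch time entirely.

```python
# Settings taken from the configuration above.
max_tries = 10
retry_delay_seconds = 900

# Assumption: one full delay precedes each retry after the first attempt.
delays = max_tries - 1
min_span_seconds = delays * retry_delay_seconds

print(min_span_seconds)         # 8100
print(min_span_seconds / 3600)  # 2.25
```

Under that assumption, a URL should need at least a couple of hours to use up all of its tries, not seconds.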
First line of the crawl.log:
20040303013639842 1 55 #1 dns:www.cecc.gov 2132 text/dns P http://www.cecc.gov/
.
.
.
About 20s later:
20040303013658845 -2 . #4 http://www.csce.gov/images/map-lft.gif . . 10t E http://www.csce.gov/helsinki.cfm
20040303013658849 -2 . #3 http://www.csce.gov/images/text-search.gif . . 10t E http://www.csce.gov/helsinki.cfm
20040303013658853 -2 . #2 http://www.csce.gov/images/menu-privacy.gif . . 10t E http://www.csce.gov/helsinki.cfm
.
.
.
.
From local-errors.log (this image is retried 10 times
in just over 4 seconds):
First try:
20040303013649559 -2 . #4 http://www.csce.gov/images/map-lft.gif . . E http://www.csce.gov/helsinki.cfm
java.net.SocketException: Socket is closed
    at java.net.Socket.setSoTimeout(Socket.java:918)
    at org.apache.commons.httpclient.HttpConnection.setSoTimeout(HttpConnection.java:623)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.setSoTimeout(MultiThreadedHttpConnectionManager.java:1174)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:658)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:529)
    at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:178)
    at org.archive.crawler.framework.Processor.process(Processor.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.processingLoop(ToeThread.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java:100)
Last try (same error) at timestamp:
20040303013653835
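The interval between the first and last try can be checked directly from the two timestamps. This is a minimal sketch that assumes the log timestamps are 17-digit yyyyMMddHHmmssSSS values, as the entries above suggest; `parse_ts` is a hypothetical helper, not part of Heritrix.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    """Parse a 17-digit log timestamp (assumed yyyyMMddHHmmssSSS)."""
    base = datetime.strptime(ts[:14], "%Y%m%d%H%M%S")
    return base + timedelta(milliseconds=int(ts[14:]))


first_try = parse_ts("20040303013649559")  # first try in local-errors.log
last_try = parse_ts("20040303013653835")   # last try, same error

elapsed = (last_try - first_try).total_seconds()
print(elapsed)  # 4.276
```

All 10 tries therefore fall within roughly 4.3 seconds, against a configured retry-delay-seconds of 900.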
Submitted: Nobody/Anonymous ( nobody ) - 2004-03-03 02:00
Priority: 9
Status: Closed
Resolution: Fixed
Assigned: Gordon Mohr
Category: General
Group: None
Visibility: Public
Date: 2007-03-14 00:08
Date: 2004-03-29 23:20
Date: 2004-03-04 00:16
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2004-03-29 23:20 | gojomo |
| resolution_id | None | 2004-03-29 23:20 | gojomo |
| close_date | - | 2004-03-29 23:20 | gojomo |
| priority | 5 | 2004-03-03 02:05 | ia_igor |
| assigned_to | nobody | 2004-03-03 02:05 | ia_igor |