A UKGov shadow crawl (actually just crawling junk at
the time) appears to have generated no output to the
uri-processing log from 5:20pm to 6:52pm last night
(2003-09-29).
The two log entries which bookend the problem period:
20030929172052055 200 - #15 http://www.sparklit.com/agreements.spark?sparkKey=be60210e1e10292cf34fb99489d72a13b0 text/html
20030929185221893 200 - #21 http://www.sparklit.com/affiliate.spark?sparkKey=1bfd1195e938561ee000efe26160069eb0 text/html
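For reference, the length of the silent period can be computed directly from the two timestamps. The sketch below assumes the leading 17-digit field is laid out as yyyyMMddHHmmssSSS, which is what the values above suggest; the class and method names are hypothetical and not part of Heritrix.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not Heritrix code: measures the gap between two log timestamps.
class CrawlLogGap {
    // Parse the first 14 digits (down to seconds); milliseconds are handled separately.
    private static final DateTimeFormatter SECONDS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Assumes a 17-digit stamp: yyyyMMddHHmmss followed by three millisecond digits.
    static LocalDateTime parseStamp(String stamp) {
        if (stamp.length() != 17) {
            throw new IllegalArgumentException("expected 17 digits: " + stamp);
        }
        LocalDateTime base = LocalDateTime.parse(stamp.substring(0, 14), SECONDS);
        int millis = Integer.parseInt(stamp.substring(14));
        return base.plusNanos(millis * 1_000_000L);
    }

    // Time elapsed between two log entries.
    static Duration gap(String before, String after) {
        return Duration.between(parseStamp(before), parseStamp(after));
    }

    public static void main(String[] args) {
        Duration d = gap("20030929172052055", "20030929185221893");
        System.out.println("gap: " + d.toMinutes() + " minutes");
    }
}
```

For the two entries above this gives a gap of roughly 91 minutes.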
This correlates with a period in which some aspects of
the IA network -- perhaps just DNS -- were unavailable.
However, there's no designed-in facility for Heritrix
to just pause like that. If for any reason HTTP fetch
attempts are failing, the URI should be rescheduled for
a future retry. Eventually -- on the order of minutes,
not hours -- the retry limit should be exceeded and the
URI should be logged as a too-many-retries error.
The discrepancy between expected and actual behavior
should be investigated and possibly corrected.
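The expected behavior described above (reschedule a failed URI for a later retry, then give up once a retry limit is exceeded) could be sketched roughly as follows. This is an illustration only, not Heritrix's actual implementation: the class, the Fetcher interface, the retry limit of 3, and the 30-second delay are all assumptions chosen for the example, and the delay is simulated rather than slept.

```java
// Illustrative sketch only; none of these names or values come from Heritrix.
class RetrySketch {
    static final int MAX_RETRIES = 3;            // hypothetical retry limit
    static final long RETRY_DELAY_MS = 30_000L;  // hypothetical delay between retries

    // Hypothetical fetch hook: returns true if the fetch succeeded.
    interface Fetcher {
        boolean fetch(String uri);
    }

    // What would end up in the log for one URI.
    static final class Outcome {
        final String status;
        final int attempts;
        final long elapsedMs;

        Outcome(String status, int attempts, long elapsedMs) {
            this.status = status;
            this.attempts = attempts;
            this.elapsedMs = elapsedMs;
        }
    }

    // Try the URI; on failure reschedule it, until the retry limit is exceeded.
    static Outcome process(String uri, Fetcher fetcher) {
        long clockMs = 0; // simulated time spent waiting for retries
        for (int attempt = 1; ; attempt++) {
            if (fetcher.fetch(uri)) {
                return new Outcome("fetched", attempt, clockMs);
            }
            if (attempt > MAX_RETRIES) {
                // The initial attempt plus MAX_RETRIES retries have all failed.
                return new Outcome("too-many-retries", attempt, clockMs);
            }
            clockMs += RETRY_DELAY_MS; // rescheduled for a future retry
        }
    }

    public static void main(String[] args) {
        // A fetcher that always fails, as during a network outage.
        Outcome o = process("http://example.invalid/", uri -> false);
        System.out.println(o.status + " after " + o.attempts
                + " attempts, " + o.elapsedMs + " ms");
    }
}
```

With values in this range, a URI that keeps failing is logged as an error within minutes, so a silent period of an hour and a half would not be expected from retry handling alone.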
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2004-02-17 19:55 | gojomo |
| resolution_id | None | 2004-02-17 19:55 | gojomo |
| close_date | - | 2004-02-17 19:55 | gojomo |