Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 mysterious pause facing network (DNS) problem - ID: 815357
Last Update: Comment added ( karl-ia )

A UKGov shadow crawl (actually just crawling junk at
the time) appears to have generated no output to the
uri-processing log from 5:20pm last night (2003-09-29)
to 6:52pm.

The two log entries which bookend the problem period:

20030929172052055 200 - #15 http://www.sparklit.com/agreements.spark?sparkKey=be60210e1e10292cf34fb99489d72a13b0 text/html
20030929185221893 200 - #21 http://www.sparklit.com/affiliate.spark?sparkKey=1bfd1195e938561ee000efe26160069eb0 text/html

This correlates with a period in which some aspects of
the IA network -- perhaps just DNS -- were unavailable.

However, Heritrix has no designed-in facility to just
pause like that. If for any reason HTTP fetch
attempts are failing, the URI should be rescheduled for
a future retry. Eventually -- on the order of minutes,
not hours -- the retry limit should be exceeded and the
URI should be logged as a too-many-retries error.
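
The expected behavior described above can be sketched roughly as follows. This is only an illustration, not Heritrix's actual code: the class `RetryPolicySketch`, its fields `maxRetries` and `retryDelay`, and the `Outcome` enum are all hypothetical names invented for this example.

```java
import java.time.Duration;

/** Hypothetical sketch of the expected retry behavior; not Heritrix's real classes. */
final class RetryPolicySketch {
    enum Outcome { RESCHEDULE, GIVE_UP }

    private final int maxRetries;      // assumed setting: retries allowed after the first attempt
    private final Duration retryDelay; // assumed setting: wait before each retry

    RetryPolicySketch(int maxRetries, Duration retryDelay) {
        this.maxRetries = maxRetries;
        this.retryDelay = retryDelay;
    }

    /**
     * Decides what happens after a fetch failure.
     * failedAttempts counts every failed attempt so far, including the first.
     */
    Outcome afterFailure(int failedAttempts) {
        // The first attempt is not a retry, so retries used = failedAttempts - 1.
        return failedAttempts > maxRetries ? Outcome.GIVE_UP : Outcome.RESCHEDULE;
    }

    /** Time from the first failure until the URI is given up, ignoring fetch time. */
    Duration timeUntilGiveUp() {
        return retryDelay.multipliedBy(maxRetries);
    }

    public static void main(String[] args) {
        RetryPolicySketch policy = new RetryPolicySketch(3, Duration.ofSeconds(30));
        System.out.println(policy.afterFailure(1));     // RESCHEDULE
        System.out.println(policy.afterFailure(4));     // GIVE_UP
        System.out.println(policy.timeUntilGiveUp());   // PT1M30S
    }
}
```

With a short delay and a small retry limit like the ones in `main`, a URI that keeps failing is given up within minutes, which is the behavior the report expects. The actual delay and limit in effect during the crawl are not stated here.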

The discrepancy between the expected and actual behavior
should be investigated and, if necessary, corrected.


Submitted: Gordon Mohr ( gojomo ) - 2003-09-30 19:28
Priority: 5
Status: Closed
Resolution: Invalid
Assigned To: Nobody/Anonymous
Category: General
Group: None
Visibility: Public


Comments ( 5 )

Date: 2007-03-14 00:06
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-35 -- please add further
comments at that location.


Date: 2004-02-17 19:55
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Actually, a retry delay of 10-15 minutes, and a retry-max of
10 or more, could explain such behavior. Closing as invalid
until further mysteries recur.
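
As a rough check of that arithmetic, the sketch below compares the gap between the two bookend log entries with the retry window implied by a 10-15 minute delay and a retry-max of 10. It assumes the 17-digit log timestamps are laid out as yyyyMMddHHmmssSSS, which matches their appearance but is not confirmed in this report; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical sketch: compares the observed log gap with a retry window. */
final class StallEstimateSketch {
    /** Parses a 17-digit timestamp, assumed to be yyyyMMddHHmmssSSS. */
    static LocalDateTime parseLogTimestamp(String ts) {
        return LocalDateTime.of(
            Integer.parseInt(ts.substring(0, 4)),    // year
            Integer.parseInt(ts.substring(4, 6)),    // month
            Integer.parseInt(ts.substring(6, 8)),    // day
            Integer.parseInt(ts.substring(8, 10)),   // hour
            Integer.parseInt(ts.substring(10, 12)),  // minute
            Integer.parseInt(ts.substring(12, 14)),  // second
            Integer.parseInt(ts.substring(14, 17)) * 1_000_000); // millis to nanos
    }

    /** Total time spent waiting if every retry is used up. */
    static Duration retryWindow(int delayMinutes, int retryMax) {
        return Duration.ofMinutes((long) delayMinutes * retryMax);
    }

    public static void main(String[] args) {
        // Timestamps copied from the two bookend log entries in the report.
        Duration gap = Duration.between(
            parseLogTimestamp("20030929172052055"),
            parseLogTimestamp("20030929185221893"));
        System.out.println("observed gap (min): " + gap.toMinutes());
        System.out.println("window at 10 min delay: " + retryWindow(10, 10).toMinutes());
        System.out.println("window at 15 min delay: " + retryWindow(15, 10).toMinutes());
    }
}
```

Under these assumptions the observed gap is about 91 minutes, while the retry window is 100 to 150 minutes, so URIs could still have been cycling through retries for the whole outage without any of them reaching the too-many-retries limit.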


Attached File

No Files Currently Attached

Changes ( 3 )

Field          Old Value  Date              By
status_id      Open       2004-02-17 19:55  gojomo
resolution_id  None       2004-02-17 19:55  gojomo
close_date     -          2004-02-17 19:55  gojomo