Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 Untried CrawlURIs should have clear status code - ID: 904767
Last Update: Comment added ( karl-ia )

We just disabled the HTTP Fetch module for certain
per-host overrides, expecting the affected URIs to land
in the logs with either untried (0) or too-many-retries
(after being untried multiple times in a row) statuses.
Instead, a few had connect-failure statuses, which
are only set by the HTTP Fetcher itself. Thus these
must have been left over from a previous attempt on the
same URI.

This is misleading, and might cause problems in other
cases, because a URI continues to be treated as if
something that happened on a previous attempt had
happened again.


Submitted: Gordon Mohr ( gojomo ) - 2004-02-26 02:53
Priority: 5
Status: Closed
Resolution: Fixed
Assigned To: Gordon Mohr
Category: None
Group: 0.8.0
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:08
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-83 -- please add further
comments at that location.


Date: 2004-04-15 22:23
Sender: gojomo (Project Admin)

CrawlURI.processingCleanup() now resets fetchStatus to 0
(UNATTEMPTED) -- see also Bug #896764 regarding unintended
FTP retries. Now an untried CrawlURI will have the
UNATTEMPTED status.
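
The effect of that reset can be sketched as follows. This is a minimal illustration, not the actual Heritrix code: SketchCrawlUri and StaleStatusSketch are hypothetical stand-ins, and the CONNECT_FAILED value is a placeholder chosen for the example. Only the value 0 for UNATTEMPTED and the method name processingCleanup() come from the comment above.

```java
// Hypothetical stand-in for a crawl URI; NOT the real Heritrix CrawlURI class.
class SketchCrawlUri {
    // 0 means "unattempted", as stated in the comment above.
    static final int UNATTEMPTED = 0;
    // Placeholder value for a connect failure (an assumption for this sketch).
    static final int CONNECT_FAILED = -2;

    private final String uri;
    private int fetchStatus = UNATTEMPTED;

    SketchCrawlUri(String uri) {
        this.uri = uri;
    }

    String getUri() {
        return uri;
    }

    int getFetchStatus() {
        return fetchStatus;
    }

    void setFetchStatus(int status) {
        this.fetchStatus = status;
    }

    // Run at the end of each processing pass: clear per-attempt state so
    // the next pass cannot inherit a status from this one.
    void processingCleanup() {
        fetchStatus = UNATTEMPTED;
    }
}

class StaleStatusSketch {
    public static void main(String[] args) {
        SketchCrawlUri curi = new SketchCrawlUri("http://example.com/");

        // Attempt 1: the fetcher runs and records a connect failure.
        curi.setFetchStatus(SketchCrawlUri.CONNECT_FAILED);
        System.out.println("attempt 1 status: " + curi.getFetchStatus());
        curi.processingCleanup();

        // Attempt 2: the fetcher is disabled, so nothing sets a status.
        // Because of the reset, the URI reports UNATTEMPTED rather than
        // the connect failure left over from attempt 1.
        System.out.println("attempt 2 status: " + curi.getFetchStatus());
        curi.processingCleanup();
    }
}
```

Without the reset in processingCleanup(), the second line printed would repeat the connect-failure value from the first attempt, which is the misleading behavior this bug describes.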


Changes ( 5 )

Field              Old Value  Date              By
status_id          Open       2004-04-15 22:23  gojomo
resolution_id      None       2004-04-15 22:23  gojomo
close_date         -          2004-04-15 22:23  gojomo
artifact_group_id  None       2004-03-31 01:12  gojomo
assigned_to        nobody     2004-03-30 23:30  gojomo