Heritrix: Internet Archive Web Crawler

Tracker: Bugs

no rule for decidingscope to always crawl seeds - ID: 1219486
Last Update: Comment added ( karl-ia )

It's possible for a 'dynamic seed' (the result of a redirect) to
be ruled out of scope by a deciding rule chain, because the
new seed has not yet been added to, for example, the set of
acceptable URL spaces. That addition happens when a trigger is
hit as the seed is being scheduled, not when it is being
considered for scoping.
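To make the timing problem concrete, here is a minimal Java sketch of a deciding rule chain. Everything in it is hypothetical: `Decision`, `Candidate`, `DecideRule`, `REJECT_ALL`, `acceptPrefixes` and `evaluate` are illustrative names, not Heritrix classes. It also assumes the chain semantics that rules run in order and the last rule not returning PASS decides. The prefix set stands in for the "acceptable URL spaces", and the example.com URLs are made up.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a deciding rule chain; these names are illustrative, not Heritrix's API.
public class ScopeTimingSketch {

    enum Decision { ACCEPT, REJECT, PASS }

    // A candidate URI; isSeed is set, but no rule in this chain consults it.
    record Candidate(String uri, boolean isSeed) {}

    interface DecideRule {
        Decision decide(Candidate c);
    }

    // Baseline rule: reject everything unless a later rule says otherwise.
    static final DecideRule REJECT_ALL = c -> Decision.REJECT;

    // Accepts URIs under a known prefix (the "acceptable URL spaces"); otherwise has no opinion.
    // The set is read at decision time, so later additions only affect later decisions.
    static DecideRule acceptPrefixes(Set<String> prefixes) {
        return c -> prefixes.stream().anyMatch(c.uri()::startsWith)
                ? Decision.ACCEPT
                : Decision.PASS;
    }

    // Assumed chain semantics: rules run in order and the last non-PASS decision wins.
    static Decision evaluate(List<DecideRule> chain, Candidate c) {
        Decision result = Decision.PASS;
        for (DecideRule rule : chain) {
            Decision d = rule.decide(c);
            if (d != Decision.PASS) {
                result = d;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> prefixes = new HashSet<>(Set.of("http://example.com/"));
        List<DecideRule> chain = List.of(REJECT_ALL, acceptPrefixes(prefixes));

        // A seed redirects here, so this URI becomes a new ("dynamic") seed.
        Candidate redirectSeed = new Candidate("http://www.example.com/", true);

        // Scoping happens first: its prefix is not registered yet, so it is rejected.
        System.out.println(evaluate(chain, redirectSeed));

        // Registration only happens at scheduling time, after the decision above.
        prefixes.add("http://www.example.com/");
        System.out.println(evaluate(chain, redirectSeed));
    }
}
```

Run as-is, it prints REJECT and then ACCEPT: the redirect target only becomes acceptable once its prefix is registered, which in the scenario described here happens after the scope decision has already been made.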

Probably one of the standard rules should simply include
an 'all seeds are always in scope' provision. For
example, the PrerequisiteAcceptDecideRule could be expanded
into a PrimaFacieAcceptDecideRule covering both prereqs and
seeds.


Gordon Mohr ( gojomo ) - 2005-06-13 05:50

Priority: 7
Status: Closed
Resolution: Fixed
Assigned To: Gordon Mohr
Category: None
Group: 1.6.0
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:54
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-438 -- please add further
comments at that location.


Date: 2005-09-14 00:18
Sender: gojomo (Project Admin)

I considered making a PrimaFacieAcceptDecideRule (or
AutomaticAcceptDecideRule) for both prereqs and seeds, but
then realized that in some cases both may not be wanted.
So I decided to make a SeedAcceptDecideRule instead; people
who want both can include both rules. Commit comment:

Fix for [ 1219486 ] no rule for decidingscope to always
crawl seeds
* SeedAcceptDecideRule.java
decide rule that always accepts URIs whose isSeed flag
is true
* DecideRule.options
add SeedAcceptDecideRule to picklist

Closing.
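The fix boils down to a rule that accepts seeds and otherwise stays silent. Below is a minimal, self-contained Java sketch of that idea; it is not the actual SeedAcceptDecideRule.java. The types `Decision`, `Candidate` and `DecideRule`, the `isPrerequisite` flag, the helpers `acceptPrefixes` and `evaluate`, and the assumption that the last non-PASS decision in a chain wins are all hypothetical, chosen only for illustration. Seed acceptance and prerequisite acceptance are modeled as separate rules, so a chain can include either or both.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of seed/prerequisite accept rules; not Heritrix's actual classes.
public class SeedAcceptSketch {

    enum Decision { ACCEPT, REJECT, PASS }

    // isSeed and isPrerequisite stand in for per-URI flags a crawler might track.
    record Candidate(String uri, boolean isSeed, boolean isPrerequisite) {}

    interface DecideRule {
        Decision decide(Candidate c);
    }

    // Accepts any URI whose isSeed flag is true; otherwise expresses no opinion.
    static final DecideRule SEED_ACCEPT = c -> c.isSeed() ? Decision.ACCEPT : Decision.PASS;

    // Accepts prerequisites (e.g. DNS lookups); otherwise expresses no opinion.
    static final DecideRule PREREQUISITE_ACCEPT =
            c -> c.isPrerequisite() ? Decision.ACCEPT : Decision.PASS;

    // Baseline rule: reject everything unless a later rule says otherwise.
    static final DecideRule REJECT_ALL = c -> Decision.REJECT;

    // Accepts URIs under a known prefix; otherwise expresses no opinion.
    static DecideRule acceptPrefixes(Set<String> prefixes) {
        return c -> prefixes.stream().anyMatch(c.uri()::startsWith)
                ? Decision.ACCEPT
                : Decision.PASS;
    }

    // Assumed chain semantics: rules run in order and the last non-PASS decision wins.
    static Decision evaluate(List<DecideRule> chain, Candidate c) {
        Decision result = Decision.PASS;
        for (DecideRule rule : chain) {
            Decision d = rule.decide(c);
            if (d != Decision.PASS) {
                result = d;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> prefixes = Set.of("http://example.com/");

        // Accept rules placed after the rejecting rules, so no earlier REJECT can override them.
        List<DecideRule> seedsOnly =
                List.of(REJECT_ALL, acceptPrefixes(prefixes), SEED_ACCEPT);
        List<DecideRule> seedsAndPrereqs =
                List.of(REJECT_ALL, acceptPrefixes(prefixes), SEED_ACCEPT, PREREQUISITE_ACCEPT);

        // A redirect target flagged as a seed, and a DNS prerequisite for its host.
        Candidate redirectSeed = new Candidate("http://www.example.com/", true, false);
        Candidate dnsLookup = new Candidate("dns:www.example.com", false, true);

        System.out.println(evaluate(seedsOnly, redirectSeed));   // ACCEPT
        System.out.println(evaluate(seedsOnly, dnsLookup));      // REJECT
        System.out.println(evaluate(seedsAndPrereqs, dnsLookup)); // ACCEPT
    }
}
```

Keeping the two rules separate matches the reasoning in the comment: a chain that wants seeds always in scope, but not blanket acceptance of prerequisites, simply omits `PREREQUISITE_ACCEPT`.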


Changes ( 4 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 18:01  gojomo
status_id          Open       2005-09-14 00:18  gojomo
resolution_id      None       2005-09-14 00:18  gojomo
close_date         -          2005-09-14 00:18  gojomo