Heritrix: Internet Archive Web Crawler

Tracker: Bugs

'treat seed redirects as new seeds' not working - ID: 1219262
Last Update: Comment added ( karl-ia )

The 'treat seed redirects as new seeds' option has never worked in the
new DecidingScope, and it has stopped working in the classic scopes.


Submitted: Gordon Mohr ( gojomo ) - 2005-06-12 18:03
Priority: 7
Status: Closed
Resolution: None
Assigned: Karl Thiessen
Category: None
Group: 1.6.0
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 00:54
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-437 -- please add further
comments at that location.


Date: 2005-08-04 20:39
Sender: gojomo (Project Admin)

This was fixed by the June commit. Assigning to Karl for
verification and closing.


Date: 2005-06-12 19:53
Sender: gojomo (Project Admin)

Fixed problem in CandidateURI where via, an important
indicator that a seed was auto-promoted after a redirect,
was being clobbered.

Added facility in CrawlScope to notify listeners when seeds
are added, so that down-wind parts of composite scopes (like
DecidingScope) have a chance to react to mid-crawl seed
additions, too.
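
The two changes described above can be illustrated with a minimal sketch. This is not the Heritrix source: SketchUri, SketchSeedListener and SketchScope are hypothetical stand-ins for CandidateURI, SeedListener and CrawlScope, and their method names are assumptions made for illustration only. The sketch shows a URI whose 'via' survives being marked as a seed, and a scope that notifies registered listeners whenever a seed is added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a candidate URI; not the real Heritrix class.
final class SketchUri {
    final String uri;
    final String via;      // URI that led here (e.g. the redirecting seed), or null
    private boolean seed;

    SketchUri(String uri, String via) {
        this.uri = uri;
        this.via = via;
    }

    // Marking as a seed leaves 'via' untouched, so a redirect-promoted
    // seed can still be told apart from an operator-supplied one.
    void markAsSeed() {
        this.seed = true;
    }

    boolean isSeed() {
        return seed;
    }

    boolean isPromotedSeed() {
        return seed && via != null;
    }
}

// Hypothetical callback interface for components interested in seed additions.
interface SketchSeedListener {
    void addedSeed(SketchUri seed);
}

// Hypothetical scope that records seeds and tells listeners about each new one.
class SketchScope {
    private final List<SketchUri> seeds = new ArrayList<>();
    private final List<SketchSeedListener> listeners = new ArrayList<>();

    void addSeedListener(SketchSeedListener listener) {
        listeners.add(listener);
    }

    // Takes the full URI object rather than a bare string, so that the
    // 'via' information is still available to every listener.
    void addSeed(SketchUri candidate) {
        candidate.markAsSeed();
        seeds.add(candidate);
        for (SketchSeedListener listener : listeners) {
            listener.addedSeed(candidate);
        }
    }

    int seedCount() {
        return seeds.size();
    }
}
```

In this sketch a listener registered before the crawl starts is also called for seeds added mid-crawl, which is the behaviour the downstream parts of a composite scope need.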

Commit comment:

Fix for [ 1219262 ] 'treat seed redirects as new seeds' not working
* CandidateURI.java
    don't clobber 'via' when setting as-seed
* CrawlScope.java, SeedListener.java
    add facility for other components to register an interest in
    seed additions
    change addSeed to take a CrawlURI, so that via information is
    available
* AbstractFrontier.java, AdaptiveRevisitFrontier.java,
  HostQueuesFrontier.java, SeedCachingScope.java,
  SeedCachingScopeTest.java
    use new addSeed(CrawlURI)
* SurtPrefixedDecideRule.java
    when deriving from seeds, also register as listener, and note
    adds, so that redirects-become-seeds work for DecidingScopes
    with Surt (and Surt-derived) decide rules
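
The last item of the commit comment, a prefix rule that derives its prefixes from seeds and also notes later additions, might look roughly like the sketch below. SketchSurtPrefixRule and its methods are hypothetical and are not the real SurtPrefixedDecideRule. The SURT conversion is deliberately simplified: it assumes well-formed http(s) URLs with a host, and it ignores ports, userinfo and query strings.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified prefix rule; not the real SurtPrefixedDecideRule.
class SketchSurtPrefixRule {
    private final Set<String> prefixes = new TreeSet<>();

    // Simplified SURT-style form: host labels reversed and comma-terminated,
    // followed by the path. Assumes the URL is well formed and has a host.
    static String toSurt(String url) {
        URI u = URI.create(url);
        String[] labels = u.getHost().toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder(u.getScheme()).append("://(");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        sb.append(')');
        String path = u.getPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        return sb.toString();
    }

    // The prefix implied by a seed: its SURT form cut back to the last '/'.
    static String prefixFromSeed(String seedUrl) {
        String surt = toSurt(seedUrl);
        return surt.substring(0, surt.lastIndexOf('/') + 1);
    }

    // Called for the initial seeds and again for any seed added mid-crawl,
    // for example one promoted from a seed redirect.
    void addedSeed(String seedUrl) {
        prefixes.add(prefixFromSeed(seedUrl));
    }

    // A URL is accepted when its SURT form starts with any known prefix.
    boolean accepts(String url) {
        String surt = toSurt(url);
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a seed of http://example.org/ that redirects to http://www.example.org/ only brings pages on the www host into scope once the redirect target has been reported to the rule through addedSeed.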


Changes ( 4 )

Field              Old Value  Date              By
status_id          Open       2005-12-02 17:14  stack-sf
close_date         -          2005-12-02 17:14  stack-sf
artifact_group_id  None       2005-09-23 18:29  gojomo
assigned_to        gojomo     2005-08-04 20:39  gojomo