Heritrix: Internet Archive Web Crawler

Tracker: Bugs

Frontier.next() forceFetches will cause assertion error - ID: 900826
Last Update: Comment added ( karl-ia )

In org.archive.crawler.basic.Frontier.next(), a
CandidateURI with a true forceFetch() will be sent to
emitCuri even if another URI of the same class is in
progress.

However, emitCuri() calls noteInProgress() which
asserts that no other URI of the same class is in progress.
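The conflict can be illustrated with a small, self-contained model. This is only a sketch under assumptions: InProgressTracker, emit() and the string class keys are hypothetical stand-ins, not the actual Frontier code, and the invariant is raised as an explicit AssertionError so it shows up even when JVM assertions are disabled.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the conflict; NOT the Heritrix source.
class InProgressTracker {
    // Class keys (e.g. one per host) that currently have a URI in progress.
    private final Set<String> classesInProgress = new HashSet<>();

    // Models the invariant described above: at most one URI per class in progress.
    void noteInProgress(String classKey) {
        if (!classesInProgress.add(classKey)) {
            throw new AssertionError("class already in progress: " + classKey);
        }
    }

    void noteFinished(String classKey) {
        classesInProgress.remove(classKey);
    }

    // Models the emit path: a forced URI skips the "is this class busy?" check,
    // so it reaches noteInProgress() even when the class is already busy.
    boolean emit(String classKey, boolean forceFetch) {
        if (!forceFetch && classesInProgress.contains(classKey)) {
            return false; // normal path: hold the URI back
        }
        noteInProgress(classKey);
        return true;
    }
}
```

In this model, a normal emit for a busy class is simply held back, while a forced emit for the same class trips the invariant.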

Rather than an out-of-band forceFetch, we need a
facility that guarantees the "forced" URI is the next
one to be fetched, within the normal constraints.

I think this could be accomplished by adding a stack to
each class object... then each class would have both a
queue and a stack -- adding to the queue means
"eventually" and pushing to the stack means "before
everything else"... the stack would be exhausted before
the queue is considered.
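A minimal sketch of the queue-plus-stack idea follows. The names ClassQueue, enqueue(), pushForced() and next() are assumptions made for illustration; the change that was eventually committed may look quite different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the proposal; names are illustrative only.
class ClassQueue<T> {
    private final Deque<T> stack = new ArrayDeque<>(); // "before everything else"
    private final Deque<T> queue = new ArrayDeque<>(); // "eventually"

    // Normal scheduling: the URI waits behind everything already queued.
    void enqueue(T uri) {
        queue.addLast(uri);
    }

    // "Forced" scheduling: the URI becomes the next one handed out for this class.
    void pushForced(T uri) {
        stack.push(uri);
    }

    // The stack is exhausted before the queue is considered.
    // Returns null when both are empty.
    T next() {
        if (!stack.isEmpty()) {
            return stack.pop();
        }
        return queue.pollFirst();
    }

    boolean isEmpty() {
        return stack.isEmpty() && queue.isEmpty();
    }
}
```

Because a forced URI still comes out through the same next() call as any other, the one-in-progress-per-class constraint can be checked in a single place. Note that two forced URIs pushed in a row come out in last-in, first-out order.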

This facility could be useful for fetching related
(embedded) items soon after where they originate, as
well -- but more investigation is necessary.


Submitted: Gordon Mohr ( gojomo ) - 2004-02-20 02:38
Priority: 6
Status: Closed
Resolution: Fixed
Assigned: Gordon Mohr
Category: None
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:08
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-75 -- please add further
comments at that location.


Date: 2004-03-25 22:14
Sender: kristinn_sig (Project Admin)

Gordon implemented the change mentioned above. It has now been
tested and confirmed to work.


Attached File

No Files Currently Attached

Changes ( 4 )

Field          Old Value  Date              By
status_id      Open       2004-03-25 22:14  kristinn_sig
resolution_id  None       2004-03-25 22:14  kristinn_sig
assigned_to    nobody     2004-03-25 22:14  kristinn_sig
close_date     -          2004-03-25 22:14  kristinn_sig