Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

5 add CandidateURI parameter to UriUniqFilter.forget() - ID: 1205583
Last Update: Comment added ( karl-ia )

Hi!

I am currently implementing new UriUniqFilters and have
come to the point where I would like to have an API change :-)

Is there a specific reason why it is necessary that
UriUniqFilter's forget() method only takes a key String,
whereas its add...() methods take a CandidateURI value in
addition?

Basically, the key String and the CandidateURI value are just
two different representations of the same thing.
However, the latter provides itemized access to a URI's
components (such as scheme, host, etc.), which could be useful
in deciding "whether and how to forget" a URI.

As far as I can see, a matching CandidateURI (or CrawlURI)
object is available at every code point where forget() is called,
so the change should not be too complicated.
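
To illustrate the request, here is a minimal sketch of what a filter could do once forget() also receives the URI object. All names in it (SketchCandidateUri, SketchUniqFilter, HostAwareFilter, the "sticky hosts" policy) are hypothetical stand-ins invented for this example; they are not the real Heritrix classes, whose interface has more methods. The sketch only assumes that the URI object exposes its host.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for Heritrix's CandidateURI; the real class is richer.
final class SketchCandidateUri {
    private final URI uri;

    SketchCandidateUri(String s) {
        this.uri = URI.create(s);
    }

    String getScheme() {
        return uri.getScheme();
    }

    String getHost() {
        return uri.getHost();
    }
}

// Simplified shape of the proposed interface (assumption, not the real UriUniqFilter).
interface SketchUniqFilter {
    void add(String key, SketchCandidateUri value);

    // Proposed form: the key plus the itemized URI it was derived from.
    void forget(String key, SketchCandidateUri value);
}

// Example policy: URIs on "sticky" hosts are never forgotten.
final class HostAwareFilter implements SketchUniqFilter {
    private final Set<String> seen = new HashSet<>();
    private final Set<String> stickyHosts;

    HostAwareFilter(Set<String> stickyHosts) {
        this.stickyHosts = stickyHosts;
    }

    @Override
    public void add(String key, SketchCandidateUri value) {
        seen.add(key);
    }

    @Override
    public void forget(String key, SketchCandidateUri value) {
        // The decision uses the host, which the key String alone
        // would only yield after re-parsing.
        if (stickyHosts.contains(value.getHost())) {
            return;
        }
        seen.remove(key);
    }

    boolean contains(String key) {
        return seen.contains(key);
    }
}
```

With a key-only forget(), the same policy would have to re-parse the key String on every call.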

Christian


Christian Kohlschütter ( ck-heritrix ) - 2005-05-20 11:44

Priority: 5
Status: Closed
Resolution: None
Assigned to: Michael Stack
Category: API
Group: 1.6.0
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 01:41
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-931 -- please add further
comments at that location.


Date: 2005-05-20 18:10
Sender: stack-sf (Project Admin)

Done. Closing. Below is commit.


Implementing '[ 1205583 ] add CandidateURI parameter to
UriUniqFilter.forget()' for Christian.

* src/java/org/archive/crawler/datamodel/UriUniqFilter.java
Javadoc fixes.
(forget): Changed API. Pass in the candidate URI along with the key.
Passing in the CandidateURI adds convenience (usually, the key is a
canonicalized URL; in this case the CURI contains the parsed key, with
utility methods that give access to its subparts). Adding the
CandidateURI also makes this method match the form of the other
methods in this interface.
* src/java/org/archive/crawler/frontier/HostQueuesFrontier.java
* src/java/org/archive/crawler/frontier/WorkQueueFrontier.java
* src/java/org/archive/crawler/util/BdbUriUniqFilter.java
* src/java/org/archive/crawler/util/BdbUriUniqFilterTest.java
* src/java/org/archive/crawler/util/FPUriUniqFilter.java
* src/java/org/archive/crawler/util/FPUriUniqFilterTest.java
* src/java/org/archive/crawler/util/MemUriUniqFilter.java
(forget): Added curi param to match changed interface.
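
For readers without the source at hand, the shape of this change can be sketched roughly as follows. This is an illustration only, not the committed code: DemoCuri, DemoMemFilter, DemoFrontier and the keyFor() helper are hypothetical names, and the lowercase "canonicalization" is a placeholder for whatever the crawler really does. It shows an in-memory filter that accepts the extra parameter without needing it, and a caller that already holds the URI object and simply passes it along.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for CandidateURI / CrawlURI.
final class DemoCuri {
    private final String uri;

    DemoCuri(String uri) {
        this.uri = uri;
    }

    String getUri() {
        return uri;
    }
}

// In-memory filter: takes the curi to match the interface, but only uses the key.
final class DemoMemFilter {
    private final Set<String> seen = new HashSet<>();

    void add(String key, DemoCuri curi) {
        seen.add(key);
    }

    void forget(String key, DemoCuri curi) {
        seen.remove(key); // curi is unused in this simple implementation
    }

    int count() {
        return seen.size();
    }
}

// Caller side: the URI object is already at hand wherever forget() is invoked.
final class DemoFrontier {
    private final DemoMemFilter alreadyIncluded = new DemoMemFilter();

    // Placeholder canonicalization (assumption for this sketch only).
    static String keyFor(DemoCuri curi) {
        return curi.getUri().toLowerCase(Locale.ROOT);
    }

    void schedule(DemoCuri curi) {
        alreadyIncluded.add(keyFor(curi), curi);
    }

    void forget(DemoCuri curi) {
        // Before the change this would have been forget(key) only.
        alreadyIncluded.forget(keyFor(curi), curi);
    }

    int seenCount() {
        return alreadyIncluded.count();
    }
}
```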



Attached File

No Files Currently Attached

Changes ( 5 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 21:08  gojomo
status_id          Open       2005-05-20 18:10  stack-sf
category_id        None       2005-05-20 18:10  stack-sf
assigned_to        nobody     2005-05-20 18:10  stack-sf
close_date         -          2005-05-20 18:10  stack-sf