Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

5 List based URIRegExprFilter - ID: 1208293
Last Update: Comment added ( karl-ia )

Request: a URI reg.expr. filter (OR based) that takes a list of
expressions instead of just one.

I find myself using a series of URI reg.expr. filters
to tackle numerous issues. I was thinking that it might
be useful to add a similar filter that contained a list
of strings. This would make it a lot easier to add
additional filters while a crawl is in progress.

Gordon notes:
Good idea. Anywhere a Filter would be considered, a
DecideRule (or a pair, for the 'matches' and
'not-matches' senses) now makes sense too.


Submitted:   Kristinn Sigurdsson ( kristinn_sig ) - 2005-05-25 09:04
Priority:    5
Status:      Closed
Resolution:  None
Assigned:    Kristinn Sigurdsson
Category:    Configuration
Group:       1.6.0
Visibility:  Public


Comments ( 2 )

Date: 2007-03-14 01:41
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-937 -- please add further
comments at that location.


Date: 2005-05-30 14:43
Sender: kristinn_sig (Project Admin)


Completed.

Added a new Filter plus a pair of decide rules. Whether the list
is combined with logical OR or AND is configurable. The default
is OR, as that is probably much more common.
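
As a rough illustration of the OR/AND option described above, here is a minimal sketch in plain Java. It is not the code that was committed to Heritrix: the class name RegexListMatcher, the method name, and the listLogicalOr parameter are hypothetical, and full-string matching via java.util.regex is an assumption.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not Heritrix's actual class. */
final class RegexListMatcher {

    /**
     * Tests a URI against a list of regular expressions.
     *
     * @param uri           the URI string to test
     * @param regexps       the list of expressions (assumed to be full-string patterns)
     * @param listLogicalOr true: any one match suffices (OR);
     *                      false: every expression must match (AND)
     * @return whether the URI matches the list; an empty list never matches
     */
    static boolean matches(String uri, List<String> regexps, boolean listLogicalOr) {
        // An empty list is treated as "no match" in both modes. Without this
        // check the AND mode would fall through the loop and return true.
        if (regexps.isEmpty()) {
            return false;
        }
        for (String regexp : regexps) {
            boolean hit = Pattern.matches(regexp, uri);
            if (listLogicalOr && hit) {
                return true;   // OR: the first match decides
            }
            if (!listLogicalOr && !hit) {
                return false;  // AND: the first miss decides
            }
        }
        // OR reaches here only if nothing matched; AND only if everything matched.
        return !listLogicalOr;
    }
}
```

With the list ".*\\.jpg", ".*\\.png", the URI http://example.com/a.jpg matches in OR mode but not in AND mode, since it cannot end in both extensions.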

The filter has been tested, but the decide rules have only
had some minimal testing (the logic was simply copied and
should be fine).

The filter was added to Filter.options and the rules were
added to DecideRule.options.

Note that while the default behavior for an empty list is to
not match, the NotMatches decide rule reverses this, so with an
empty list it applies to every URI. Whether this is proper is
arguable.
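
The empty-list case can be made concrete with a small sketch. It assumes the 'not-matches' sense is simply the negation of the 'matches' test. The class and method names are hypothetical and this is not the actual DecideRule code.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of the empty-list edge case; not Heritrix code. */
final class NotMatchesDemo {

    /** OR-based list match: true if any expression matches the whole URI. */
    static boolean matchesList(String uri, List<String> regexps) {
        for (String regexp : regexps) {
            if (Pattern.matches(regexp, uri)) {
                return true;
            }
        }
        // An empty list falls through to here, so it never matches.
        return false;
    }

    /** Assumed 'not-matches' sense: the plain negation of matchesList. */
    static boolean notMatchesList(String uri, List<String> regexps) {
        return !matchesList(uri, regexps);
    }
}
```

Under this assumption, notMatchesList returns true for any URI when the list is empty, so a not-matches rule left with an empty list would apply its decision to every URI in the crawl.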


Attached File

No Files Currently Attached

Changes ( 4 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 21:08  gojomo
close_date         -          2005-05-30 14:43  kristinn_sig
status_id          Open       2005-05-30 14:43  kristinn_sig
category_id        None       2005-05-30 14:43  kristinn_sig