Heritrix: Internet Archive Web Crawler

Tracker: Bugs

9 Seed to SURT conversion issues - ID: 1062604
Last Update: Comment added ( karl-ia )

Dan found the following:

http://timmknibbs4senate.blogspot.com/

is converted to the SURT

http://(com,blogspot,timmknibbs4senate/

and

http://www.electionprotectionvolunteer.org/electionprotection/

to

http://(org,electionprotectionvolunteer,www/electionprotection/

Also, https seeds are not converted to http, as seems to
be the pattern.

Included is a patch to address both of the above issues.
Assigning to Gordon to review for commit to HEAD so Dan
can do LoC crawls with 1.2 and SURTs.
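
For readers unfamiliar with the notation, here is a minimal sketch of the seed-to-SURT mapping discussed above. It is not the attached diff.patch and not Heritrix's own code. It assumes the SURT layout described in the Heritrix documentation (host labels reversed, comma-separated, wrapped in parentheses with a trailing comma) and assumes https should fold into http as the report suggests. The function name to_surt is hypothetical, and the sketch ignores ports and user info.

```python
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Return a SURT-style form of url (illustrative sketch only)."""
    parts = urlsplit(url)
    # Assumption taken from the report: https seeds are treated as http.
    scheme = "http" if parts.scheme == "https" else parts.scheme
    host = (parts.hostname or "").lower()
    # Reverse the dot-separated host labels and join them with commas,
    # e.g. www.example.org -> org,example,www
    labels = ",".join(reversed(host.split(".")))
    # Port and user info are deliberately left out of this sketch.
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{scheme}://({labels},){path}{query}"


print(to_surt("http://timmknibbs4senate.blogspot.com/"))
# prints http://(com,blogspot,timmknibbs4senate,)/
print(to_surt("https://www.electionprotectionvolunteer.org/electionprotection/"))
# prints http://(org,electionprotectionvolunteer,www,)/electionprotection/
```

The outputs shown are what this sketch produces under the stated assumptions. The report itself does not spell out the expected SURTs, so treat them as illustrative rather than as the committed behaviour.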


Submitted: Michael Stack ( stack-sf ) - 2004-11-08 18:48
Priority: 9
Status: Closed
Resolution: Fixed
Assigned: Michael Stack
Category: configuration
Group: None
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 00:18
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-287 -- please add further
comments at that location.


Date: 2004-11-09 00:53
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Committed patch. Closing.


Date: 2004-11-08 20:35
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

+1
Good test cases, code addresses problem, can't think of any
problem cases for new code.


Attached File ( 1 )

Filename    Description
diff.patch  Patch for SURTset

Changes ( 6 )

Field          Old Value           Date              By
status_id      Open                2004-11-09 00:53  stack-sf
resolution_id  None                2004-11-09 00:53  stack-sf
close_date     -                   2004-11-09 00:53  stack-sf
priority       5                   2004-11-08 20:35  gojomo
assigned_to    gojomo              2004-11-08 20:35  gojomo
File Added     108054: diff.patch  2004-11-08 18:48  stack-sf