Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 CandidateURI serialization 'decodes' UURI - ID: 1226707
Last Update: Comment added ( karl-ia )

The archive-crawler list received a complaint that we were
requesting decoded versions of URLs rather than the URLs as
they appeared in the page. For example, when a URL had a query
portion that looked like '...&sch=%2E%2F%3Faction%3Dsearch...',
we were sending '...&sch=./?action%3Dsearch...'.
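
To make the difference concrete, here is a minimal sketch that uses only the JDK's java.net.URLDecoder, not Heritrix's UURI class. The class and method names (DecodeDemo, decode) are made up for illustration, and the JDK decoder unescapes every sequence, so its output differs from the partially decoded string reported above. It only shows that a decoded query is no longer the string the page contained.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeDemo {
    /** Percent-decodes a query fragment with the JDK decoder (illustrative helper). */
    static String decode(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = "sch=%2E%2F%3Faction%3Dsearch";
        String decoded = decode(escaped);
        System.out.println("as written in the page: " + escaped);
        System.out.println("after decoding:         " + decoded);
        // The two strings differ, so a server may treat them as different requests.
        System.out.println("identical? " + escaped.equals(decoded));
    }
}
```

Once a literal '?' or '/' appears inside a query value, the receiving server may parse the request differently, which is presumably why the site owner noticed.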

Here is the complaint:
http://sourceforge.net/mailarchive/forum.php?thread_id=7566295&forum_id=34340

Igor verified by digging in ARCs that we were indeed
sending over decodings.


Submitted: Michael Stack ( stack-sf ) - 2005-06-24 02:08
Priority: 5
Status: Closed
Resolution: Fixed
Assigned To: Karl Thiessen
Category: uri
Group: 1.6.0
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:56
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-453 -- please add further
comments at that location.


Date: 2005-06-24 02:17
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

The commit is below. I set the resolution to fixed and the
status to 'pending' Karl's review. It probably doesn't need to
be tested, Karl, since I added a unit test. To 'break' it
again, undo the following change in CandidateURI:

@@ -561,7 +561,7 @@
     private void writeObject(ObjectOutputStream stream)
         throws IOException {
         stream.defaultWriteObject();
-        stream.writeUTF(uuri.getURI());
+        stream.writeUTF(uuri.toString());
         stream.writeObject((via == null) ? null : via.getURI());
         stream.writeObject((alist==null) ? null : alist);
     }
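
The change above suggests that the escaped form of the URI is what should be written to the stream. A rough sketch of that idea follows, using a hypothetical stand-in class (Candidate) rather than the real CandidateURI, and a plain String in place of UURI. It assumes the escaped string is written with writeUTF and read back unchanged, and it is not the unit test mentioned in the comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class EscapedRoundTrip {
    /** Hypothetical stand-in for CandidateURI; not the Heritrix class. */
    static class Candidate implements Serializable {
        private static final long serialVersionUID = 1L;
        // Transient so that only the custom writeObject/readObject handle it.
        private transient String escapedUri;

        Candidate(String escapedUri) { this.escapedUri = escapedUri; }

        String getEscapedUri() { return escapedUri; }

        private void writeObject(ObjectOutputStream stream) throws IOException {
            stream.defaultWriteObject();
            // Write the URI exactly as escaped, never a decoded form.
            stream.writeUTF(escapedUri);
        }

        private void readObject(ObjectInputStream stream)
                throws IOException, ClassNotFoundException {
            stream.defaultReadObject();
            escapedUri = stream.readUTF();
        }
    }

    /** Serializes a Candidate to memory and returns the URI read back. */
    static String roundTrip(String escapedUri) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Candidate(escapedUri));
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return ((Candidate) in.readObject()).getEscapedUri();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "http://example.com/?sch=%2E%2F%3Faction%3Dsearch";
        System.out.println("round trip preserved escaping: "
                + roundTrip(original).equals(original));
    }
}
```

A regression check along these lines only needs to assert that the string read back equals the string written, escapes included.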



Changes ( 5 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 18:02  gojomo
status_id          Open       2005-06-24 02:17  stack-sf
resolution_id      None       2005-06-24 02:17  stack-sf
assigned_to        stack-sf   2005-06-24 02:17  stack-sf
close_date         -          2005-06-24 02:17  stack-sf