
Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 [UURI] escaped absolute path not valid - ID: 1000338
Last Update: Comment added ( karl-ia )

This page has relative escaped paths. They don't seem
to be making it past our UURI parser. Check it out.

20 20040729040845749
http://www.army.mod.uk/sportandadventure/clubs/alta/index.htm
"escaped absolute path not valid"
/linked_files/sport/alta/2004%20ENTRY%20FORM%20FOR%20ARMY%20CHAMPS[1].doc
21 20040729040845757
http://www.army.mod.uk/sportandadventure/clubs/alta/index.htm
"escaped absolute path not valid"
/linked_files/sport/alta/rlc%20Tennis%20letter%2004[1].doc
22 20040729040845759
http://www.army.mod.uk/sportandadventure/clubs/alta/index.htm
"escaped absolute path not valid"
/linked_files/sport/alta/ALTA-Army%20Cup%202004-Rules[1].doc
23 20040729040845764
http://www.army.mod.uk/sportandadventure/clubs/alta/index.htm
"escaped absolute path not valid"
/linked_files/sport/alta/2004%20DCI-ALTA%20CHAMPS[1].doc


Michael Stack ( stack-sf ) - 2004-07-29 21:00

Priority: 5
Status: Closed
Resolution: Out of Date
Assigned To: Nobody/Anonymous
Category: Extraction
Group: 1.6.0
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:14
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-205 -- please add further
comments at that location.


Date: 2005-09-23 02:36
Sender: gojomo (Project Admin)


Report is over a year old, and original page is no longer
available for testing.

However, I suspect the problem was that the %20 made our old
URI-fixup code assume the URI was already encoded -- but it
still included characters -- [ and ] -- that need encoding
to live in a URI.

Testing a similar URI as a seed in current code reveals:
- we encode the brackets before attempting the URI (Firefox
does the same thing)
- there's no error and the URI is attempted fine

So I believe other changes have fixed this. Closing as
out-of-date.
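
The fix-up behaviour suspected in the comment above can be sketched roughly as follows. This is only an illustration, not Heritrix's actual UURI code: the class name BracketEscapeSketch, the method escapeIllegalPathChars, and the simplified set of "safe" path characters (loosely following RFC 2396) are all assumptions made for the example. The idea is to keep existing %XX escapes such as %20 as they are while still percent-encoding characters like [ and ] that may not appear raw in a path, rather than treating the presence of %20 as proof that the whole URI is already encoded. java.net.URI stands in for the parser here; Heritrix used a different URI class.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

public class BracketEscapeSketch {
    // Assumption: a simplified set of punctuation allowed unescaped in a path.
    private static final String SAFE_PUNCTUATION = "/-_.!~*'():@&=+$,;";

    static boolean isHexDigit(int c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Percent-encode bytes that are not legal in a path, but leave
    // existing %XX escapes untouched so "%20" does not become "%2520".
    static String escapeIllegalPathChars(String path) {
        byte[] bytes = path.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            int c = bytes[i] & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            boolean existingEscape = c == '%' && i + 2 < bytes.length
                    && isHexDigit(bytes[i + 1] & 0xFF)
                    && isHexDigit(bytes[i + 2] & 0xFF);
            if (alnum || existingEscape || SAFE_PUNCTUATION.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    // True if java.net.URI accepts the string as a URI reference.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "/linked_files/sport/alta/rlc%20Tennis%20letter%2004[1].doc";
        String fixed = escapeIllegalPathChars(raw);
        System.out.println(raw + " parses: " + parses(raw));
        System.out.println(fixed + " parses: " + parses(fixed));
    }
}
```

Run against one of the paths from the original report, the raw string is rejected by java.net.URI because of the brackets, while the fixed string keeps its %20 escapes and gains %5B and %5D in place of [ and ].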


Attached File

No Files Currently Attached

Changes ( 4 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 18:01  gojomo
status_id          Open       2005-09-23 02:36  gojomo
resolution_id      None       2005-09-23 02:36  gojomo
close_date         -          2005-09-23 02:36  gojomo