The parent URI class for UURI can tell that the URI below
is improperly encoded. Our UURI fixup code lets the URI
through because it judges it to be already encoded
(escaped). There must be a test in the parent class that
we can use to check the escaping, so that we can fail
this URI before it gets to PathDepthFilter and httpclient.
07/31/2004 15:31:00 -0700 SEVERE org.archive.crawler.filter.PathDepthFilter innerAccepts Failed getpath for http://club-scar.com/zboard4/B.Kaga%3A%B%E8%C1%BE%BC%F6
07/31/2004 15:31:05 -0700 SEVERE org.archive.crawler.filter.PathDepthFilter innerAccepts Failed getpath for http://club-scar.com/zboard4/B.Kaga%3A%B%E8%C1%BE%BC%F6
07/31/2004 15:31:05 -0700 SEVERE org.archive.crawler.prefetch.PreconditionEnforcer considerRobotsPreconditions Failed get of path for CrawlURI(http://club-scar.com/zboard4/B.Kaga%3A%B%E8%C1%BE%BC%F6)
java.lang.IllegalArgumentException: Invalid uri 'http://club-scar.com/zboard4/B.Kaga%3A%B%E8%C1%BE%BC%F6': incomplete trailing escape pattern
    at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java(Compiled Code))
    at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java(Inlined Compiled Code))
    at org.archive.httpclient.HttpRecorderGetMethod.<init>(HttpRecorderGetMethod.java(Inlined Compiled Code))
    at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java(Compiled Code))
    at org.archive.crawler.framework.Processor.process(Processor.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java(Compiled Code))
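
For illustration only, here is a minimal standalone sketch of the kind of escaping test the description asks for. It does not use the parent class's own check; the class name EscapeCheck and the method hasValidEscapes are hypothetical and are not part of Heritrix or httpclient. It assumes that "properly escaped" means every '%' is followed by exactly two hexadecimal digits, which is enough to reject the URI above because of its "%B%" sequence.

```java
/**
 * Hypothetical helper (not part of Heritrix or commons-httpclient):
 * rejects strings containing a '%' that does not start a complete
 * two-hex-digit escape.
 */
final class EscapeCheck {
    private EscapeCheck() {}

    /** True only for ASCII hex digits 0-9, a-f, A-F. */
    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    /** Returns true if every '%' in s is followed by two hex digits. */
    static boolean hasValidEscapes(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '%') {
                continue;
            }
            // Need two more characters after the '%'.
            if (i + 2 >= s.length()) {
                return false;
            }
            if (!isHex(s.charAt(i + 1)) || !isHex(s.charAt(i + 2))) {
                return false;
            }
            i += 2; // skip over the two hex digits just validated
        }
        return true;
    }
}
```

A check like this could run in the fixup path before a URI is judged "already escaped", so that a malformed URI is failed early instead of reaching PathDepthFilter or the fetcher.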
Michael Stack
Extraction
None
Public
Date: 2007-03-14 00:15
Date: 2004-10-13 23:15
Date: 2004-08-05 18:19
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2004-10-13 23:15 | stack-sf |
| resolution_id | None | 2004-10-13 23:15 | stack-sf |
| close_date | - | 2004-10-13 23:15 | stack-sf |