Heritrix: Internet Archive Web Crawler

Tracker: Bugs

7 UURI.length() > 2k - ID: 1012520
Last Update: Comment added ( karl-ia )

We lost this code when refactoring UURI:

public class UURI implements Serializable {
    // for now, consider URIs too long for IE as illegal
    // TODO: move this policy elsewhere
    private static int DEFAULT_MAX_URI_LENGTH = 2083;

    private static Logger logger =
        Logger.getLogger("org.archive.crawler.datamodel.UURI");

    protected java.net.URI uri;
    protected String uriString;

    public static UURI createUURI(String s) throws URISyntaxException {
        return new UURI(normalize(s));
    }

    /**
     * @param u
     */
    private UURI(URI u) throws URISyntaxException {
        uri = u;
        try {
            uriString = u.toASCIIString();
        } catch (NullPointerException npe) {
            throw new URISyntaxException(u.toString(), "URI.encode NPE");
        }
        if (uriString.length() > DEFAULT_MAX_URI_LENGTH) {
            throw new URISyntaxException(uriString, "Too Long");
        }
    }


Put it back.


Submitted: Michael Stack ( stack-sf ) - 2004-08-19 21:21
Priority: 7
Status: Closed
Resolution: Fixed
Assigned: Michael Stack
Category: uri
Group: 1.0.1
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:15
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-235 -- please add further
comments at that location.


Date: 2004-08-25 19:30
Sender: stack-sf (Project Admin)

Fixed. Below are the commit message and the core of the patch.

Fix for "[ 1012520 ] UURI.length() > 2k"
* src/java/org/archive/crawler/datamodel/UURIFactory.java
(fixup): Added tests for long URL.
* src/java/org/archive/crawler/datamodel/UURIFactoryTest.java
(test2kURI): Added unit test to test long URIs throw
exception.


Index: src/java/org/archive/crawler/datamodel/UURIFactory.java
===================================================================
RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/crawler/datamodel/UURIFactory.java,v
retrieving revision 1.4
diff -u -r1.4 UURIFactory.java
--- src/java/org/archive/crawler/datamodel/UURIFactory.java    6 Aug 2004 21:20:57 -0000    1.4
+++ src/java/org/archive/crawler/datamodel/UURIFactory.java    25 Aug 2004 19:26:27 -0000
@@ -191,7 +191,12 @@
      */
     final static Pattern HTTP_SCHEME_SLASHES =
         Pattern.compile("^(https?://)/+(.*)");
-
+
+    /**
+     * Consider URIs too long for IE as illegal.
+     */
+    private final static int MAX_URL_LENGTH = 2083;
+
     /**
      * Protected constructor.
      */
@@ -310,6 +315,12 @@
         } else if (uri.length() == 0 && base == null){
             throw new URIException("URI length is zero (and not relative).");
         }
+
+        if (uri.length() > MAX_URL_LENGTH) {
+            // TODO: Would make sense to test against for excessive length
+            // after all the fixup and normalization has been done.
+            throw new URIException("URI length > " + MAX_URL_LENGTH);
+        }
 
         // Replace nbsp with normal spaces (so that they get stripped if at
         // ends, or encoded if in middle)
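
The TODO in the patch notes that the length test would make more sense after fixup and normalization. The following is only a rough sketch of that ordering, not the Heritrix code: it uses java.net.URI from the JDK instead of UURIFactory, and the class UriLengthGuard with its methods create, isRejected, and uriOfLength are hypothetical names invented for illustration. The main method exercises the 2083 boundary in the spirit of the test2kURI unit test mentioned in the commit message, whose actual body is not shown in this report.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Heritrix: a length guard applied after normalization.
class UriLengthGuard {
    // Same limit as the patch: URIs too long for IE are considered illegal.
    static final int MAX_URL_LENGTH = 2083;

    /** Parses and normalizes s, then rejects it if its ASCII form exceeds the limit. */
    static URI create(String s) throws URISyntaxException {
        URI uri = new URI(s).normalize();
        String ascii = uri.toASCIIString();
        if (ascii.length() > MAX_URL_LENGTH) {
            throw new URISyntaxException(s, "URI length > " + MAX_URL_LENGTH);
        }
        return uri;
    }

    /** Returns true if create(s) throws, whether for a syntax error or for excessive length. */
    static boolean isRejected(String s) {
        try {
            create(s);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    /** Builds a syntactically valid http URI string that is exactly n characters long (n >= 23). */
    static String uriOfLength(int n) {
        StringBuilder sb = new StringBuilder("http://www.example.com/");
        while (sb.length() < n) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A URI exactly at the limit is accepted; one character more is rejected.
        System.out.println(isRejected(uriOfLength(MAX_URL_LENGTH)));     // false
        System.out.println(isRejected(uriOfLength(MAX_URL_LENGTH + 1))); // true
    }
}
```

Because the check runs on the normalized ASCII form, a URI that only shrinks below the limit through normalization would be accepted here, whereas the patch as committed tests the raw input string before fixup.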


Changes ( 5 )

Field              Old Value  Date              By
status_id          Open       2004-08-25 19:30  stack-sf
resolution_id      None       2004-08-25 19:30  stack-sf
close_date         -          2004-08-25 19:30  stack-sf
priority           5          2004-08-23 23:52  gojomo
artifact_group_id  None       2004-08-19 21:22  stack-sf