Heritrix: Internet Archive Web Crawler

Tracker: Bugs

DNS URIs don't get override settings - ID: 1045016
Last Update: Comment added ( karl-ia )

See also related bug:
[ 1008990 ] dns uuri.getHost() returns null (getPath
has host)

Since DNS UURIs don't give a host that can be used to
navigate the settings hierarchy, a DNS URI (e.g.
dns:www.archive.org) won't get any override settings
relevant to its subject host applied to it. This can
interfere with proper operation when using the
'force-queue' override to place all URIs of a certain
domain into a single named queue. (The DNS URI needs to
go to the same place as the HTTP URIs, to preserve
prerequisites-first ordering.)
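
The missing host can be reproduced outside Heritrix. The sketch below uses the JDK's java.net.URI as a stand-in for UURI (an assumption: UURI is a different class and, per [ 1008990 ], exposes the host via getPath() instead). The class and method names are hypothetical and only illustrate why a host-keyed settings lookup finds nothing for a dns: URI.

```java
import java.net.URI;

final class DnsHostDemo {
    /** Returns the host as java.net.URI reports it; null when the URI has no authority. */
    public static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    public static void main(String[] args) {
        // Authority-based URI: the host is available.
        System.out.println(hostOf("http://www.archive.org/index.html"));
        // dns: URI is opaque (no "//authority"), so no host is reported.
        System.out.println(hostOf("dns:www.archive.org"));
        // The name is still there, just not where getHost() looks.
        System.out.println(URI.create("dns:www.archive.org").getSchemeSpecificPart());
    }
}
```

Any lookup keyed on the first kind of result will silently skip the dns: URI, which is how it can end up in a different queue than the HTTP URIs that depend on it.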


Submitted: Gordon Mohr ( gojomo ) - 2004-10-12 02:09

Priority: 5

Status: Closed

Resolution: Fixed

Assigned: Gordon Mohr

Category: configuration

Group: None

Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:16
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-257 -- please add further
comments at that location.


Date: 2004-10-12 02:13
Sender: gojomo (Project Admin)

As with Stack's comments on [ 1008990 ], I'm reluctant to
change getHost() to also return DNS URI hosts, because by
the URI format specs, getHost() has a precise meaning in
common-form/authority-based URIs. So I made an alternate
method for the expanded meaning, which is useful to the
settings system.

Commit comment:

Fix for [ 1045016 ] DNS URIs don't get override settings
* UURI.java
    Add a getReferencedHost() method, which gets the typical
    host plus the host as it appears in DNS URIs. (DNS host not
    added to regular getHost() as that method specifically
    refers to hosts of the typical form.)
* CrawlServer.java, ComplexType.java
    Replace uses of getHost() that determine applicable
    settings with getReferencedHost()
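
For readers without the source at hand, the idea in the commit comment can be sketched roughly as follows. This is not the code that went into UURI.java: it uses java.net.URI in place of UURI, and the class name, the queueFor() helper, the flat per-host map, and the "default" queue name are all hypothetical simplifications of the real settings hierarchy.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class ReferencedHostSketch {
    /**
     * Host for authority-based URIs, or the named host for dns: URIs.
     * getHost() itself is left alone, keeping its strict meaning.
     */
    public static String referencedHost(URI uri) {
        if ("dns".equalsIgnoreCase(uri.getScheme())) {
            return uri.getSchemeSpecificPart();
        }
        return uri.getHost();
    }

    /** Hypothetical override lookup: maps a host to a forced queue name. */
    public static String queueFor(URI uri, Map<String, String> forceQueueByHost) {
        String host = referencedHost(uri);
        if (host == null) {
            return "default";
        }
        return forceQueueByHost.getOrDefault(host, "default");
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("www.archive.org", "archive-queue");
        // Both URIs now resolve to the same override, so the DNS
        // prerequisite is queued alongside the HTTP fetches.
        System.out.println(queueFor(URI.create("dns:www.archive.org"), overrides));
        System.out.println(queueFor(URI.create("http://www.archive.org/"), overrides));
    }
}
```

Keeping the expanded meaning in a separate method means callers that rely on the strict getHost() contract are unaffected; only the settings lookups opt in.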



Attached File

No Files Currently Attached

Changes ( 3 )

Field          Old Value  Date              By
status_id      Open       2004-10-14 23:15  gojomo
close_date     -          2004-10-14 23:15  gojomo
resolution_id  None       2004-10-12 02:13  gojomo