
Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

6 HTTPClient should use supplied IP / avoid DNS lookup - ID: 902970
Last Update: Comment added ( karl-ia )

We want to control DNS lookups ourselves, so that they
can be logged, directed to alternate nameservers if
necessary, and perhaps cached according to our needs
rather than the JRE defaults.
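As a rough illustration of the "log, redirect, cache" goals above, here is a minimal sketch of a crawler-side DNS cache. The class name `LoggingDnsCache` and its `Resolver` hook are hypothetical, not Heritrix code. The pluggable resolver stands in for whatever actually queries a nameserver (for example an alternate one), and every real lookup is recorded. Cache expiry (TTL handling) is deliberately left out, so this should not be read as a complete caching policy.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: logs each real DNS lookup and answers repeats from memory. */
class LoggingDnsCache {
    /** Pluggable resolver, e.g. one that queries an alternate nameserver. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final Resolver resolver;
    private final Map<String, InetAddress> cache = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    LoggingDnsCache(Resolver resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached address, performing (and logging) a lookup only on a miss. */
    synchronized InetAddress lookup(String host) throws UnknownHostException {
        String key = host.toLowerCase(Locale.ROOT); // hostnames are case-insensitive
        InetAddress addr = cache.get(key);
        if (addr == null) {
            addr = resolver.resolve(key);
            log.add("resolved " + key + " -> " + addr.getHostAddress());
            cache.put(key, addr); // no expiry here; a real cache would honor TTLs
        }
        return addr;
    }

    /** Returns the cached address or null; never triggers a lookup. */
    synchronized InetAddress cached(String host) {
        return cache.get(host.toLowerCase(Locale.ROOT));
    }

    /** Copy of the lookup log, one entry per real resolution. */
    synchronized List<String> lookupLog() {
        return new ArrayList<>(log);
    }
}
```

Querying `Example.ORG` and then `example.org` produces a single log entry, which is the "one lookup per host" behaviour the issue is after.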

As a result, we only ever attempt an HTTP fetch if we
already have valid hostname/IP info. However,
HTTPClient does not use our info, but rather repeats
the lookup itself using default Java facilities.

We would prefer to supply it with an IP address we
already know to be associated with the given URI's
hostname, and have it use that IP rather than attempt a
redundant lookup.
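A minimal sketch of "use the IP we already have" at the socket level, using only `java.net`. `PreResolvedSocketFactory`, `remember` and `createSocket` are hypothetical names, not HTTPClient's API. The key point is that an `InetSocketAddress` built from an existing `InetAddress` connects without resolving the hostname again. Hosts that were never resolved are refused rather than silently looked up.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: connects to addresses the crawler already resolved. */
class PreResolvedSocketFactory {
    private final Map<String, InetAddress> resolved = new ConcurrentHashMap<>();

    /** Records the address obtained by the crawler's own DNS fetch. */
    void remember(String host, InetAddress address) {
        resolved.put(host.toLowerCase(Locale.ROOT), address);
    }

    /** Connects using the remembered IP; refuses hosts that were never resolved. */
    Socket createSocket(String host, int port, int timeoutMillis) throws IOException {
        InetAddress address = resolved.get(host.toLowerCase(Locale.ROOT));
        if (address == null) {
            // Fail rather than fall back to InetAddress.getByName(host),
            // which would perform exactly the redundant lookup we want to avoid.
            throw new UnknownHostException("no pre-resolved address for " + host);
        }
        Socket socket = new Socket();
        try {
            // Built from an InetAddress, so no DNS lookup happens here.
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
        } catch (IOException e) {
            socket.close(); // don't leak the unconnected socket
            throw e;
        }
        return socket;
    }
}
```

In Commons HttpClient the comparable hook is presumably a custom `ProtocolSocketFactory` registered for the `http` scheme (via `Protocol.registerProtocol`). That matches the "Register our new DNSJavaProtocolSocketFactory" step in the commit message below, though the actual Heritrix class may differ from this sketch.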

If we can figure out a clean way to add this as an
option to HTTPClient, we should implement it and donate
the patch to the HTTPClient project.


Gordon Mohr ( gojomo ) - 2004-02-23 20:49

6

Closed

None

Michael Stack

None

None

Public


Comments ( 4 )

Date: 2007-03-14 01:26
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-721 -- please add further
comments at that location.


Date: 2004-10-09 01:32
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Fixed. Below is commit message. Closing.

Fix for [ 902970 ] HTTPClient should use supplied IP / avoid
DNS lookup
Looking at net traffic with ethereal, I see two DNS lookups
for every host.
After this patch, only one lookup per host is being done.
* src/java/org/archive/crawler/fetcher/FetchDNS.java
Line lengths and note to refactor to use new
DNSJavaUtils class.
* src/java/org/archive/crawler/fetcher/FetchHTTP.java
Register our new DNSJavaProtocolSocketFactory for http
transactions.
* src/java/org/archive/httpclient/ConfigurableTrustManagerProtocolSocketFactory.java
Use new DNSJavaUtils class to get InetAddress for remote
hosts.



Date: 2004-10-09 01:29
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Fixed. Below is commit message. Closing.

Fix for [ 902970 ] HTTPClient should use supplied IP / avoid
DNS lookup
Looking at net traffic with ethereal, I see two DNS lookups
for every host.
After this patch, only one lookup per host is being done.
* src/java/org/archive/crawler/fetcher/FetchDNS.java
Line lengths and note to refactor to use new
DNSJavaUtils class.
* src/java/org/archive/crawler/fetcher/FetchHTTP.java
Register our new DNSJavaProtocolSocketFactory for http
transactions.
* src/java/org/archive/httpclient/ConfigurableTrustManagerProtocolSocketFactory.java
Use new DNSJavaUtils class to get InetAddress for remote
hosts.



Attached File

No Files Currently Attached

Changes ( 4 )

Field        Old Value  Date              By
status_id    Open       2004-10-09 01:29  stack-sf
assigned_to  nobody     2004-10-09 01:29  stack-sf
close_date   -          2004-10-09 01:29  stack-sf
priority     5          2004-09-01 21:59  stack-sf