Heritrix: Internet Archive Web Crawler

Tracker: Bugs

6 FetchDNS doesn't work (bug in dnsjava) - ID: 1116204
Last Update: Comment added ( karl-ia )

On Windows 2000, if the IP<->host mapping is defined in
".../system32/drivers/etc/hosts", FetchDNS doesn't
work. The problem is in the implementation of dnsjava
1.6.2 (and 1.6.4), at this line:

// try to get the records for this host (assume domain name)
// TODO: Bug #935119 concerns potential hang here
rrecordSet = dns.getRecords(dnsName, TypeType, ClassType);

The method returns "null".
But the standard method (using java.net.InetAddress):

InetAddress address = InetAddress.getByName(dnsName);

works well.




e-mail: corrado.mio@tin.it


Submitted: Nobody/Anonymous ( nobody ) - 2005-02-04 14:21
Priority: 6
Status: Closed
Resolution: Fixed
Assigned: Nobody/Anonymous
Category: Protocols
Group: None
Visibility: Public


Comments ( 4 )

Date: 2007-03-14 00:20
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-351 -- please add further
comments at that location.


Date: 2005-02-11 22:35
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Name->IP mappings in a 'hosts' file aren't technically DNS
lookups, so it's not surprising that DNSJAVA doesn't find them.

For our crawls at the Archive, we only want to consult
public DNS, not any private settings.

However, it's reasonable to be able to fall back on local
settings, esp. if/when crawling an intranet.

So I've added an expert setting to FetchDNS:
'accept-non-dns-resolves'. The default is false, which gives
the same behavior as before (the behavior reported in this bug).
Setting it to true will try InetAddress.getByName() after a
DNS lookup fails and, if getByName succeeds, use the
resulting IP address and set the URI's fetch status to the
new code S_GETBYNAME_SUCCESS (1001).

This will allow crawling of the target host to proceed,
though no record of the name->IP binding will be present in
the ARC files.

Commit comment:
Fix for [ 1116204 ] FetchDNS doesn't work (bug in dnsjava)
* FetchStatusCodes.java
Add new code S_GETBYNAME_SUCCESS (1001) - for non-DNS
successful IP lookups
* FetchDNS.java
Add new 'accept-non-dns-resolves' setting. If true,
fallback to InetAddress.getByName()
* ARCWriterProcessor.java
Only try to write text/dns record on true DNS success
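
The fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual FetchDNS code: the class FetchDnsFallbackSketch, the Lookup interface, the Outcome enum, and the Result type are all hypothetical names, and the dnsjava query is replaced by a pluggable lookup so the sketch runs without dnsjava on the classpath. Only InetAddress.getByName() and the meaning of 'accept-non-dns-resolves' come from the comment itself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch only: all names in this class are hypothetical, not Heritrix code.
class FetchDnsFallbackSketch {

    // GETBYNAME_SUCCESS plays the role of S_GETBYNAME_SUCCESS (1001);
    // the other outcomes avoid guessing at real Heritrix status codes.
    enum Outcome { DNS_SUCCESS, GETBYNAME_SUCCESS, UNRESOLVED }

    // A name lookup that returns an IP literal, or null when nothing is found.
    interface Lookup {
        String resolve(String host) throws UnknownHostException;
    }

    static final class Result {
        final Outcome outcome;
        final String ip;

        Result(Outcome outcome, String ip) {
            this.outcome = outcome;
            this.ip = ip;
        }

        // Only a true DNS answer should produce a text/dns record.
        boolean shouldWriteDnsRecord() {
            return outcome == Outcome.DNS_SUCCESS;
        }
    }

    // Fallback that asks the operating system resolver (hosts file included).
    static final Lookup SYSTEM_RESOLVER =
            host -> InetAddress.getByName(host).getHostAddress();

    static Result resolve(String host, Lookup dns, Lookup fallback,
                          boolean acceptNonDnsResolves) {
        String ip = tryLookup(dns, host);
        if (ip != null) {
            return new Result(Outcome.DNS_SUCCESS, ip);
        }
        if (acceptNonDnsResolves) {
            ip = tryLookup(fallback, host);
            if (ip != null) {
                return new Result(Outcome.GETBYNAME_SUCCESS, ip);
            }
        }
        return new Result(Outcome.UNRESOLVED, null);
    }

    // Treats both a null answer and UnknownHostException as "not found".
    private static String tryLookup(Lookup lookup, String host) {
        try {
            return lookup.resolve(host);
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Lookup dnsMiss = host -> null;          // stands in for dnsjava returning null
        Lookup hostsFile = host -> "10.0.0.5";  // stands in for a hosts-file entry
        Result r = resolve("intranet.example", dnsMiss, hostsFile, true);
        System.out.println(r.outcome + " " + r.ip
                + " writeDnsRecord=" + r.shouldWriteDnsRecord());
    }
}
```

Passing SYSTEM_RESOLVER as the fallback would exercise the real InetAddress.getByName() path; the stub lookups in main() just keep the example independent of the local network configuration.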


Date: 2005-02-04 18:16
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Upping priority.


Date: 2005-02-04 17:05
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

What would you suggest? That when we get back a null, we
fall back to InetAddress.getByName?

Maybe I should try this on Linux, adding an address to
/etc/hosts, to see if I can reproduce your Windows behavior?


Attached File

No Files Currently Attached

Changes ( 4 )

Field          Old Value  Date              By
resolution_id  None       2005-02-11 22:35  gojomo
close_date     -          2005-02-11 22:35  gojomo
status_id      Open       2005-02-11 22:35  gojomo
priority       5          2005-02-04 18:16  stack-sf