Heritrix: Internet Archive Web Crawler

Tracker: Bugs

7 DNS records in ARCs should use DNS server IP - ID: 1157085
Last Update: Comment added ( karl-ia )

Looking at

[ 1036720 ] NPE in ArcWriterProcessor.writeDns()

It occurs to me that it would make more sense to place
the IP address of the DNS server that gave us the
response, rather than the resolved IP address of the
target domain, into the ARC record for DNS URIs.

The record itself actually came from the DNS server IP
address -- not the resolved IP address. (Indeed, at the
time of the DNS record being written, we may not yet
have contacted the resolved IP at all.)

Though what should be in a DNS ARC record has never
been precisely defined, the idea that the IP should
reflect the actual IP providing the info seems more in
spirit with the HTTP behavior.
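
A minimal sketch of the proposed rule, not the actual Heritrix code: for a `dns:` URI the ARC record carries the address of the server that answered the query, and for anything else it carries the address of the fetched host. The class name `ArcRecordIp`, the method `ipForRecord`, the fallback to the host IP when no server address is known, and the sample addresses are all assumptions made for illustration.

```java
class ArcRecordIp {
    /**
     * Picks the IP to store in an ARC record (hypothetical helper,
     * not part of Heritrix).
     *
     * @param uri         the URI being recorded, e.g. "dns:archive.org"
     * @param hostIp      resolved IP of the target host
     * @param dnsServerIp IP of the server that returned the DNS response
     */
    static String ipForRecord(String uri, String hostIp, String dnsServerIp) {
        boolean isDns = uri != null && uri.toLowerCase().startsWith("dns:");
        if (isDns && dnsServerIp != null) {
            // The record content came from the DNS server, so log its address.
            return dnsServerIp;
        }
        // Assumption: fall back to the host IP when no server address is known.
        return hostIp;
    }

    public static void main(String[] args) {
        // Sample addresses are illustrative only.
        System.out.println(ipForRecord("dns:archive.org",
                "207.241.228.105", "63.203.238.114"));
        System.out.println(ipForRecord("http://archive.org/",
                "207.241.228.105", "63.203.238.114"));
    }
}
```

With this rule the DNS record names the machine the data was received from, which is the same convention the HTTP records follow.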


Submitted: Gordon Mohr ( gojomo ) - 2005-03-05 02:18
Priority: 7
Status: Closed
Resolution: Fixed
Assigned: Michael Stack
Category: None
Group: None
Visibility: Public


Comments ( 6 )

Date: 2007-03-14 00:21
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-373 -- please add further
comments at that location.


Date: 2005-03-11 00:33
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Here was the commit:

Update of /cvsroot/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/crawler/fetcher
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17795/src/java/org/archive/crawler/fetcher

Modified Files:
  FetchDNS.java
Log Message:
Change ip in dns records (I don't have issue #. Sourceforge is down).
* src/articles/releasenotes.xml
  Noted change in dns arc records.
* src/java/org/archive/crawler/datamodel/CoreAttributeConstants.java
  A_DNS_SERVER_IP_LABEL: added.
* src/java/org/archive/crawler/fetcher/FetchDNS.java
  Formatting. Put the dns server IP into the curi#alist.
  (getFirstARecord): Added.
* src/java/org/archive/crawler/framework/CrawlController.java
  Unrelated change. Added commented out lines, lines to enable evictor logging.
* src/java/org/archive/crawler/writer/ARCWriterProcessor.java
  If a dns record, use the dnsserver ip rather than that of the host.


Date: 2005-03-11 00:32
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Fixed. Closing.


Date: 2005-03-09 17:46
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

It should be the nameserver (dns.duboce.net in your
example), because (1) in a manner analogous to the HTTP
case, it is the remote machine from which the logged
information was received; (2) it's currently not available
anywhere else (the authority is in the record).

DNS poisoning is becoming a bigger issue:
http://news.com.com/Phishers+using+DNS+servers+to+lure+victims/2100-7349_3-5604555.html



Date: 2005-03-09 17:10
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Assigning to Gordon for clarification (give it back to me
when done).

Are you thinking we'd write the IP of the authority DNS
server, the one that comes first in the list of NS
authorities, rather than the IP of the resolver we queried
to get the record? For example, below is a lookup of
archive.org from my home machine:

bigmac:~/workspace/heritrix stack$ dig archive.org

; <<>> DiG 9.2.2 <<>> archive.org
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21619
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 3

;; QUESTION SECTION:
;archive.org. IN A

;; ANSWER SECTION:
archive.org. 1585 IN A 207.241.228.105

;; AUTHORITY SECTION:
archive.org. 72472 IN NS ns2.archive.org.
archive.org. 72472 IN NS ns1.eu.archive.org.
archive.org. 72472 IN NS ns1.archive.org.

;; ADDITIONAL SECTION:
ns2.archive.org. 12512 IN A 207.241.238.254
ns1.eu.archive.org. 12512 IN A 194.109.159.41
ns1.archive.org. 12512 IN A 207.241.224.253

;; Query time: 1 msec
;; SERVER: 63.203.238.114#53(63.203.238.114)
;; WHEN: Wed Mar 9 09:03:52 2005
;; MSG SIZE rcvd: 150

You're thinking we'd write the IP of ns2.archive.org., not
the IP of dns.duboce.net, the server that returned the above
record?
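
To make the distinction concrete: the resolver that actually returned the response is the one named on the `;; SERVER:` line of the dig output, not any of the hosts in the AUTHORITY section. The following is a small sketch, assuming dig's `address#port(name)` layout for that line, that pulls the resolver address out of captured dig text. The class `DigServerLine` and method `resolverAddress` are hypothetical names and have nothing to do with how Heritrix itself obtains the address.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigServerLine {
    // Matches a line such as ";; SERVER: 63.203.238.114#53(63.203.238.114)".
    // Group 1 is the address, group 2 the port.
    private static final Pattern SERVER =
            Pattern.compile("^;; SERVER: ([^#\\s]+)#(\\d+)", Pattern.MULTILINE);

    /** Returns the address of the server that answered, if the line is present. */
    static Optional<String> resolverAddress(String digOutput) {
        Matcher m = SERVER.matcher(digOutput);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                ";; Query time: 1 msec",
                ";; SERVER: 63.203.238.114#53(63.203.238.114)",
                ";; WHEN: Wed Mar 9 09:03:52 2005");
        System.out.println(resolverAddress(sample).orElse("unknown"));
    }
}
```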

Thanks.


Date: 2005-03-05 02:42
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Assigning myself.
Also release note the change.


Changes ( 6 )

Field          Old Value  Date              By
resolution_id  None       2005-03-11 00:32  stack-sf
close_date     -          2005-03-11 00:32  stack-sf
status_id      Open       2005-03-11 00:32  stack-sf
assigned_to    gojomo     2005-03-09 17:46  gojomo
assigned_to    stack-sf   2005-03-09 17:10  stack-sf
assigned_to    nobody     2005-03-05 02:42  stack-sf