Currently it is highly likely, but not guaranteed, that
the IP returned by the logged DNS lookup is the one
used for the HTTP operations that follow. Similarly, it
is highly likely, but not guaranteed, that the IP
written in an HTTP ARC record is the IP that was
actually contacted for the content.
Consider the scenario:
(1) DNS lookup is triggered, logged to ARC, noted in
CrawlHost instance.
(2) Later, HTTP fetch is attempted. The change for bug
[ 902970 ] (HTTPClient should use supplied IP / avoid DNS
lookup) ensures the DNSJava cache is checked -- but the
IP found there is not necessarily the same as the one in
the CrawlHost instance. (Caching TTLs may vary -- we use a
minimum regardless of what the DNS recommended. Or, even
if they match, there could be a small window between
when PreconditionEnforcer decides the existing IP is OK
and when FetchHTTP checks the DNSJava cache.) So, the
actual IP contacted may be different from the DNS info
that was previously logged.
(3) When logging that HTTP response to ARC,
ARCWriterProcessor.getHostAddress() looks directly back
to CrawlHost, and so may log an IP that was not used.
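The scenario above can be sketched as a small self-contained simulation. This is not Heritrix code: HostRecord stands in for the CrawlHost instance and ResolverCache for the DNSJava cache, and both names, along with fetchAndRecord and the example addresses, are hypothetical. It only illustrates how the contacted IP and the recorded IP can diverge when they are read from two different places.

```java
import java.util.HashMap;
import java.util.Map;

public class IpMismatchSketch {

    /** Hypothetical stand-in for the per-host record holding the logged DNS result. */
    static class HostRecord {
        final String name;
        final String loggedIp;

        HostRecord(String name, String loggedIp) {
            this.name = name;
            this.loggedIp = loggedIp;
        }
    }

    /** Hypothetical stand-in for a resolver cache that expires entries on its own schedule. */
    static class ResolverCache {
        private final Map<String, String> entries = new HashMap<>();

        void put(String host, String ip) {
            entries.put(host, ip);
        }

        String lookup(String host) {
            return entries.get(host);
        }
    }

    /**
     * Mimics steps (2) and (3): the fetch reads the resolver cache,
     * while the record written afterwards reads the host record.
     * Returns {ipContacted, ipRecorded}.
     */
    static String[] fetchAndRecord(HostRecord host, ResolverCache cache) {
        String contacted = cache.lookup(host.name); // step (2): IP actually used
        String recorded = host.loggedIp;            // step (3): IP written to the record
        return new String[] { contacted, recorded };
    }

    public static void main(String[] args) {
        // Step (1): the DNS result is logged and noted in the host record.
        HostRecord host = new HostRecord("example.org", "192.0.2.10");
        ResolverCache cache = new ResolverCache();
        cache.put("example.org", "192.0.2.10");

        // The resolver cache entry is refreshed independently of the host record.
        cache.put("example.org", "192.0.2.20");

        String[] result = fetchAndRecord(host, cache);
        System.out.println("contacted=" + result[0] + " recorded=" + result[1]);
    }
}
```

In this sketch the two values differ only because the cache entry was replaced between step (1) and step (2), which is the window described above.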
FetchHTTP MUST use an IP that was previously discovered
via a logged DNS response -- even if this requires us
to add new methods to HTTPClient to use a specified IP
address. (If that works properly, then the ARC issue
will resolve itself. A way to be sure that the ARC
always shows the right IP, though, would be for the HTTP
transaction to remember the IP it actually uses and
have the ARCWriterProcessor consult that value rather
than the CrawlHost cache.)
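One possible shape for the approach suggested above, as a minimal sketch rather than the actual fix: the fetch builds its target address directly from the logged IP and remembers it, and the record writer asks the transaction for that value. FetchTransaction, pin, and addressForRecord are hypothetical names, not Heritrix or HTTPClient APIs. The only library call relied on is the standard java.net.InetAddress.getByAddress(String, byte[]), which builds an address from raw bytes without consulting a name service.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PinnedIpSketch {

    /** Hypothetical per-fetch object that remembers the IP it actually used. */
    static class FetchTransaction {
        final String hostName;
        private InetAddress usedAddress;

        FetchTransaction(String hostName) {
            this.hostName = hostName;
        }

        /**
         * Builds the connect address from the previously logged IP bytes.
         * getByAddress performs no DNS lookup, so no second resolution
         * can slip in between the logged lookup and the fetch.
         */
        InetAddress pin(byte[] loggedIp) {
            try {
                usedAddress = InetAddress.getByAddress(hostName, loggedIp);
            } catch (UnknownHostException e) {
                // Thrown only when the byte array is not a valid IPv4/IPv6 length.
                throw new IllegalArgumentException("bad IP length", e);
            }
            return usedAddress;
        }

        /** What a record writer would consult instead of a shared host cache. */
        String addressForRecord() {
            if (usedAddress == null) {
                throw new IllegalStateException("no address was pinned for this fetch");
            }
            return usedAddress.getHostAddress();
        }
    }

    public static void main(String[] args) {
        FetchTransaction tx = new FetchTransaction("example.org");
        // Bytes of the IP taken from the logged DNS response (example value).
        InetAddress target = tx.pin(new byte[] { (byte) 192, 0, 2, 10 });
        // A real fetch would open its socket against 'target' here.
        System.out.println("connect to " + target.getHostAddress()
                + ", record " + tx.addressForRecord());
    }
}
```

Because the recorded value is read from the same object that supplied the connect address, the two cannot drift apart even if a shared cache changes in the meantime.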
Michael Stack
Protocols
None
Public
Date: 2007-03-14 00:21
Date: 2005-03-04 02:14 Logged In: YES
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2005-03-04 02:14 | stack-sf |
| resolution_id | None | 2005-03-04 02:14 | stack-sf |
| close_date | - | 2005-03-04 02:14 | stack-sf |
| assigned_to | nobody | 2005-03-02 19:21 | gojomo |