Below is a message from the list by Dave Skinner:
I've noticed I'm getting different (and, I hope, better) crawl results since I put the following code into FetchHTTP.java:
```java
method.setRequestHeader("User-Agent", userAgent);
method.setRequestHeader("From", order.getFrom(curi));
/////////////////dave skinner
// RFC 2616 says not to send a Referer header if the referer
// is https and the requested URL is not.
// Only http: referers are sent here, so an https referer is never
// sent at all (this also skips https-to-https requests). The
// startsWith test already excludes the empty string, so no separate
// empty check is needed.
String via = curi.flattenVia();
if (via.startsWith("http:")) {
    method.setRequestHeader("Referer", via);
}
/////////////////end dave skinner
// Set retry handler.
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
    new HeritrixHttpMethodRetryHandler());
```
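Dave's condition only sends http: referers, which is stricter than RFC 2616 requires: it also drops the Referer when both the referring page and the request are https. For comparison, here is a minimal standalone sketch of the rule as the RFC states it (section 15.1.3). The class and method names are hypothetical, not part of Heritrix, and it assumes the via and target URIs are available as plain strings:

```java
/**
 * Sketch of the RFC 2616 (section 15.1.3) Referer rule as a standalone check.
 * RefererPolicy and shouldSendReferer are hypothetical names, not Heritrix API.
 */
public class RefererPolicy {

    // Case-insensitive scheme prefix test; returns false for strings
    // shorter than the scheme instead of throwing.
    static boolean hasScheme(String url, String scheme) {
        return url.regionMatches(true, 0, scheme, 0, scheme.length());
    }

    static boolean shouldSendReferer(String via, String target) {
        // No via (e.g. a seed) or a non-web via such as dns: means nothing to send.
        if (via == null || !(hasScheme(via, "http:") || hasScheme(via, "https:"))) {
            return false;
        }
        // RFC 2616: do not send a referer from a secure page on a non-secure request.
        if (hasScheme(via, "https:") && !hasScheme(target, "https:")) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldSendReferer("http://a.example/", "http://b.example/"));   // true
        System.out.println(shouldSendReferer("https://a.example/", "http://b.example/"));  // false
        System.out.println(shouldSendReferer("https://a.example/", "https://b.example/")); // true
        System.out.println(shouldSendReferer("", "http://b.example/"));                    // false
    }
}
```

The third case is where this sketch differs from the patch above: https-to-https requests keep their Referer here, while Dave's code omits it.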
This is working fine for me, but I'm sure someone can supply a case where it should not be done *sigh*. So I suppose it should possibly be wrapped with a parameter check.
However, instead of a parameter check, maybe what should be done is to check that there is no Referer (or Referrer) header in ATTR_ACCEPT_HEADERS. If there is, suppress the automatic one; otherwise, always output it.
I'd be happy to code either (or both) of the above modifications and test them.
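Both proposals combine naturally: an on/off setting gates the automatic header, and a user-configured Referer always wins. The sketch below assumes that ATTR_ACCEPT_HEADERS entries are plain "Name: value" strings, and `sendReferer` stands in for a hypothetical setting; neither the class nor its methods exist in Heritrix. Header names are compared case-insensitively, since HTTP header names are case-insensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not Heritrix API. */
public class RefererHeaderChoice {

    /** True if any configured "Name: value" line names Referer or Referrer. */
    static boolean hasConfiguredReferer(List<String> configuredHeaders) {
        for (String line : configuredHeaders) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // malformed entry (assumed format not met); skip it
            }
            String name = line.substring(0, colon).trim();
            if (name.equalsIgnoreCase("Referer") || name.equalsIgnoreCase("Referrer")) {
                return true;
            }
        }
        return false;
    }

    /** Returns the Referer value to add automatically, or null to add none. */
    static String automaticReferer(boolean sendReferer, List<String> configuredHeaders, String via) {
        if (!sendReferer) {
            return null; // option 1: the parameter check turns the feature off
        }
        if (hasConfiguredReferer(configuredHeaders)) {
            return null; // option 2: a user-supplied header suppresses the automatic one
        }
        if (via == null || !via.startsWith("http:")) {
            return null; // same scheme rule as the patch above
        }
        return via;
    }

    public static void main(String[] args) {
        List<String> withReferer = Arrays.asList("Accept: text/html", "referer: http://example.com/");
        System.out.println(automaticReferer(true, withReferer, "http://a.example/"));                      // null
        System.out.println(automaticReferer(true, Arrays.asList("Accept: */*"), "http://a.example/"));     // http://a.example/
        System.out.println(automaticReferer(false, Collections.<String>emptyList(), "http://a.example/")); // null
    }
}
```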
Michael Stack
None
None
Public
Date: 2007-03-14 01:38

Date: 2005-03-07 21:48
Date: 2005-03-07 21:46
Date: 2005-03-02 19:07
Date: 2005-03-02 16:50
Date: 2005-02-10 00:38
Date: 2005-02-10 00:31
Date: 2005-02-09 21:53
Date: 2005-02-09 19:48
Date: 2005-02-01 21:01
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2005-03-07 21:46 | stack-sf |
| close_date | - | 2005-03-07 21:46 | stack-sf |
| assigned_to | nobody | 2005-03-02 18:42 | stack-sf |
| priority | 5 | 2005-02-09 19:48 | stack-sf |