From: <sz...@us...> - 2010-02-18 19:05:42
Revision: 6774
          http://archive-crawler.svn.sourceforge.net/archive-crawler/?rev=6774&view=rev
Author:   szznax
Date:     2010-02-18 19:05:36 +0000 (Thu, 18 Feb 2010)

Log Message:
-----------
amended H1 port of fix for [HER-1533] robots.txt fetch referer header is via url which is confusing
* FetchHTTP.java
    protect against NPE when checking viaContext

Modified Paths:
--------------
    trunk/heritrix/src/java/org/archive/crawler/fetcher/FetchHTTP.java

Modified: trunk/heritrix/src/java/org/archive/crawler/fetcher/FetchHTTP.java
===================================================================
--- trunk/heritrix/src/java/org/archive/crawler/fetcher/FetchHTTP.java	2010-02-18 01:52:41 UTC (rev 6773)
+++ trunk/heritrix/src/java/org/archive/crawler/fetcher/FetchHTTP.java	2010-02-18 19:05:36 UTC (rev 6774)
@@ -792,8 +792,10 @@
         }
 
         if (((Boolean)getUncheckedAttribute(curi,
-                ATTR_SEND_REFERER)).booleanValue() &&
-                !Link.PREREQ_MISC.equals(curi.getViaContext().toString())) {
+                ATTR_SEND_REFERER)).booleanValue()
+                && (curi.getViaContext()==null ||
+                        !Link.PREREQ_MISC.equals(
+                                curi.getViaContext().toString()))) {
             // RFC2616 says no referer header if referer is https and the url
             // is not
             String via = curi.flattenVia();
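
For readers who want the guard in isolation, here is a minimal sketch of the null-safe condition the change introduces. It is not the Heritrix code itself: the class RefererGuard, the method shouldSendReferer, and the placeholder value of PREREQ_MISC are hypothetical names chosen for illustration. Only the shape of the boolean expression is taken from the diff.

```java
// Standalone illustration of the null-safe check from rev 6774.
// RefererGuard and shouldSendReferer are hypothetical names, not Heritrix API.
final class RefererGuard {

    // Placeholder standing in for Link.PREREQ_MISC; the real constant's
    // value is not shown in the diff, so this string is an assumption.
    static final String PREREQ_MISC = "PREREQ_MISC_PLACEHOLDER";

    /**
     * Returns true when a Referer header may be considered at all:
     * the send-referer setting is on, and the via context is either
     * absent (null) or something other than the prerequisite marker.
     */
    static boolean shouldSendReferer(boolean sendReferer, CharSequence viaContext) {
        return sendReferer
                && (viaContext == null
                        // Only dereference viaContext once it is known to be non-null.
                        || !PREREQ_MISC.equals(viaContext.toString()));
    }
}
```

Before the change, the equivalent of viaContext.toString() ran unconditionally, so a URI with no via context would throw a NullPointerException. With the guard, a null context simply falls through to the existing referer handling.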