Date: Fri, 30 Apr 2004 21:59:41 -0400
To: Michael Stack <stack@archive.org>
Subject: Re: [archive-crawler] Cdx from arc files
Reply-To: tree@basistech.com
Return-Path: tree@basistech.com
Hi,
Is there a way I can add HTTP headers to the crawl? In particular I
would like to add an Accept-Language: header. Several of the sites I
want to crawl serve English content by default unless told that the
client wants Arabic.
Thanks.
-tree
--
Tom Emerson
Basis Technology Corp.
Software Architect
http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and
you suck forever"
To do the above, we would need a way of listing the headers to set;
they would then be added to each request somewhere around here:
Index: src/java/org/archive/crawler/fetcher/FetchHTTP.java
===================================================================
RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/crawler/fetcher/FetchHTTP.java,v
retrieving revision 1.45
diff -u -r1.45 FetchHTTP.java
--- src/java/org/archive/crawler/fetcher/FetchHTTP.java 28 Apr 2004 01:42:04 -0000 1.45
+++ src/java/org/archive/crawler/fetcher/FetchHTTP.java 1 May 2004 07:20:41 -0000
@@ -200,6 +200,7 @@
             curi.getUURI().getURIString(), rec);
         configureMethod(curi, method);
         boolean addedCredentials = populateCredentials(curi, method);
+        method.addRequestHeader(new Header());
         int immediateRetries = 0;
         while (true) {
             // Retry until success (break) or unrecoverable exception
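
As a rough illustration of the "way of listing headers" mentioned above, here is a minimal sketch. It assumes the operator supplies the extra headers as plain "Name: value" lines. The class ExtraRequestHeaders and its parse and apply methods are hypothetical names invented for this example, not part of the crawler, and the code uses current Java syntax rather than what was available in 2004.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper for illustration only; not part of the crawler's code base. */
final class ExtraRequestHeaders {

    private ExtraRequestHeaders() {}

    /**
     * Parses lines of the form "Name: value" into a map that keeps
     * insertion order. Blank lines are skipped. A line with no colon,
     * or with nothing before the colon, is rejected. If a name occurs
     * twice, the later value replaces the earlier one.
     */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Split on the first colon only, so values may contain colons.
            int colon = trimmed.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Not a header line: " + line);
            }
            String name = trimmed.substring(0, colon).trim();
            String value = trimmed.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    /**
     * Hands each configured header to the given sink, once per request.
     * The sink stands in for whatever call sets a header on the
     * outgoing HTTP method object.
     */
    static void apply(Map<String, String> headers, BiConsumer<String, String> sink) {
        headers.forEach(sink);
    }
}
```

At the spot marked in the diff, the placeholder line could then become something like ExtraRequestHeaders.apply(headers, method::addRequestHeader). That assumes the method object accepts a header as a name and value pair of strings, which would need checking against the HTTP client library version in use.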
Michael Stack
Configuration
None
Public
| Field | Old Value | Date | By |
|---|---|---|---|
| close_date | - | 2004-07-07 15:51 | stack-sf |
| status_id | Open | 2004-07-07 15:51 | stack-sf |
| priority | 5 | 2004-05-05 21:20 | stack-sf |