Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

5 HTTPRecorder's default buffer sizes should be configurable - ID: 1057064
Last Update: Comment added ( karl-ia )

HttpRecorder.DEFAULT_OUTPUT_BUFFER_SIZE and
HttpRecorder.DEFAULT_INPUT_BUFFER_SIZE are set to
4096 and 65536, respectively. All requests shorter than 4K
and all responses shorter than 64K can thus be
handled entirely in memory for fetching, extracting,
and writing -- only overflows beyond these sizes go to
the scratch disk.
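
To illustrate the behavior described above, here is a minimal sketch of a recording stream that keeps the first N bytes in memory and sends only the overflow to a scratch file. It is not Heritrix's HttpRecorder code; the class name SpillingRecorder and its accessor methods are hypothetical and exist only for this example.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch (not the Heritrix implementation): the first
 * `limit` bytes stay in an in-memory buffer, anything beyond that
 * is written to a scratch file.
 */
class SpillingRecorder extends OutputStream {
    private final byte[] buffer;        // in-memory portion
    private final File scratchFile;     // receives overflow only
    private int buffered = 0;           // bytes currently held in memory
    private long spilled = 0;           // bytes written to scratch disk
    private OutputStream overflow;      // opened lazily on first overflow

    SpillingRecorder(int limit, File scratchFile) {
        this.buffer = new byte[limit];
        this.scratchFile = scratchFile;
    }

    @Override
    public void write(int b) throws IOException {
        if (buffered < buffer.length) {
            // Still under the threshold: no disk I/O at all.
            buffer[buffered++] = (byte) b;
            return;
        }
        if (overflow == null) {
            overflow = new BufferedOutputStream(new FileOutputStream(scratchFile));
        }
        overflow.write(b);
        spilled++;
    }

    @Override
    public void close() throws IOException {
        if (overflow != null) {
            overflow.close();   // flushes buffered overflow bytes to disk
        }
    }

    int bytesInMemory() { return buffered; }

    long bytesOnDisk() { return spilled; }
}
```

With a 4096-byte limit, a 5000-byte message leaves 4096 bytes in memory and 904 bytes on disk; raising the limit above the message size avoids the scratch file entirely, which is the motivation for making the thresholds configurable.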

However, when crawling larger material, and if
RAM is available, some crawl operators might want to
use larger thresholds. There should be a configurable
option to increase these.


Submitted: Gordon Mohr ( gojomo ) - 2004-10-29 22:01
Priority: 5
Status: Closed
Resolution: None
Assigned: Gordon Mohr
Category: Configuration
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 01:35
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-847 -- please add further
comments at that location.


Date: 2004-11-18 00:54
Sender: gojomo (Project Admin)

Implementation of [ 1057064 ] HTTPRecorder's default buffer
sizes should be configurable
* CrawlOrder.java
    add out-buffer, in-buffer size expert settings, with
    defaults as previously set
* ToeThread.java
    consult CrawlOrder for in/out buffer sizes, and use new
    wider HttpRecorder constructor
* HttpRecorder.java
    wider constructor allowing specification of in/out
    in-memory buffers; older constructor passes through with old
    defaults
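
The pattern in the change notes above (a wider constructor taking both sizes, an older constructor that passes through the old defaults, and a caller that consults settings first) can be sketched roughly as follows. This is an illustration only, not the committed code: RecorderSketch, fromSettings, and the setting keys "out-buffer-bytes" and "in-buffer-bytes" are hypothetical names, and a plain Map stands in for the CrawlOrder expert settings.

```java
import java.util.Map;

/** Hypothetical stand-in for a recorder with configurable buffer sizes. */
class RecorderSketch {
    // Defaults as stated in the request: 4096 for output, 65536 for input.
    static final int DEFAULT_OUTPUT_BUFFER_SIZE = 4096;
    static final int DEFAULT_INPUT_BUFFER_SIZE = 65536;

    final int outBufferSize;
    final int inBufferSize;

    /** Older, narrow constructor: passes through with the old defaults. */
    RecorderSketch() {
        this(DEFAULT_OUTPUT_BUFFER_SIZE, DEFAULT_INPUT_BUFFER_SIZE);
    }

    /** Wider constructor: the caller chooses both in-memory buffer sizes. */
    RecorderSketch(int outBufferSize, int inBufferSize) {
        if (outBufferSize <= 0 || inBufferSize <= 0) {
            throw new IllegalArgumentException("buffer sizes must be positive");
        }
        this.outBufferSize = outBufferSize;
        this.inBufferSize = inBufferSize;
    }

    /**
     * Builds a recorder from settings, falling back to the defaults for
     * any key that is absent. The key names are assumptions made for
     * this sketch.
     */
    static RecorderSketch fromSettings(Map<String, Integer> settings) {
        int out = settings.getOrDefault("out-buffer-bytes", DEFAULT_OUTPUT_BUFFER_SIZE);
        int in = settings.getOrDefault("in-buffer-bytes", DEFAULT_INPUT_BUFFER_SIZE);
        return new RecorderSketch(out, in);
    }
}
```

Keeping the no-argument constructor means existing callers behave exactly as before, while a worker thread that wants larger thresholds can go through the settings lookup instead.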


Changes ( 3 )

Field        Old Value  Date              By
status_id    Open       2004-11-18 00:54  gojomo
assigned_to  nobody     2004-11-18 00:54  gojomo
close_date   -          2004-11-18 00:54  gojomo