Heritrix: Internet Archive Web Crawler

Tracker: Bugs

Max # of arcs not being respected - ID: 910210
Last Update: Comment added ( karl-ia )

Kris did a broad crawl and saw that, though he'd set the
maximum number of arc writers to 3, he had upwards of
64 arcs in his disk dir. I was able to reproduce this using
the seeds and order file he passed me.


Submitted: Michael Stack ( stack-sf ) - 2004-03-05 01:43
Priority: 5
Status: Closed
Resolution: Fixed
Assigned: Michael Stack
Category: General
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:08
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-88 -- please add further
comments at that location.


Date: 2004-03-05 01:53
Sender: stack-sf (Project Admin)

The problem was multiple instances of ARCWriterPool when there
should be only one. I made a single static reference shared by all
ARCWriterProcessor instances and made the setup of the pool synchronized.

Below is the commit message. Closing.

Problem was multiple instances of ARCWriterPool when should only be one.
Made single static reference shared by all arcwriterprocessors and made
the setup of the pool synchronized.
* src/java/org/archive/crawler/basic/ARCWriterProcessor.java
  (pool): Made static private.
  (initialize): Moved bulk to new synchronized _initialize method. This
  method just checks the "is initialized" flag and if not, then we go
  into the synchronized block whose only purpose is setup of the pool.
* src/java/org/archive/crawler/fetcher/FetchHTTP.java
  Removed curi.setHttpRecorder(null) at head of the innerProcess method.
  The clearing of last httprecorder is being done at tail of the
  processing chain so this call is unnecessary (If the tail clearing of
  httprecord is not working, then there's a problem).
* src/java/org/archive/io/arc/ARCWriterPool.java
  Added logging of initial pool configuration.
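
For readers who want to see the shape of the fix, here is a minimal sketch of the pattern the commit message describes: one static pool shared by every processor, with pool setup guarded by a flag check and a synchronized method. It is not the Heritrix code. SharedPoolSketch, Pool, Processor, setupPool, getPool and demo are hypothetical names made up for illustration, and the volatile flag is an assumption added here so the unsynchronized flag check is safe under the current Java memory model.

```java
import java.util.ArrayList;
import java.util.List;

class SharedPoolSketch {

    /** Hypothetical stand-in for ARCWriterPool; not the real class. */
    static class Pool {
        static int created = 0;   // how many pools were ever built (demo only)
        final int maxWriters;

        Pool(int maxWriters) {
            this.maxWriters = maxWriters;
            created++;            // only reached while holding the Processor class lock
        }
    }

    /** Hypothetical stand-in for ARCWriterProcessor. */
    static class Processor {
        // Single pool shared by all processor instances.
        private static Pool pool;
        // Assumption of this sketch: volatile, so the check in initialize() is safe.
        private static volatile boolean initialized = false;

        void initialize(int maxWriters) {
            if (!initialized) {   // cheap check of the "is initialized" flag
                setupPool(maxWriters);
            }
        }

        // Synchronized on the class, so only one thread can build the pool.
        private static synchronized void setupPool(int maxWriters) {
            if (!initialized) {   // re-check once the lock is held
                pool = new Pool(maxWriters);
                initialized = true;
            }
        }

        static synchronized Pool getPool() {
            return pool;
        }
    }

    /** Starts several processors concurrently and returns the number of pools built. */
    static int demo(int processors, int maxWriters) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < processors; i++) {
            Thread t = new Thread(() -> new Processor().initialize(maxWriters));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return Pool.created;
    }

    public static void main(String[] args) {
        System.out.println("pools created: " + demo(8, 3));
    }
}
```

With per-instance pools, eight processors configured for 3 writers each could open up to 24 writers between them, which is the kind of overshoot reported above. With the shared static pool, the sketch builds one pool no matter how many processors call initialize.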



Changes ( 3 )

Field          Old Value  Date              By
close_date     -          2004-03-05 01:53  stack-sf
status_id      Open       2004-03-05 01:53  stack-sf
resolution_id  None       2004-03-05 01:53  stack-sf