Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

7 ARC writer pool config. to write multiple disks - ID: 988276
Last Update: Comment added ( karl-ia )

The ARC writer pool needs to be able to write to more than
one directory, so that we can set up a crawler that writes
to more than one disk.


Submitted: Michael Stack ( stack-sf ) - 2004-07-09 22:35
Priority: 7
Status: Closed
Resolution: None
Assigned: Michael Stack
Category: i/o
Group: None
Visibility: Public


Comments ( 4 )

Date: 2007-03-14 01:32
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-798 -- please add further
comments at that location.


Date: 2004-12-11 01:52
Sender: stack-sf (Project Admin)

Fixed. Commit message below.

Make settings for arcwriter 'live'. Make it so you can have multiple
writing directories and that you can add and subtract midcrawl.
Also upgrade pooling jar to 1.2 from 1.1.
* .classpath
* project.xml
* project.properties
    Reference new commons-pool-1.2 jar. Replaces 1.1
* src/conf/profiles/default/order.xml
* src/conf/selftest/order.xml
    (path): Changed type from String to StringList.
* src/java/org/archive/crawler/writer/ARCWriterProcessor.java
    New pattern for how to do settings. The way our settings framework
    currently works, everything must be a Setting; means settings touch all
    code/components. This pattern does an inversion. The implementation is
    done off in ARCWriterPool which intentionally knows nothing of our
    settings framework. It takes a 'settings' object; ARCWriterSettings.
    The settings object is an interface which has ARCWriter parameters.
    The ARCWriterProcessor implements this parameters interface consulting
    the settings framework to get 'live' values.
    Removed all parameter data settings. Go via the settings interface
    to obtain values.
    Converted 'path' from String to StringList type so it can hold list
    of directories. Made better help strings. General cleanup.
    (readConfiguration, getOutputDir, setArcMaxSize, setArcSuffix,
    setUseCompression, useCompression): removed.
    (getAttributeUnchecked, getOutputDirs, isCompressed): Added.
* src/java/org/archive/io/arc/ARCWriter.java
    Method ensureWriteableDirectory was moved to IoUtils from
    ArchiveUtils. Go to ARCWriterSettings for parameters. Removed
    holding of parameters in static data members.
    Roundrobin through write directories.
    Use passed in ARCWriterSettings to get all parameters.
    (ARCWriter): Removed unused constructors.
    (isCompress, getMaxSize...): Removed.
    (roundRobinIndex): Added.
    (getNextDirectory, checkWriteable, getSettings): Added.
* src/java/org/archive/io/arc/ARCWriterPool.java
    Takes ARCWriterSettings instead of individual parameters.
    Refactoring cleanup.
    (getSettings): Added.
    (get*): Removed. Use the getSettings to get at parameters.
* src/java/org/archive/io/arc/ARCWriterPoolTest.java
    Changed to suit change in API.
* src/java/org/archive/io/arc/ARCWriterTest.java
    Changed to suit change in API.
* src/java/org/archive/util/ArchiveUtils.java
* src/java/org/archive/util/IoUtils.java
* src/java/org/archive/crawler/writer/MirrorWriterProcessor.java
    Method ensureWriteableDirectory was moved to IoUtils from
    ArchiveUtils.
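
For readers who want a picture of the pattern the commit message describes, here is a minimal sketch and not the Heritrix source. It assumes a settings interface that the writer re-reads on every call, so directories added or removed mid-crawl are picked up on the next round-robin step. The type names SketchWriterSettings, SketchWriter and LiveSettings are hypothetical. The method names getOutputDirs, isCompressed and getNextDirectory and the field roundRobinIndex are borrowed from the commit message, but their signatures and bodies here are guesses, and the sketch uses current Java syntax rather than what the 2004 code base would have used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Demo entry point: shows round-robin selection and a mid-run change.
class RoundRobinSketch {
    public static void main(String[] args) {
        LiveSettings settings = new LiveSettings();
        settings.setOutputDirs("disk1", "disk2");
        SketchWriter writer = new SketchWriter(settings);
        for (int i = 0; i < 3; i++) {
            System.out.println(writer.getNextDirectory());
        }
        // Simulate an operator adding a third directory mid-crawl.
        settings.setOutputDirs("disk1", "disk2", "disk3");
        System.out.println(writer.getNextDirectory());
    }
}

// Hypothetical stand-in for the ARCWriterSettings interface: the
// writer sees only these getters, never the settings framework.
interface SketchWriterSettings {
    List<File> getOutputDirs();
    boolean isCompressed();
}

// Hypothetical writer that cycles through the configured directories.
class SketchWriter {
    private final SketchWriterSettings settings;
    private int roundRobinIndex = 0;

    SketchWriter(SketchWriterSettings settings) {
        this.settings = settings;
    }

    // Re-reads the directory list on each call so that changes made
    // to the settings take effect without restarting the writer.
    synchronized File getNextDirectory() {
        List<File> dirs = settings.getOutputDirs();
        if (dirs.isEmpty()) {
            throw new IllegalStateException("no output directories configured");
        }
        if (roundRobinIndex >= dirs.size()) {
            roundRobinIndex = 0; // wrap around, also covers a shrunken list
        }
        return dirs.get(roundRobinIndex++);
    }
}

// Stand-in for the processor side: implements the interface and
// returns whatever values are current (here, a swappable list).
class LiveSettings implements SketchWriterSettings {
    private volatile List<File> dirs = new ArrayList<>();

    void setOutputDirs(String... paths) {
        List<File> updated = new ArrayList<>();
        for (String path : paths) {
            updated.add(new File(path));
        }
        dirs = updated; // swap in one step; readers see old or new list
    }

    public List<File> getOutputDirs() {
        return dirs;
    }

    public boolean isCompressed() {
        return true; // fixed value, only to round out the interface
    }
}
```

The point of the inversion is that SketchWriter depends only on the small interface, so it can be tested with a trivial implementation such as LiveSettings and needs no knowledge of how the values are stored or edited.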



Date: 2004-12-03 19:31
Sender: stack-sf (Project Admin)

For the French crawl, after talking with Igor and Gordon:

+ Multiple directories.
+ Switch midcrawl.
+ Overall quota.


Date: 2004-12-03 19:30
Sender: stack-sf (Project Admin)

For the French crawl, after talking with Igor and Gordon:

+ Multiple directories.
+ Switch midcrawl.
+ Overall quota.


Attached File

No Files Currently Attached

Changes ( 4 )

Field       Old Value   Date               By
status_id   Open        2004-12-11 01:52   stack-sf
close_date  -           2004-12-11 01:52   stack-sf
priority    6           2004-12-03 22:48   gojomo
priority    5           2004-09-01 22:02   stack-sf