Share

Heritrix: Internet Archive Web Crawler

Tracker: Bugs

7 Excessive ARCWriterPool timeouts: - ID: 935146
Last Update: Comment added ( karl-ia )

Ran a broad crawl w/200 toe threads. Let it run for
about 12 minutes/26,000 resources before hitting pause.
Within that time, got a lot of ARCWriterPool timeouts:

java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java:801)
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java:144)
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.
java:287)
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)
java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compile
d
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.
java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)
java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compile
d
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.
java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)
java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compile
d
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.
java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)
java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compile
d
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeDns(ARCWriterProcessor.j
ava(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)
java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compile
d
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.
java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)
java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compile
d
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.
java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)
java.util.NoSuchElementException: Timeout waiting for
idle object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPo
ol.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compile
d
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.
java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcess
or.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Comp
iled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java:124)

==
Something must be amiss to cause this: the documents
weren't especially large; the overall rate at which
documents were being written to disk was sufficient to
match the incoming documents rate; to get even a single
such timeout would indicate someone had waited for 5
minutes with no chance to write their document.

Perhaps the system for assigning writers to threads is
not first-in, first-out fair, and so some threads are
being starved of any chance to write?


Gordon Mohr ( gojomo ) - 2004-04-14 19:28

7

Closed

Fixed

Michael Stack

Disk I/O

1.0.1

Public


Comments ( 6 )

Date: 2007-03-14 00:09
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-113 -- please add further
comments at that location.


Date: 2004-08-26 01:49
Sender: stack-sfProject Admin

Logged In: YES
user_id=924942

Regards the comment, "Perhaps the system for assigning
writers to threads is not first-in, first-out fair, and so
some threads are
being starved of any chance to write?" above, we're using
GenericObjectPool, an "ObjectPool implementation with a FIFO
(First In First Out) behavior. This queue like behavior
makes sure each object is regularly used. (helps preventing
time-outs)". Well, the way this thing works is that once
you go into borrowObject, it tries to synchronize on the
genericobjectpool itself. As we've seen elsewhere, trying
to get a lock on ToeThreads to tell them pause, there is no
guaranteed 'fairness' regards when threads get a time slice
to run so its possible threads are not getting into
borrowObject in times of high concurrency.

Lets get some more data.

Toward that end I've implemented (3) -- logging state of
pool -- and to make things better than they were, (5),
retrying up to an arbitrary maximum. (3) should give us
notion of why failures. Will let operator know if they need
to toggle (1) and (2) configurations. (I'm guessing that
all of the pool vitals will come out right. More
interesting would be vitals on threads waiting on the
ObjectPool monitor and how long each has been waiting).
Below is sample of whats logged on retry.

Lets work on a (4) if the above doesn't cut it. We'll at
least have more idea of what the inputs for (4)'s heuristic
should look like.

SEVERE: Timeout waiting for idle object: Retry #0 of max of
10: NSEE Pool State: Active 3 of max 3, idle 0, time 100ms
of max 100ms
Aug 25, 2004 5:43:54 PM org.archive.io.arc.ARCWriterPool
borrowARCWriter
SEVERE: Timeout waiting for idle object: Retry #1 of max of
10: NSEE Pool State: Active 3 of max 3, idle 0, time 100ms
of max 100ms
Aug 25, 2004 5:43:54 PM org.archive.io.arc.ARCWriterPool
borrowARCWriter
SEVERE: Timeout waiting for idle object: Retry #2 of max of
10: NSEE Pool State: Active 3 of max 3, idle 0, time 101ms
of max 100ms
....

I'm going to close this issue because of retry. If we have
to look at this issue again, then I'd do research on whats
happening at times of high concurrency OR, I'd look at using
StackObjectPool,
http://jakarta.apache.org/commons/pool/apidocs/org/apache/commons/pool/impl/StackObjectPool.html,
which is not synchronized. It doesn't have an upper bound
on the number of allowed active items. It'll grow without
bound. I'd look and see if what it produces is hundreds of
arc instances with hardly anything in them made during
'bursts' of high concurrency.

Fix for "[ 935146 ] Excessive ARCWriterPool timeouts"
See the issue for comments on why this fix is better than
what was there before
but that we may have to revisit.
* src/java/org/archive/io/arc/ARCWriterPool.java
(borrowARCWriter): Added a retry loop and severe logging
on exception.
(getPoolState): Utility method that returns string
describing state of
pool.
* src/java/org/archive/io/arc/ARCWriterPoolTest.java
Upped allowable max wait time now we're doing retries.


Date: 2004-08-23 23:33
Sender: gojomoProject Admin

Logged In: YES
user_id=144912

Reopening; saw 34 of these on the NARA-MIL tet crawl.
(Perhaps more on the NARA-GOV test crawl, too). Pool-wait is
set at 300 seconds, pool-size is 5. An example Alert from
the UI:

==
Title: Problem occured processing
'http://pmcop.dau.mil/simplify/ev.php?ID=6510_201&ID2=DO_TOPIC'
Time: Aug. 23, 2004 13:37:07 GMT
Level: SEVERE
Message:

Problem java.util.NoSuchElementException: Timeout waiting
for idle object occured when trying to process
'http://pmcop.dau.mil/simplify/ev.php?ID=6510_201&ID2=DO_TOPIC'
at step ABOUT_TO_BEGIN_PROCESSOR


Associated Throwable: java.util.NoSuchElementException:
Timeout waiting for idle object

Message:
Timeout waiting for idle object

Stacktrace:
java.util.NoSuchElementException: Timeout waiting for idle
object
at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java(Compiled
Code))
at
org.archive.io.arc.ARCWriterPool.borrowARCWriter(ARCWriterPool.java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.writeHttp(ARCWriterProcessor.java(Compiled
Code))
at
org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcessor.java(Compiled
Code))
at
org.archive.crawler.framework.Processor.process(Processor.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Compiled
Code))
at
org.archive.crawler.framework.ToeThread.run(ToeThread.java(Compiled
Code))
==

They occur in batches: first 4 at exactly the same clock
time, then 8 several hours later, then 20 a minute later,
then 1 a few hours later, then another 1 a half hour later.

Could be correlated to times when a few giant files are
being written at once, monopolizing the pool. I believe that
when this error occurs, the corresponding URI's contents are
never written.

It seems a bit odd that big batches of URIs all fail at the
exact same
time. Perhaps something is notify()ing them out of their
wait early?

Things to consider:
(1) longer default wait
(2) larger default pool size
(3) debugging instrumentation to capture what's holding up
the pool when a timeout occurs
(4) dynamically growing the pool when necessary
(5) noting a timeout and then retrying as many times as
necessary to write content (better to have threads spin
indefinitely over pool than lose content due to a transient
holdup)




Date: 2004-04-30 03:10
Sender: stack-sfProject Admin

Logged In: YES
user_id=924942

Ran a broad crawl, 700 seeds, 200 threads. Let it run for
fifteen minutes (17000 pages downloaded). I do not see any
of the above exceptions in the log. Only time outs are http
client in local-errors.log. I looked in all logs and in
heritrix_out.log.

I'm running with default of 5 arcwriters (Perhaps this issue
was filed before john updated the default from 3 to 5?)

The pool assigns on first come, first served basis. That'd
put the problem of threads not getting time to write on to
the thread scheduler.

Closing because cannot reproduce and the issue was possibly
fixed by the upping of our default arc writer pool size from
3 to 5 files and that if we see this again, we can up the
timeout, up the number of arc files in the pool or down the
number of concurrent threads.


Date: 2004-04-28 15:33
Sender: stack-sfProject Admin

Logged In: YES
user_id=924942

Assigning to myself.


Date: 2004-04-20 23:55
Sender: ia_igorProject Admin

Logged In: YES
user_id=715474

I got this error when I ran out of disk space. Do you
remember if this was the case?


Attached File

No Files Currently Attached

Changes ( 12 )

Field Old Value Date By
close_date 2004-04-30 03:10 2004-08-26 01:49 stack-sf
resolution_id None 2004-08-26 01:49 stack-sf
status_id Open 2004-08-26 01:49 stack-sf
artifact_group_id None 2004-08-23 23:58 gojomo
priority 5 2004-08-23 23:58 gojomo
resolution_id Works For Me 2004-08-23 23:33 gojomo
status_id Closed 2004-08-23 23:33 gojomo
status_id Open 2004-04-30 03:10 stack-sf
close_date - 2004-04-30 03:10 stack-sf
resolution_id None 2004-04-30 03:10 stack-sf
assigned_to nobody 2004-04-28 15:33 stack-sf
category_id None 2004-04-28 15:33 stack-sf