Currently, when operating in site-first/hold-queues
mode, there is a target maximum for the total number
of CrawlURIs that all INACTIVE queues, together, should
hold in memory. If the INACTIVE queues together hold
more than this target, a second, hard per-queue quota
is decremented, and it is enforced whenever a queue
exceeding it is encountered in normal operations. If
the INACTIVE queues together hold fewer than this
target, the per-queue quota is incremented.
The idea was that the per-queue quota would rise and
fall as necessary to keep the actual number of
in-memory CrawlURIs across INACTIVE queues tending
towards the target.
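The adjustment loop described above can be sketched roughly as follows. This is a simplified Python model, not the crawler's actual code: the class and attribute names are hypothetical, and the choice to flush all of a queue's in-memory items once it exceeds the quota is an assumption made for illustration.

```python
class InactiveQueueMemoryModel:
    """Hypothetical model of the adaptive per-queue quota (not the real code)."""

    def __init__(self, target_total, initial_quota):
        self.target_total = target_total      # target in-memory CrawlURIs, all INACTIVE queues
        self.per_queue_quota = initial_quota  # hard cap applied to each individual queue
        self.in_memory = {}                   # queue name -> number of in-memory items
        self.flushed = 0                      # items written out to disk so far

    def total_in_memory(self):
        return sum(self.in_memory.values())

    def adjust_quota(self):
        # Over target: tighten the per-queue cap. Under target: loosen it.
        total = self.total_in_memory()
        if total > self.target_total:
            self.per_queue_quota -= 1
        elif total < self.target_total:
            self.per_queue_quota += 1

    def enqueue(self, queue):
        self.in_memory[queue] = self.in_memory.get(queue, 0) + 1
        self.adjust_quota()
        # Enforcement happens only when an over-quota queue is encountered.
        # Assumption: the whole in-memory portion of that queue is flushed.
        if self.in_memory[queue] > self.per_queue_quota:
            self.flushed += self.in_memory[queue]
            self.in_memory[queue] = 0


model = InactiveQueueMemoryModel(target_total=10, initial_quota=5)
for _ in range(3):
    model.enqueue("a")
    model.enqueue("b")
```

With only two queues and a total under the target, the quota in this model only rises and nothing is flushed, which is the behavior the design was aiming for.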
However, when the number of INACTIVE queues exceeds the
target number of in-memory CrawlURIs (as is quite easy
in broad crawls), even a single in-memory CrawlURI for
every queue would put the total well over the target.
The per-queue quota thus tends to decrement into
negative values, and every CrawlURI destined for an
INACTIVE queue is immediately flushed to disk, defeating
the batching efficiency this mechanism was intended to
create.
(Further, in practice it may never batch effectively
again.)
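The failure mode can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a reproduction of the real frontier: the function `simulate` and its parameters are hypothetical, every queue receives one CrawlURI per round, the quota is adjusted on every enqueue, and an over-quota queue flushes everything it holds in memory.

```python
def simulate(num_queues, target_total, initial_quota, rounds):
    """Return (final per-queue quota, number of flush operations)."""
    quota = initial_quota
    in_memory = [0] * num_queues  # in-memory item count per INACTIVE queue
    flushes = 0
    for _ in range(rounds):
        for q in range(num_queues):
            in_memory[q] += 1
            total = sum(in_memory)
            # Adjust the per-queue quota towards the overall target.
            if total > target_total:
                quota -= 1
            elif total < target_total:
                quota += 1
            # Enforce the quota on the queue just touched.
            if in_memory[q] > quota:
                flushes += 1      # one disk write for this queue
                in_memory[q] = 0
    return quota, flushes


few_queues = simulate(num_queues=2, target_total=10, initial_quota=5, rounds=3)
many_queues = simulate(num_queues=100, target_total=10, initial_quota=5, rounds=1)
```

In this model, two queues never trigger a flush, while 100 queues against a target of 10 drive the quota negative within a single pass, after which each new CrawlURI is written out on its own with no batching.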
This process needs a redesign; the current
implementation is probably not offering any benefit for
its complexity. A plain always-write-through policy
might be just as good and would be much simpler.
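For comparison, an always-write-through policy needs no quota bookkeeping at all. The sketch below is hypothetical (the class `WriteThroughFrontier` and its methods are illustrative names, and a dictionary of lists stands in for on-disk storage); it only shows how little state such a policy would carry.

```python
from collections import defaultdict


class WriteThroughFrontier:
    """Hypothetical always-write-through policy for INACTIVE queues."""

    def __init__(self):
        self.disk = defaultdict(list)  # stand-in for per-queue on-disk storage
        self.disk_writes = 0

    def enqueue_inactive(self, queue, uri):
        # Every item goes straight to disk; nothing is held in memory.
        self.disk[queue].append(uri)
        self.disk_writes += 1

    def activate(self, queue):
        # Hand back the queue's items, in order, when it becomes active.
        return self.disk.pop(queue, [])


frontier = WriteThroughFrontier()
frontier.enqueue_inactive("example.org", "http://example.org/a")
frontier.enqueue_inactive("example.org", "http://example.org/b")
activated = frontier.activate("example.org")
```

The cost is one write per CrawlURI, which is also what the adaptive scheme degrades to once its quota goes negative.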
Gordon Mohr
Date: 2007-03-14 00:15
Date: 2004-10-20 21:47
| Field | Old Value | Date | By |
|---|---|---|---|
| resolution_id | None | 2004-10-20 21:47 | gojomo |
| assigned_to | nobody | 2004-10-20 21:47 | gojomo |
| close_date | - | 2004-10-20 21:47 | gojomo |
| status_id | Open | 2004-10-20 21:47 | gojomo |
| priority | 5 | 2004-10-20 21:47 | gojomo |
| priority | 6 | 2004-10-20 21:45 | gojomo |
| priority | 5 | 2004-09-01 21:57 | gojomo |