Heritrix should be able to resume crashed crawls
quickly, without having to recover each URI line by
line. Line-by-line recovery is essentially a re-crawl
without network I/O, which is impractical once
millions of pages have been crawled.
For the BdbFrontier in particular, this might be easy
if the set of URIs already included (BdbUriUniqFilter)
and the queue holding all the pending URIs
(BdbMultipleWorkQueues) can be opened and re-used.
The counts of URIs included, pending, etc. could be
restored either by re-counting the queue's contents
(hard with Bdb) or simply by reading them from the
progress-statistics log file (fast, but probably
slightly inaccurate).
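As a rough illustration of the second option, the sketch below seeds the counters from the last usable sample in a progress-statistics log. It is not Heritrix code: the class ProgressStatsSeed and its method are hypothetical, and the column layout (whitespace-separated, with timestamp, discovered, queued and downloaded as the first four columns) is an assumption that should be checked against the actual log file.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not part of Heritrix): counters to seed a resumed crawl. */
final class ProgressStatsSeed {
    final long discovered;
    final long queued;
    final long downloaded;

    ProgressStatsSeed(long discovered, long queued, long downloaded) {
        this.discovered = discovered;
        this.queued = queued;
        this.downloaded = downloaded;
    }

    /**
     * Walks the log backwards and returns the counters from the last line
     * that parses. ASSUMED layout: whitespace-separated columns starting
     * with timestamp, discovered, queued, downloaded. Header lines, blank
     * lines and lines cut short by the crash are skipped.
     */
    static Optional<ProgressStatsSeed> fromLastSample(List<String> lines) {
        for (int i = lines.size() - 1; i >= 0; i--) {
            String[] cols = lines.get(i).trim().split("\\s+");
            if (cols.length < 4) {
                continue; // blank or truncated line
            }
            try {
                return Optional.of(new ProgressStatsSeed(
                        Long.parseLong(cols[1]),
                        Long.parseLong(cols[2]),
                        Long.parseLong(cols[3])));
            } catch (NumberFormatException e) {
                // Header or damaged line: fall through to an earlier sample.
            }
        }
        return Optional.empty();
    }
}
```

The lines could be read with java.nio.file.Files.readAllLines. Because the log is only sampled periodically, the seeded counts lag behind the real state by whatever happened between the last sample and the crash, which is the inaccuracy mentioned above.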
Karl Thiessen
None
1.6.0
Public
Date: 2007-03-14 01:40
Date: 2005-10-07 17:31
Date: 2005-09-30 22:58
Date: 2005-09-14 01:09
Date: 2005-09-14 00:50
Date: 2005-09-13 22:19
Date: 2005-09-08 18:00
Date: 2005-09-07 22:34
Date: 2005-07-07 21:30
Date: 2005-07-07 16:24
Date: 2005-07-07 05:56
Date: 2005-07-07 01:09
Date: 2005-07-06 17:31
Date: 2005-07-06 02:40
Date: 2005-06-25 02:26
Date: 2005-06-22 16:33
Date: 2005-06-21 13:41
Date: 2005-04-28 15:56
Date: 2005-04-28 09:02
Date: 2005-04-27 15:07
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2005-12-02 17:29 | stack-sf |
| close_date | - | 2005-12-02 17:29 | stack-sf |
| artifact_group_id | None | 2005-09-23 22:10 | gojomo |
| artifact_group_id | 1.6.0 | 2005-09-23 20:58 | gojomo |
| summary | Quick resume without real recovery | 2005-09-23 20:58 | gojomo |
| artifact_group_id | None | 2005-09-23 20:53 | gojomo |
| priority | 7 | 2005-09-23 20:40 | gojomo |
| assigned_to | stack-sf | 2005-09-14 01:09 | stack-sf |
| priority | 5 | 2005-06-22 16:33 | stack-sf |
| assigned_to | nobody | 2005-06-22 16:33 | stack-sf |