This simple patch provides a configuration option and
plugin API for replicating the already-seen (already
included) list across Heritrix instances. It is a
simple solution to one of the first steps toward a
distributed Heritrix. It does nothing to choose which
URI goes to which instance; it simply ensures that
Heritrix instances don't crawl the same URI twice.
Complete replication is the easiest way to do this and
gives the fastest access. Note that this API does
nothing to add new instances to the group
automatically; for now they must be loaded with the
full list manually. It just ensures that every
instance maintains the same replicated already-seen
list while the group is running.
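
As a rough illustration of the idea (not the code in the patch), the sketch below shows how a per-instance already-seen set might publish newly added URIs to its peers. The names `SeenReplicator`, `SeenSet` and `InMemoryBus` are hypothetical, and the in-memory bus only stands in for a real group transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicatedSeenSketch {

    /** Hypothetical plugin interface; the interface in the patch may differ. */
    interface SeenReplicator {
        void publish(String uri); // announce a locally added URI to the group
        void stop();              // leave the group
    }

    /** One crawler instance's already-seen set. */
    static class SeenSet {
        private final Set<String> seen = ConcurrentHashMap.newKeySet();
        private SeenReplicator replicator; // optional; null means no replication

        void setReplicator(SeenReplicator r) { this.replicator = r; }

        /** Returns true if the URI is new to the group as seen from this instance. */
        boolean addLocal(String uri) {
            boolean isNew = seen.add(uri);
            if (isNew && replicator != null) {
                replicator.publish(uri);
            }
            return isNew;
        }

        /** Called when a peer reports a URI; not re-published, to avoid echoes. */
        void addRemote(String uri) { seen.add(uri); }

        int size() { return seen.size(); }
    }

    /** In-memory stand-in for a group transport, for illustration only. */
    static class InMemoryBus {
        private final List<SeenSet> members = new ArrayList<>();

        SeenSet join() {
            final SeenSet self = new SeenSet();
            members.add(self);
            self.setReplicator(new SeenReplicator() {
                public void publish(String uri) {
                    for (SeenSet m : members) {
                        if (m != self) m.addRemote(uri);
                    }
                }
                public void stop() { members.remove(self); }
            });
            return self;
        }
    }

    public static void main(String[] args) {
        InMemoryBus bus = new InMemoryBus();
        SeenSet a = bus.join();
        SeenSet b = bus.join();
        System.out.println(a.addLocal("http://example.org/")); // true: new everywhere
        System.out.println(b.addLocal("http://example.org/")); // false: replicated from a
    }
}
```

Note that an instance joining after URIs have already been published starts with an empty set in this sketch, which mirrors the limitation described above: new members have to be loaded with the full list by some other means.
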
An implementation of such replication is likely to
depend on further libraries, so the replicator is
instantiated via reflection and is built separately
from the default Heritrix binary distribution for now.
My own JGroups implementation is forthcoming.
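
For readers unfamiliar with the approach, here is a minimal sketch of loading a replicator by class name, so that the core needs no compile-time dependency on the implementation or its libraries. The interface `UriReplicator`, the `load` helper and `NoOpReplicator` are assumptions made for illustration; they are not taken from the patch.

```java
public class ReflectiveLoaderSketch {

    /** Hypothetical plugin interface; the interface in the patch may differ. */
    public interface UriReplicator {
        void publish(String uri);
    }

    /** Trivial implementation so the sketch runs without external libraries. */
    public static class NoOpReplicator implements UriReplicator {
        public NoOpReplicator() {}
        public void publish(String uri) { /* intentionally does nothing */ }
    }

    /**
     * Instantiates the configured class through its no-argument constructor.
     * An empty or missing setting means replication is disabled (returns null).
     */
    public static UriReplicator load(String className) {
        if (className == null || className.isEmpty()) {
            return null;
        }
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass throws ClassCastException if the class is not a UriReplicator
            return cls.asSubclass(UriReplicator.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load replicator " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real setup the name would come from the crawl configuration.
        String configured = NoOpReplicator.class.getName();
        UriReplicator r = load(configured);
        System.out.println(r.getClass().getSimpleName()); // NoOpReplicator
    }
}
```

With this arrangement an implementation built against JGroups, for example, only has to be on the classpath and named in the configuration.
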
Note that this patch is made against 1.8.0, but it is
very simple, so it should port to the 1.9/10 head easily.
Nobody/Anonymous
multimachine
1.8.0
Public
Date: 2007-03-14 01:50
Date: 2006-08-30 18:05 Logged In: YES
Date: 2006-08-29 22:32 Logged In: YES
Date: 2006-08-29 22:03 Logged In: YES
Date: 2006-08-29 21:38 Logged In: YES
Date: 2006-08-29 20:28 Logged In: YES
Date: 2006-08-29 19:41 Logged In: YES
Date: 2006-08-28 21:58 Logged In: YES
| Filename | Description |
|---|---|
| uri_unique_replication.patch | fixed fixed fixed (added "transient" and serialVersionUID and stop) patch to support uri uniq replication |
| NotificationBusUniqUriReplicator.java | fixed fixed JGroups already-seen replicator using their NotificationBus building block with debug info |
| Field | Old Value | Date | By |
|---|---|---|---|
| close_date | - | 2007-03-14 01:50 | karl-ia |
| status_id | Open | 2007-03-14 01:50 | karl-ia |
| File Deleted | 191238: | 2006-08-30 18:06 | ecjensen |
| File Added | 191372: NotificationBusUniqUriReplicator.java | 2006-08-30 18:06 | ecjensen |
| File Deleted | 191237: | 2006-08-30 18:05 | ecjensen |
| File Added | 191371: uri_unique_replication.patch | 2006-08-30 18:05 | ecjensen |
| File Deleted | 191035: | 2006-08-30 02:34 | ecjensen |
| File Added | 191238: NotificationBusUniqUriReplicator.java | 2006-08-30 02:34 | ecjensen |
| File Deleted | 191216: | 2006-08-30 02:33 | ecjensen |
| File Deleted | 191204: | 2006-08-30 02:33 | ecjensen |
| File Added | 191237: uri_unique_replication.patch | 2006-08-30 02:33 | ecjensen |
| File Added | 191216: uri_unique_replication.patch | 2006-08-29 22:03 | ecjensen |
| File Added | 191204: uri_unique_replication.patch | 2006-08-29 20:28 | ecjensen |
| File Deleted | 191150: | 2006-08-29 20:28 | ecjensen |
| File Added | 191150: uri_unique_replication.patch | 2006-08-29 14:17 | ecjensen |
| File Deleted | 191034: | 2006-08-29 14:17 | ecjensen |
| File Added | 191035: NotificationBusUniqUriReplicator.java | 2006-08-28 21:59 | ecjensen |
| File Deleted | 190779: | 2006-08-28 21:58 | ecjensen |
| File Added | 191034: uri_unique_replication.patch | 2006-08-28 21:58 | ecjensen |
| File Added | 190779: uri_unique_replication.patch | 2006-08-26 17:05 | ecjensen |