Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

9 BdbWorkQueue origins should be based on full classKey - ID: 1223840
Last Update: Comment added ( karl-ia )

BdbWorkQueue 'origin' keys are currently based on a
long (8-byte) fingerprint of the classKey. This means
there's a small chance that two different classKeys
collide on the same origin key.

They could instead be based on a UTF-8 encoding of the
full classKey, terminated by a '\0'. No queues would
overlap/collide, though keys would be variable-length
and larger, perhaps 3x or so larger on average.
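
A minimal sketch of the proposed key layout, to show why the '\0' terminator matters. The class and method names here (OriginKeySketch, originKey, startsWith) are hypothetical and are not Heritrix code. The sketch assumes only what is stated above: the key is the UTF-8 bytes of the classKey followed by one zero byte, and the classKey itself contains no NUL character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OriginKeySketch {

    /** Hypothetical helper: UTF-8 bytes of the classKey plus a trailing 0 byte. */
    static byte[] originKey(String classKey) {
        byte[] utf8 = classKey.getBytes(StandardCharsets.UTF_8);
        // copyOf pads the extra slot with 0, which serves as the terminator
        return Arrays.copyOf(utf8, utf8.length + 1);
    }

    /** True if key begins with every byte of prefix. */
    static boolean startsWith(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] shortRaw = "com".getBytes(StandardCharsets.UTF_8);
        byte[] longRaw = "com.example".getBytes(StandardCharsets.UTF_8);

        // Without a terminator, one classKey's bytes can prefix another's.
        System.out.println(startsWith(longRaw, shortRaw));

        // With the terminator, the shorter key is no longer a prefix.
        System.out.println(startsWith(originKey("com.example"), originKey("com")));

        // Key size now depends on the classKey: 11 characters + 1 terminator.
        System.out.println(originKey("example.com").length);
    }
}
```

Under these assumptions, two distinct classKeys always yield distinct origin keys, and no origin key is a prefix of another queue's key. The cost is that a key is as long as its classKey rather than a fixed 8 bytes.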


Gordon Mohr ( gojomo ) - 2005-06-20 00:39

9

Closed

None

Gordon Mohr

None

1.6.0

Public


Comments ( 3 )

Date: 2007-03-14 01:42
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-949 -- please add further
comments at that location.


Date: 2005-08-05 00:27
Sender: gojomo (Project Admin)

Changed. Commit comment:

Implementation of [ 1223840 ] BdbWorkQueue origins should be
based on full classKey
* BdbMultipleWorkQueues.java
    change calculateInsertKey to use full bytes of classKey,
      not just 8-byte fingerprint, as prefix of insertKey
    add calculateOriginKey to keep origin calculation near
      insert calculation
* BdbWorkQueue.java
    change getKeyPrefixHex to getPrefixClassKey --
      prefix/classkey is now displayable text, rather than binary

Change passes unit test of expected key sorting behavior;
setting log level on BdbWorkQueue to FINE shows expected
now-textual prefix/class keys; two short test crawls work as
expected.
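
For illustration only, the key sorting behavior can be sketched as follows. This is not the Heritrix unit test or the real calculateInsertKey. The layout after the classKey prefix (an 8-byte big-endian sequence number) is an assumption made for this example, and a TreeMap with an unsigned byte-wise comparator stands in for the database's key ordering.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class InsertKeySortSketch {

    /** Origin key as proposed in this request: UTF-8 classKey bytes plus '\0'. */
    static byte[] originKey(String classKey) {
        byte[] utf8 = classKey.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(utf8.length + 1).put(utf8).put((byte) 0).array();
    }

    /** ASSUMED layout: origin key followed by an 8-byte big-endian sequence number. */
    static byte[] insertKey(String classKey, long seq) {
        byte[] origin = originKey(classKey);
        return ByteBuffer.allocate(origin.length + 8).put(origin).putLong(seq).array();
    }

    /** Unsigned lexicographic comparison of byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Inserts entries out of order and returns their labels in key order. */
    static List<String> sortedLabels() {
        TreeMap<byte[], String> db = new TreeMap<>(InsertKeySortSketch::compareUnsigned);
        db.put(insertKey("b.example", 1), "b.example#1");
        db.put(insertKey("a.example", 2), "a.example#2");
        db.put(insertKey("a.example.org", 1), "a.example.org#1");
        db.put(insertKey("a.example", 1), "a.example#1");
        return new ArrayList<>(db.values());
    }

    public static void main(String[] args) {
        // Entries of each queue come out contiguous and in sequence order.
        System.out.println(sortedLabels());
    }
}
```

In this sketch all entries for one classKey sort together, and a queue whose classKey is a textual prefix of another (a.example vs. a.example.org) does not interleave with it, because the '\0' terminator sorts below any other byte.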

Closing as completed.


Date: 2005-08-04 22:28
Sender: gojomo (Project Admin)

This should be easy and will remove risk of very subtle bugs
in future.


Attached File

No Files Currently Attached

Changes ( 5 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 21:08  gojomo
status_id          Open       2005-08-05 00:27  gojomo
close_date         -          2005-08-05 00:27  gojomo
priority           6          2005-08-04 22:28  gojomo
assigned_to        nobody     2005-08-04 22:28  gojomo