Re: [Spamprobe-users] spamprobe-1.1x7 released
From: Brian B. <bb...@us...> - 2005-03-30 15:25:18
David A. Lee wrote:
> I am wondering, however, what is the failure mode as the hash table gets full?
> You mention it starts to 'drop' terms ... sure, the hash buckets get overloaded
> and/or 2 strings hash to the same value%DB-size ... but what's the failure mode?
> What terms get dropped?

Letting the hash file get full would be a Bad Thing. :-)

Performance of the hash file will suffer as the file starts to fill up. Lots of collisions will lead to a lot of linear scanning to find records, which would remove the performance advantage of the hash format.

Also, once a hash file is full, any attempt to add new records will cause SP to exit with an error message. I did this deliberately since I didn't think that silently dropping new terms would be helpful.

The "lossy" nature of the hash algorithm will be essentially random. Which two terms will have the same 32-bit hash value isn't something that's predictable.

> Is there a way this could be smart ... say drop terms that are older ..
> or have lower weights, etc.? So that in time even a 2MB hash DB
> could be trained and trained, maybe way way beyond its capacity, but the
> information it loses is not important information ....

Hmm. That's really what the cleanup command is for. It can be used to periodically flush out older terms that haven't been seen in a long time. I think that aggressively running cleanup will be important with hash files.

All the best,
++Brian
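To make the failure modes described above concrete, here is a minimal sketch of a fixed-size hash table that stores only a 32-bit hash per term, resolves collisions by linear scanning, and raises an error instead of silently dropping a term when full. It is an illustration under stated assumptions, not SpamProbe's actual code: the class `HashTermTable`, its methods, the use of `zlib.crc32` as the 32-bit hash, and the cleanup criterion (drop terms with a low count that haven't been seen within `max_age` days) are all hypothetical stand-ins.

```python
import zlib


class HashTermTable:
    """Hypothetical fixed-size hash table of term counts (not SpamProbe's code)."""

    def __init__(self, num_slots):
        # Each slot is None or a tuple (hash32, count, last_seen_day).
        self.slots = [None] * num_slots

    @staticmethod
    def hash32(term):
        # Stand-in 32-bit hash; SpamProbe's real hash function may differ.
        return zlib.crc32(term.encode("utf-8")) & 0xFFFFFFFF

    def _place(self, h, count, day):
        """Linear probing: scan forward from h % size to a match or an empty slot."""
        n = len(self.slots)
        start = h % n
        for i in range(n):
            idx = (start + i) % n
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = (h, count, day)
                return idx
            if slot[0] == h:
                # Only the 32-bit hash is stored, so two different terms with
                # the same hash silently share one record (the "lossy" part).
                self.slots[idx] = (h, slot[1] + count, max(slot[2], day))
                return idx
        # Every slot was scanned without finding room: fail loudly
        # rather than silently dropping the new term.
        raise RuntimeError("hash table is full")

    def add(self, term, day):
        return self._place(self.hash32(term), 1, day)

    def count(self, term):
        h = self.hash32(term)
        n = len(self.slots)
        start = h % n
        for i in range(n):
            slot = self.slots[(start + i) % n]
            if slot is None:
                return 0  # an empty slot ends the probe chain
            if slot[0] == h:
                return slot[1]
        return 0

    def cleanup(self, today, max_age, min_count):
        """Drop terms with count < min_count not seen within max_age days.

        The table is rebuilt from the survivors instead of blanking slots in
        place, because an empty slot in the middle of a probe chain would
        hide records stored further along it.
        """
        survivors = [
            s for s in self.slots
            if s is not None and (s[1] >= min_count or today - s[2] <= max_age)
        ]
        self.slots = [None] * len(self.slots)
        for h, count, day in survivors:
            self._place(h, count, day)
        return len(survivors)
```

For example, a 4-slot table accepts four distinct terms, then raises `RuntimeError` on a fifth new one; after `cleanup(today=30, max_age=7, min_count=2)` removes the stale single-count terms, there is room again. As the table fills, `_place` and `count` walk longer and longer runs of occupied slots, which is the slowdown described above.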