http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1821
------- Additional Comments From felicity@... 2003-04-30 09:53 -------
Subject: Re: msgcount should be msgs learned, not processed
On Tue, Apr 29, 2003 at 12:24:07AM -0700, bugzilla-daemon@... wrote:
> When deciding how many tokens to expire, why not do this:
>
> - after an expire is done, store the following values:
> 1. the change in the disk usage of the DB (from before to after the
> expire)
> 2. the atime delta (newest token purged - oldest token purged)
> 3. the atime of the oldest token retained
>
> - the next time you want to expire, it's an easy calculation to estimate
> the new atime delta based on #1 and #2 and then using #3, you know what
> limit to use. If the DB expired 500k of data last time and needs to
> expire 750k of data this time, then just multiply the delta by 1.5.
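Below is a minimal Python sketch of the quoted proposal. It rests on two assumptions:

- The data freed grows linearly with the atime span purged.
- The new cutoff is measured forward from the oldest retained atime (#3).

The function and parameter names are hypothetical, not SpamAssassin's. With no earlier expire there is no history to scale from, so the sketch raises an error in that case.

```python
def estimate_expire_cutoff(last_bytes_freed, last_atime_delta,
                           oldest_retained_atime, bytes_to_free):
    """Estimate the atime cutoff for the next expire (hypothetical helper).

    last_bytes_freed      -- #1: drop in DB disk usage from the last expire
    last_atime_delta      -- #2: newest purged atime - oldest purged atime
    oldest_retained_atime -- #3: atime of the oldest token the last expire kept
    bytes_to_free         -- how much data this expire needs to remove
    """
    if last_bytes_freed <= 0:
        # No previous expire to learn from.
        raise ValueError("no usable history from a previous expire")
    # Assumption: data freed grows linearly with the atime span purged.
    scale = bytes_to_free / last_bytes_freed
    new_delta = last_atime_delta * scale
    # Tokens with an atime below this value would be expired.
    return oldest_retained_atime + new_delta


# 500k freed last time, 750k needed now: the delta is scaled by 1.5.
cutoff = estimate_expire_cutoff(500_000, 86_400, 1_000_000, 750_000)
print(cutoff)  # 1129600.0
```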
Well, #2 is a chicken-and-egg problem: you need to do an expire to figure
out when to do the first expire. And how do we know the size of what
needs to be expired this time? You can guesstimate by saying the average
data size is token_length + 11, but that really isn't true (it could also
be token_length + 3, depending on whether it's a "small use" token).
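The size problem can be made concrete with only the two per-token overheads mentioned above: +11 for a regular token and +3 for a "small use" one. Unless you know how each token is stored, you can only bracket the size between two estimates. The sketch below is in Python, and the helper name is hypothetical.

```python
FULL_OVERHEAD = 11   # bytes beyond token_length for a regular token
SMALL_OVERHEAD = 3   # bytes beyond token_length for a "small use" token


def estimate_bytes_bounds(token_lengths):
    """Return (low, high) byte estimates for expiring these tokens.

    low assumes every token is "small use"; high assumes none are.
    """
    n = len(token_lengths)
    total_len = sum(token_lengths)
    low = total_len + n * SMALL_OVERHEAD
    high = total_len + n * FULL_OVERHEAD
    return low, high


print(estimate_bytes_bounds([5, 5, 5, 5]))  # (32, 64)
```

Even for four 5-character tokens, the two estimates differ by a factor of two. That gap is why a size-driven cutoff can only be a guesstimate.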
I was thinking of something like storing the oldest and newest atime values.
When we need to expire (ntoks > min_db_size * 140%) we simply do something
like "$expire_atime = ($newest-$oldest) / 140%" and end up with the
approximate atime before which tokens get expired.
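Here is a Python sketch of the trigger and the cutoff. The formula above is shorthand, so the sketch assumes one reading of it:

- `($newest-$oldest) / 140%` is the span of atimes to keep.
- Tokens with an atime before `newest - span` get expired.

If atimes are spread evenly, this keeps about 1/1.4 of the tokens, which is roughly `min_db_size` right at the trigger point. The names are hypothetical.

```python
EXPIRE_RATIO = 1.4  # the 140% from the comment


def should_expire(ntoks, min_db_size, ratio=EXPIRE_RATIO):
    """Trigger condition: ntoks > min_db_size * 140%."""
    return ntoks > min_db_size * ratio


def expire_atime(oldest, newest, ratio=EXPIRE_RATIO):
    """Approximate atime before which tokens are expired (assumed reading)."""
    span_to_keep = (newest - oldest) / ratio
    return newest - span_to_keep


print(should_expire(150_000, 100_000))  # True
print(expire_atime(0, 140))             # ~40
```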
However, if someone learns enough tokens to trigger an expire within a
single 18-hour stretch, all the atimes will be the same and it won't be
able to expire any of them.
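The failure case is easy to reproduce with the same assumed reading of the cutoff. If every token ends up with the same atime, as when all of them were learned in one 18-hour stretch, then oldest equals newest. The cutoff collapses to that atime, and no token is strictly older than it. The Python sketch below is self-contained, and its names are hypothetical.

```python
def tokens_to_expire(atimes, ratio=1.4):
    """Return the atimes that the oldest/newest cutoff would expire."""
    oldest, newest = min(atimes), max(atimes)
    cutoff = newest - (newest - oldest) / ratio  # assumed reading of the formula
    return [a for a in atimes if a < cutoff]


spread = list(range(140))   # atimes spread evenly over 0..139
same = [1_000] * 500        # everything learned in one stretch

print(len(tokens_to_expire(spread)))  # 40
print(tokens_to_expire(same))         # [] (nothing can be expired)
```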