On Tue, Sep 23, 2003 at 05:33:01AM +0800, Federico Sevilla III wrote:
> I haven't posted in awhile, largely because everything's been going
> smoothly (no news is good news, right?). I continue to run lurker 0.8.1
Tsck. Tsck. You know I fix bugs in releases, right? =)
Admit it, you just don't want to change UIs.
> on a Debian Sid box, primarily because lurker 1.0 hasn't hit the Debian
> archives, yet.
Mail the maintainer; if he doesn't do something soon, lurker 1.0 will miss
sarge. That would be a shame.
> True enough, I'm finding some discrepancies, as follows:
> jijo@...$ cat db
> 255 8192 255
> jijo@...$ ls -la db.*
> X -rw-rw-r-- 1 lurker lurker 1117896704 May 28 08:33 db.iue
> X -rw-rw-r-- 1 lurker lurker 563175424 May 28 19:07 db.ist
> X -rw-rw-r-- 1 lurker lurker 284237824 Jul 6 19:22 db.ifd
> X -rw-rw-r-- 1 lurker lurker 80822272 Aug 22 22:06 db.ifv
> X -rw-rw-r-- 1 lurker lurker 51101696 Sep 22 01:45 db.izy
> -rw-rw-r-- 1 lurker lurker 19251200 Sep 23 05:17 db.ijt
> -rw-rw-r-- 1 lurker lurker 10403840 Sep 23 05:23 db.ixa
> -rw-rw-r-- 1 lurker lurker 221184 Sep 23 05:23 db.izj
> X -rw-rw-r-- 1 lurker lurker 8192 Sep 23 05:23 db.ily
> -rw-rw-r-- 1 lurker lurker 0 May 27 11:00 db.writer
It seems you have every database file from the largest down to db.izy, which
has timestamp Sep 22, 01:45. This is relevant if you opt for editing db.
It is super hard to know what has gone wrong by this point...
Looking through the changelog I see these changes since version 0.8:
Fixed a race condition with multiple writers
Fixed a bug with databases over 6Gb
I believe you already have the multiple writer fix?
If not, importing at the same time an email is delivered could go BOOM!
(This would be consistent with the behaviour above, so check your ChangeLog)
The other one also looks like it might be related since you have an index
close to 2Gb which probably means your compressed mail is almost 6Gb.
However, judging by the timestamps, I find this less likely.
Did you have a computer crash sometime recently which might have left some
of the directory information unflushed? I know one of the changes somewhere
between 0.8 and 1.0 (which I didn't put in the changelog; bad me) was to
add an fsync(..) on the directory information. If you are importing with
'-f', you also lose any form of atomic updates.
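
To make the directory fsync point concrete, here is a minimal sketch in
Python. It is not lurker's code; the function name and arguments are made up
for illustration. It only shows the general POSIX pattern: fsync the file,
then fsync the directory holding it, so that the new file name itself
survives a crash.

```python
import os


def write_durably(directory, name, data):
    """Write data to directory/name, then flush both file and directory.

    Hypothetical helper for illustration only; not taken from lurker.
    """
    path = os.path.join(directory, name)
    # Write the file contents and force them out to disk.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # fsync the containing directory so the directory entry (the file's
    # name) is also on disk. This works on POSIX systems such as Linux.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return path
```

Without the second fsync, the file data can be safely on disk while the
directory still does not know the file exists after a crash.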
> I could attempt to modify /var/lib/lurker/db, but I wager that won't
> make Wesley happy.
Well, you could try it. If it works, great. =P
Furthermore, you might be able to get the bug to be reproducible if you do.
It is pretty much impossible for me to debug the problem from this state.
> This isn't the first time this has happened to me, but the last time,
> which was quite some time ago, I remember just purging and starting from
> scratch which I hope I won't have to do this time.
What was the cause of it last time? Did we fix it?
Reimporting is your best bet to be sure you don't have consistency problems.
However, I had really thought I had eliminated all the bugs of this sort, so
if you can reproduce it by editing the /db file, I am all for it.
I am fairly certain that if files go missing like this, all that can
happen is that some of the messages (depending on which files got lost) will
simply not appear in the database anymore. I believe it should still be a
consistent view, but of a subset of the messages. However, if you see
consistency issues, I will retract this claim immediately. =)
So, if you opt to edit db, you should: erase all files smaller than db.izy,
then edit db so that it lists only the files which remain. After that,
reimport every email you imported since Sep 22, 01:45. This will give you
the best chance of recovering a consistent database with no losses.
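
Before erasing anything, the first step can be checked as a dry run. A small
Python sketch, assuming the database files all match db.i* as in the listing
above; the function name is made up, and it only reports candidates. It does
not delete anything and does not touch db.

```python
import glob
import os


def files_smaller_than(directory, reference="db.izy", pattern="db.i*"):
    """Return names of files matching pattern that are smaller than reference.

    Dry run only: nothing is deleted and the db file is not modified.
    """
    ref_size = os.path.getsize(os.path.join(directory, reference))
    candidates = []
    # glob.escape keeps special characters in the directory name literal.
    for path in glob.glob(os.path.join(glob.escape(directory), pattern)):
        if os.path.getsize(path) < ref_size:
            candidates.append(os.path.basename(path))
    return sorted(candidates)
```

Calling files_smaller_than("/var/lib/lurker") against the listing above
should name db.ijt, db.ily, db.ixa and db.izj. db.writer does not match the
pattern, so it is left alone.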
If you don't have the multiple writer bugfix, freeze your delivery queue to
prevent this from being an issue when you reimport mail later than Sep 22.
If you don't have the 6Gb bugfix, pray. =)
If you have both bugfixes, and it happens again, and you can reproduce it,
please let me know!
Wesley W. Terpstra <wesley@...>