Thread: [Lurker-users] Mail list limits
Brought to you by:
terpstra
From: Robert W. <ro...@a1...> - 2010-12-22 01:30:40
|
I think I found the problem that was causing one of my lists to be replaced with httpd errors. Somewhere around 3000 messages, the lurker-index process hangs. Cron kicks off another, and another, with all of them hanging, and the database gets corrupted. I've been experimenting, and somewhere close to 3000 messages is where it starts to hang. 100: no problem. 200: no problem, although it takes nearly a minute. Even 2500 does OK, but takes over a minute. I don't mind it taking some time, but if I go off for 8 hours and it's still not done (run manually), I have to suspect something is wrong.

My temporary solution is to use mutt to break up the mail folder into smaller chunks so as not to choke Lurker. I haven't a clue how I'm going to automate this (break the mail folder on logical boundaries like the first half of the month, update lurker.conf to reflect everything, etc.).

What's the most messages in an archive that anyone here has had under Lurker?
|
From: Wesley W. T. <we...@te...> - 2010-12-22 09:23:36
|
On Wed, Dec 22, 2010 at 9:06 AM, Wesley W. Terpstra <we...@te...> wrote:
> On Wed, Dec 22, 2010 at 2:30 AM, Robert Woodworth
> <ro...@a1...> wrote:
>> I've been experimenting and somewhere close to 3000 messages is where it
>> starts to hang.
>
> How are you feeding messages to lurker-index, exactly?

Never mind, I found your earlier post. You are running fetchmail to get mail for multiple mailing lists, splitting it into mailboxes based on the 'To:' field, and then running lurker over the entire resulting mailbox. I can see now why lurker is getting slower and slower.

I think from the context of your setup that you still misunderstand how 'lurker-index' is intended to be run. You are supposed to run it on messages as they arrive, >ONE TIME ONLY<. Every time you feed a message to lurker, it puts it into the database again. That means that if you are indexing the same mailbox over and over, your archive keeps growing, because each message gets inserted repeatedly. lurker assumes that those old messages were delivered to the list again and faithfully re-archives them for you.

So, consider your example where you run fetchmail followed by lurker every 10 minutes. If there are 3000 messages in that mailbox, after one day lurker-index will have been run 24*6 = 144 times. You now have a database that is 144 times the size it should be.

Here's how you should be using lurker-index:

In a typical setup with one email address per list, the MTA delivers the incoming messages one at a time, as they arrive, to lurker-index with the '-m' option. Each time a mail arrives, it (and only it) gets routed to lurker-index.

In a setup with fetchmail and one email address per list, fetchmail gets the email and feeds the (new) messages to lurker-index.

In a setup with fetchmail and one email address shared between all lists (this is your setup), fetchmail gets the email and feeds it to procmail. procmail splits the email stream back into the different lists and feeds each email to lurker-index.

The problem with your setup is that your procmail isn't feeding messages to lurker-index; it is feeding them to a mailbox. Then you are running lurker-index on the entire mailbox instead of only on the new messages.

In my earlier response to your postings, I asked you to use this:

> A perhaps better solution is to feed the mail directly to lurker-index
> from your procmail rule.
> /usr/share/doc/lurker/README.procmail describes this setup.
>
> :0 w
> * ^X-Mailing-List: <exa...@li...>.*
> | lurker-index -l example-list -m

There are two key differences between my proposal and what you are doing:

1) I split up the email based on a reliable header (X-Mailing-List) instead of the 'To' field. Please find the header that your mailing lists add and filter on that instead.

2) Most importantly, the rule feeds messages directly to lurker-index using the procmail pipe syntax (|). This means that each new email gets routed to lurker-index instead of to a mailbox.

Due to point #2, lurker-index only gets invoked on the new messages. There is no need to run lurker-index after running fetchmail; lurker-index already got run by procmail where necessary.

I hope this clears things up.

As for the database corruption, even using lurker-index incorrectly should not cause this. I'm guessing it has something to do with multiple lurker-index processes blocking at the same time, and I'm seeing if I can reproduce the problem.
|
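The 144x figure above can be checked with a few lines of shell arithmetic. This is a minimal sketch using the thread's example numbers (3000 messages, a cron run every 10 minutes); the function name `reindex_growth` is hypothetical, and it ignores any new mail that arrives during the day.

```shell
#!/bin/sh
# How many message insertions accumulate when the whole mailbox is
# re-fed to lurker-index on every cron run (hypothetical helper).
reindex_growth() {
    messages=$1        # messages in the mailbox being re-indexed
    interval_min=$2    # minutes between cron runs
    runs_per_day=$(( 24 * 60 / interval_min ))
    # Every run re-inserts every message, so one day stores
    # messages * runs_per_day copies instead of just "messages".
    echo "$runs_per_day $(( messages * runs_per_day ))"
}

reindex_growth 3000 10   # prints: 144 432000
```

432000 stored copies for 3000 real messages is exactly the 144x blow-up described above.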
From: Wesley W. T. <we...@te...> - 2011-01-08 13:50:51
|
On Fri, Jan 7, 2011 at 11:28 PM, Robert Woodworth <ro...@a1...> wrote:
> Now that everything goes directly to lurker-index, if something goes awry
> with the database, I'm screwed, right?

The files /var/lib/lurker/your-mailing-list are gzip'd mboxes. Do not modify them or open them in any sort of mail client, or you will mess things up. However, "zcat list > a-copy-of-the-list" will get you a valid mailbox.

> So when I ran "lurker-index -l support -I support ; lurker-index -l support
> -I oldsupport", I seem to have corrupted the index.

I don't understand why you have two '-l' options?

> Is there a repair for the index? Or am I stuck with it?

You can rebuild the database from those internal mbox copies that lurker keeps, using lurker-regenerate.
|
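Wesley's zcat advice can be wrapped so the internal file is only ever read, never written. This is a minimal sketch. It assumes each list's gzip'd mbox lives at /var/lib/lurker/<list>, as described above. The function name `export_list_copy` and the destination path are hypothetical.

```shell
#!/bin/sh
# Make a plain-mbox working copy of one of lurker's internal gzip'd
# mailboxes without touching the original (hypothetical helper).
export_list_copy() {
    src=$1     # e.g. /var/lib/lurker/support (never edit this file)
    dest=$2    # where the readable copy should go
    # zcat only reads the archive; all writes go to $dest.
    zcat "$src" > "$dest" || return 1
    # In an mbox every message starts with a "From " line, so counting
    # those lines gives the number of messages in the copy.
    grep -c '^From ' "$dest"
}
```

For example, `export_list_copy /var/lib/lurker/support /tmp/support.mbox` prints the message count. The copy can then be opened in mutt safely, since the original gzip'd file is left as it was.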
From: Robert W. <ro...@a1...> - 2011-01-08 18:22:06
|
Boy, have I made a mess of this thing. I wish some of the details that I'm getting in this mail exchange were in the manpage.

First of all, one of those l's was meant to be an i.

OK, so those are mboxes rather than databases in /var/lib/lurker. This is a relief.

Here's the problem: I modified procmailrc to send directly to lurker-index. I also ran lurker-index on the old mailbox files I used to route procmail to. The first lists: no problem. Support list: the lurker-index process hangs.

I also think I may have indexed one of the lists more than once, which you cautioned me not to do. If a list is indexed too many times, will lurker-regenerate fix it? (You said that over-indexing leads to oversized databases.)

Since I have the old mbox, and I have the gzipped one that contains a mix of the old and new, is there a way to merge and de-duplicate and then index the result?
|
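The thread leaves Robert's merge-and-de-duplicate question open. One possible approach is to keep only the first copy of each Message-ID, so each message would be fed to lurker-index a single time. This is a sketch under stated assumptions:

- The input is plain mbox (zcat lurker's gzip'd copy first).
- The headers carry a well-formed "Message-ID:" line. Messages without one are always kept.
- The function name `dedup_mboxes` is hypothetical.

procmail's own formail offers a similar Message-ID filter through its -D option.

```shell
#!/bin/sh
# Concatenate mbox files and drop later copies of any Message-ID
# (hypothetical helper; output goes to stdout).
dedup_mboxes() {
    awk '
        # Print the buffered message unless its Message-ID was seen before.
        function flush() {
            if (buf == "") return
            if (id == "" || !(id in seen)) {
                printf "%s", buf
                if (id != "") seen[id] = 1
            }
            buf = ""; id = ""
        }
        /^From / { flush(); in_hdr = 1 }     # a new message starts
        in_hdr && /^$/ { in_hdr = 0 }        # blank line ends the headers
        in_hdr && tolower($0) ~ /^message-id:/ {
            id = $0; sub(/^[^:]*:[ \t]*/, "", id)   # keep only the value
        }
        { buf = buf $0 "\n" }
        END { flush() }
    ' "$@"
}
```

For example: `zcat /var/lib/lurker/support > new.mbox; dedup_mboxes oldsupport new.mbox > merged.mbox`. Listing the old mbox first means its copy of each message wins.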
From: Robert W. <ro...@a1...> - 2011-03-11 21:09:09
|
What's the best way to do backups of the Lurker system? If I run a cron job to copy the database over to another location, how do I ensure that my backup and an indexing process do not collide? |
From: Wesley W. T. <we...@te...> - 2011-03-11 21:46:42
|
On Fri, Mar 11, 2011 at 9:53 PM, Robert Woodworth <ro...@a1...> wrote:
> What's the best way to do backups of the Lurker system?
>
> If I run a cron job to copy the database over to another location, how do I
> ensure that my backup and an indexing process do not collide?

If you lock the file /var/lib/lurker/db.writer during your backup of the lurker database, this will stop any new messages from being indexed for as long as you hold the lock.

Another option is a filesystem or block-level snapshot. At any given instant the lurker database is consistent, so any sort of snapshotting technology will work.
|
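The locking approach can be sketched as a cron-friendly shell function. It holds an exclusive flock(1) lock on db.writer while tar copies the data directory, so lurker-index cannot write during the copy. The following are hypothetical and should be adjusted:

- the function name `backup_lurker`
- the default backup directory /srv/backup
- the archive naming

```shell
#!/bin/sh
# Back up the lurker data directory while holding db.writer locked
# (hypothetical helper; prints the path of the archive it wrote).
backup_lurker() {
    data_dir=${1:-/var/lib/lurker}
    dest_dir=${2:-/srv/backup}        # hypothetical backup location
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/lurker-$stamp.tar.gz"

    # flock -e takes an exclusive lock on db.writer, runs tar, and
    # releases the lock automatically when tar exits.
    flock -e "$data_dir/db.writer" \
        tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" \
        || return 1
    echo "$archive"
}
```

A crontab entry could then call `backup_lurker /var/lib/lurker /srv/backup`. Any lurker-index started during the backup has to wait for the lock.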
From: Robert W. <ro...@a1...> - 2011-03-11 21:30:52
|
This is embarrassing; I really SHOULD know how to do this, but I don't. How do I lock this file?

When you say block-level, you mean something like "dump" or "tar"?
|
From: Wesley W. T. <we...@te...> - 2011-03-11 22:01:11
|
On Fri, Mar 11, 2011 at 10:30 PM, Robert Woodworth <ro...@a1...> wrote:
> This is embarrassing, I really SHOULD know how to do this, but I don't.
>
> How do I lock this file?

From a shell script:

flock -e /var/lib/lurker/db.writer -c your-script-that-runs-tar-or-whatever

> When you say block level, you mean something like "dump" or "tar"?

I meant something like an LVM snapshot.
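An LVM snapshot backup might look roughly like the sketch below. It relies on Wesley's point that the database is consistent at any instant, so a point-in-time copy of the volume is safe to archive. Everything named here is hypothetical:

- volume group "vg0"
- logical volume "var", mounted at /var, so the data sits under lib/lurker on that volume
- an ext4 filesystem
- the 1G snapshot size and the backup path

It must run as root. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Back up lurker's data from an LVM snapshot (hypothetical names throughout).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

snapshot_backup() {
    vg=vg0 lv=var snap=lurker-snap
    mnt=/mnt/$snap
    dest=/srv/backup/lurker-$(date +%Y%m%d).tar.gz
    status=1

    # Point-in-time copy of the volume; 1G is the room reserved for blocks
    # that change on the origin while the snapshot exists.
    run lvcreate --snapshot --size 1G --name "$snap" "/dev/$vg/$lv" || return 1
    run mkdir -p "$mnt"
    if run mount -o ro "/dev/$vg/$snap" "$mnt"; then
        run tar -czf "$dest" -C "$mnt" lib/lurker && status=0
        run umount "$mnt"
    fi
    # Always remove the snapshot: a classic snapshot that fills up is invalidated.
    run lvremove -f "/dev/$vg/$snap"
    return $status
}
```

Unlike the flock approach, indexing is never paused here. Only the brief lvcreate step touches the live volume.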