
#13 Bogo daemon

Status: open
Owner: nobody
Priority: 5
Updated: 2012-12-30
Created: 2003-02-19
Private: No

Is it possible for bogofilter to run in daemon mode
(e.g. server-client style)? That way it wouldn't have to
reload the wordlists every time a message arrives.

I'm working at an ISP with more than 1 million email
users. Email traffic is VERY high. We deployed bogofilter
on six MTA (mail transport agent) servers: two handle
outgoing mail and four handle incoming mail.

I'm very satisfied with bogofilter's results. Most of
the time it correctly classifies spam and non-spam.
However, the machine load gets VERY high, especially
disk usage. I guess this is because of the repeated
reading of the wordlists (auto-update is disabled).

If bogofilter could run in daemon mode, the wordlists would
be loaded only once and stay in memory, improving
overall system performance (anti-virus programs do that
with their virus patterns).

Discussion

  • Matthias Andree

    Matthias Andree - 2003-02-26

    Logged In: YES
    user_id=2788

    Bogofilter does not "reload" the word lists each time a
    message arrives. Bogofilter uses BerkeleyDB, and opening
    a database for read-only access is as cheap as opening a
    regular file. BerkeleyDB only loads the parts of the file
    that contain the tokens to look at, and the kernel will
    cache these pages, so make sure your memory is not sized
    too tightly.
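    To illustrate why a read-only open is cheap and a lookup reads only the pages it touches, here is a minimal Python sketch. It uses the standard sqlite3 module as a stand-in B-tree store; it is not BerkeleyDB and not bogofilter's code, and the `wordlist` table layout and both helper functions are hypothetical.

```python
import pathlib
import sqlite3

def build_wordlist(path, counts):
    """Create a hypothetical token -> (spam, good) table standing in for a wordlist."""
    db = sqlite3.connect(path)
    try:
        with db:  # commits the transaction on success
            db.execute("CREATE TABLE IF NOT EXISTS wordlist "
                       "(token TEXT PRIMARY KEY, spam INTEGER, good INTEGER)")
            db.executemany("INSERT OR REPLACE INTO wordlist VALUES (?, ?, ?)",
                           [(t, s, g) for t, (s, g) in counts.items()])
    finally:
        db.close()

def lookup_tokens(path, tokens):
    """Open the store read-only and fetch only the requested tokens.

    Each lookup walks the primary-key B-tree, so only the pages on that
    path are read (from disk or the kernel's page cache), not the whole file.
    """
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    db = sqlite3.connect(uri, uri=True)
    try:
        found = {}
        for tok in tokens:
            row = db.execute("SELECT spam, good FROM wordlist WHERE token = ?",
                             (tok,)).fetchone()
            if row is not None:
                found[tok] = row
        return found
    finally:
        db.close()
```

    A short-lived filter process would pay for one open plus a few index walks per message; pages used by earlier processes are usually still in the kernel's page cache.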

    Note also that the more recent BerkeleyDB versions use the
    mmap(2) system call, which "maps" the file into memory where
    it's read on demand only, and which avoids copying data
    back and forth between the kernel and the application.

    mmap may not work across networked file systems, depending
    on your operating system and version. BerkeleyDB will then
    silently use regular read/write operations, but it will
    still only read the data that it actually needs, and not the
    whole database.
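    The two access paths described above (mmap when available, plain positioned reads otherwise) can be sketched in Python as a binary search over a sorted file of fixed-size records. The record layout, `write_records()` and `lookup()` are hypothetical and assume tokens of at most 16 bytes; BerkeleyDB's real on-disk format is a B-tree and far more involved. Either way, only the few records the search visits are touched, never the whole file.

```python
import mmap
import os
import struct

# Hypothetical record layout (NOT BerkeleyDB's): 16-byte NUL-padded token,
# then spam and good counts as little-endian uint32; records sorted by token.
RECORD = struct.Struct("<16sII")

def write_records(path, counts):
    """Write {token: (spam, good)} as fixed-size records sorted by token bytes."""
    with open(path, "wb") as f:
        for tok in sorted(counts, key=str.encode):
            spam, good = counts[tok]
            f.write(RECORD.pack(tok.encode(), spam, good))

def _record(src, i):
    """Fetch record i from an mmap, or via os.pread on a file descriptor."""
    off = i * RECORD.size
    if isinstance(src, mmap.mmap):
        return RECORD.unpack_from(src, off)  # page holding it is read on first access
    return RECORD.unpack(os.pread(src, RECORD.size, off))  # reads this record only

def lookup(path, token, use_mmap=True):
    """Binary-search the file for token; return (spam, good) or None."""
    key = token.encode().ljust(16, b"\0")
    fd = os.open(path, os.O_RDONLY)
    src = fd
    try:
        n = os.fstat(fd).st_size // RECORD.size
        if use_mmap and n > 0:
            try:
                src = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
            except OSError:
                src = fd  # mmap refused (e.g. some network file systems)
        lo, hi = 0, n
        while lo < hi:  # touches O(log n) records, not the whole file
            mid = (lo + hi) // 2
            tok, spam, good = _record(src, mid)
            if tok == key:
                return spam, good
            if tok < key:
                lo = mid + 1
            else:
                hi = mid
        return None
    finally:
        if isinstance(src, mmap.mmap):
            src.close()
        os.close(fd)
```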

    If we switched to a daemon, we might have to send
    enormous amounts of data between client and server, and I
    wonder if that is really faster than mapping disk blocks
    into the application's data memory.

     
  • Piotr Kubiak

    Piotr Kubiak - 2003-02-27

    Logged In: YES
    user_id=722099

    A daemon would be a great thing; it would run several
    times faster, especially on small files.

    On high-load systems it is impossible to install bogofilter
    system-wide because of the "low speed" startup, and that's a
    pity, since bogofilter is a great tool.

     
  • Matthias Andree

    Matthias Andree - 2003-03-11

    Logged In: YES
    user_id=2788

    Well, my home setup, which runs bogofilter 0.11.1.x from
    maildrop, takes about 30 ms wall-clock time to process a
    short mail with bogofilter, out of 180 ms total for maildrop
    (without registering, i.e. without the -u option to
    bogofilter). With bogofilter -u, it takes between 50 and
    200 ms more. (AMD Duron 700, Linux 2.4, plenty of RAM,
    7200 rpm U160-SCSI drive, ext3fs.)

    I wouldn't call that a "low speed" startup. However, this
    says nothing about high-load systems. If anyone could
    provide details on where exactly bogofilter spends so much
    time, that would be much appreciated. One way to obtain
    such logs (Linux/FreeBSD) is to run:

        strace -tt -o bogofilter.dump.$$ bogofilter OPTIONS

    Replace OPTIONS with your options; the output will be in
    files named bogofilter.dump.12345, bogofilter.dump.32463 and
    so on ($$ expands to the process ID of the invoking shell).
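    Once such a dump exists, the question is where the wall-clock time goes. The minimal Python sketch below assumes the `HH:MM:SS.microseconds syscall(...) = result` line shape that `strace -tt` writes (optionally preceded by a PID, as with `strace -f`) and lists the largest gaps between consecutive system calls. A gap covers the earlier call plus any user-space work before the next one, so it shows roughly where time went without separating the two (`strace -T` adds per-call durations). The helper name `slowest_gaps()` is hypothetical.

```python
import re

# Assumed `strace -tt` line shape: "HH:MM:SS.uuuuuu syscall(args) = result",
# optionally preceded by a PID (as written by `strace -f`).
LINE = re.compile(r"^(?:\d+\s+)?(\d\d):(\d\d):(\d\d)\.(\d{6})\s+(.*)$")

def slowest_gaps(lines, top=5):
    """Return up to `top` (gap_seconds, earlier_call, later_call), largest first.

    Timestamps are used as-is, so a trace crossing midnight is not handled.
    """
    events = []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # skip lines that do not start with a timestamp
        h, mi, s, us, call = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + int(s) + int(us) / 1e6
        events.append((t, call))
    gaps = [(b[0] - a[0], a[1], b[1]) for a, b in zip(events, events[1:])]
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]
```

    For example, `slowest_gaps(open("bogofilter.dump.12345"))` would list the five longest pauses in that trace.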

    Are you using "bogofilter -u"?

     
  • Fajar Nugraha

    Fajar Nugraha - 2003-03-21

    Logged In: YES
    user_id=715651

    I'm not using bogofilter -u. When I did, the machine just
    went berserk (VERY HIGH LOAD) and I had to reboot it because
    it wouldn't respond anymore.

    I don't know whether my system uses mmap or not.
    It's a sun4u SPARC SUNW,UltraAX-i2 running Solaris 8 with
    Berkeley DB 4.1, local disk, bogofilter version 0.10.0.

    A daemon doesn't necessarily mean lots of data transfer.
    Anti-virus daemons (e.g. ClamAV) only pass the file name
    over the socket, so very little data is transferred.
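    A minimal Python sketch of the ClamAV-style design described above: the daemon holds the wordlist in memory for its whole lifetime, and each request carries only a file path, so almost no data crosses the socket. The line-based protocol, the `ScanDaemon` class and its toy majority-vote scoring are all hypothetical; they are not bogofilter's Bayesian calculation, nor ClamAV's or spamd's actual protocol.

```python
import re
import socket

TOKEN = re.compile(r"[A-Za-z0-9$'-]+")  # toy tokenizer, not bogofilter's lexer

class ScanDaemon:
    """Toy scanner: the wordlist is loaded ONCE and reused for every request."""

    def __init__(self, wordlist):
        # wordlist: token -> (spam_count, good_count), kept in memory
        self.wordlist = dict(wordlist)

    def classify(self, text):
        hits = [self.wordlist[t] for t in TOKEN.findall(text.lower())
                if t in self.wordlist]
        if not hits:
            return "UNSURE"
        spammy = sum(1 for spam, good in hits if spam > good)
        return "SPAM" if spammy * 2 > len(hits) else "HAM"

    def serve_one(self, conn):
        """Handle one request: read a path line, scan the file, reply one line."""
        with conn.makefile("rb") as reader:
            path = reader.readline().decode().rstrip("\n")
        with open(path, encoding="utf-8", errors="replace") as msg:
            verdict = self.classify(msg.read())
        conn.sendall(verdict.encode() + b"\n")

def ask(sock, path):
    """Client side: send only the path, read back the one-line verdict."""
    sock.sendall(path.encode() + b"\n")
    with sock.makefile("rb") as reader:
        return reader.readline().decode().strip()
```

    In a real deployment the daemon would `bind()`/`listen()` on, for example, a Unix-domain socket and call `serve_one()` for each accepted connection. Passing a path keeps traffic tiny, but it requires the MTA and the daemon to share a filesystem and compatible file permissions.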

    Another thing: I tried replacing bogofilter with spamd
    (SpamAssassin), but the load was much higher, so I stopped
    using it. That's not a surprise, however, since SpamAssassin
    is written in Perl.

     
  • Matthias Andree

    Matthias Andree - 2003-03-21

    Logged In: YES
    user_id=2788

    bogofilter -u causes synchronous writes to the database
    (which means processes sit in I/O wait state, adding to the
    load), and you may have to limit the number of bogofilter
    processes running at the same time when you have a loaded
    mail system. It would be possible to make bogofilter do
    asynchronous writes, at the risk of a much higher chance of
    database corruption.
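    Limiting the number of filter processes that run at the same time is a job for a counting semaphore. A minimal Python sketch, assuming the mail system can be modelled as threads that each hand one message to a filter command; `FilterPool` is a hypothetical helper, and a suitable limit would have to be found by measuring the disk, not guessed from the CPU count.

```python
import subprocess
import threading

class FilterPool:
    """Hypothetical helper: run at most `max_concurrent` filters at a time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args):
        # Blocks while all slots are taken, so a burst of mail waits here
        # instead of adding yet more processes to the I/O wait queue.
        with self._slots:
            return func(*args)

    def run_filter(self, cmd, message):
        """Feed message (bytes) to cmd on stdin and return its exit status."""
        def _run():
            return subprocess.run(cmd, input=message,
                                  capture_output=True).returncode
        return self.call(_run)
```

    For example, one shared `FilterPool(4)` could be used by every delivery thread, each calling `pool.run_filter(["bogofilter", "-u"], raw_message)`.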

    Solaris + BDB 4.1 will do mmap(). This means all data will
    stay on disk until accessed, and the kernel will take care
    of caching the data. There will be virtually no copying of
    data (even though SPARCs are quite good at that).

    As to the daemon mode, are you familiar with profiling
    software? Getting gprof output might help identify the
    places that limit performance. I would like to look at
    the figures to find out whether we need to tune the lexer and
    parser or whether it's really the database. If it's the lexer,
    we can get by without adding a daemon mode (which adds a
    lot of complexity); if it's indeed database access that
    limits bogofilter performance, we'll have to do some
    research to figure out how to do this well.

     
  • Fajar Nugraha

    Fajar Nugraha - 2003-03-26

    Logged In: YES
    user_id=715651

    Unfortunately I have never used profiling software before.
    Could you tell me how to do it?

     
  • Fajar Nugraha

    Fajar Nugraha - 2003-03-26

    Logged In: YES
    user_id=715651

    Sorry, I just read the gprof manual. Here's the result. I
    have NO IDEA how to read it, though. I hope it's useful to you.

     
  • Fajar Nugraha

    Fajar Nugraha - 2003-03-26

    Gprof output for bogofilter

     
  • Matthias Andree

    Matthias Andree - 2003-03-29

    Logged In: YES
    user_id=2788

    Well, I can read it, but it does not contain useful
    information -- not because you did something wrong, but
    because the program exited within 30 ms, so the profile
    does not contain useful time values (only three code
    samples were taken); and the function calls don't look
    bad or suspicious.
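    The "three samples" follow from how gprof works: it samples the program counter at a fixed interval. Assuming the common setting in which gprof reports that each sample counts as 0.01 seconds (the interval is platform-dependent), a run of about 30 ms gives the count below; with the usual rule of thumb that a count of N samples carries a statistical error of about √N, the relative error is close to 60%.

```latex
N_{\text{samples}} \approx \frac{t_{\text{run}}}{\Delta t}
  = \frac{30\,\text{ms}}{10\,\text{ms}} = 3,
\qquad
\frac{\sqrt{N_{\text{samples}}}}{N_{\text{samples}}} = \frac{1}{\sqrt{3}} \approx 0.58
```

    Summing the profiles of many runs (GNU gprof's `-s` option merges several profile data files into gmon.sum) would be one way to get usable numbers.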

     
  • Fajar Nugraha

    Fajar Nugraha - 2003-04-01

    Logged In: YES
    user_id=715651

    I've been doing some more experiments with exim and
    bogofilter. It seems that no matter how efficient the filter
    is, exim will still use twice the resources, because
    bogofilter must run as a transport filter while the message
    is rejected by the system filter. Thus, for every message
    received, exim must send mail to itself (to use the transport
    filter).

    I've been using exiscan + ClamAV as a mail virus scanner,
    which works great. It has built-in support for spamd, which
    is not so great (Perl, slower, higher resource demand). It
    would be great if there were a spamd-like interface to
    bogofilter, so I could use it with exiscan. There would be no
    need for exim to deliver mail to itself then.

    It would be easier if exiscan supported bogofilter.

     
