From: Nils B. (Lemonbit) <ni...@le...> - 2007-10-17 14:35:54
Attachments:
PGP.sig
|
Hello, I have fail2ban running on a couple of CentOS machines and that's all running fine. I also have fail2ban running on a Debian 4.0 machine using the fail2ban_0.8.1-1_all.deb package. On this machine top shows fail2ban-server running continuously at ~5% cpu. I used strace to see what it is doing. Running strace on the pid of the fail2ban-server process for a second or so gives me this output: ---- # strace -p 4627 Process 4627 attached - interrupt to quit select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 795982}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 847894}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 904211}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 23416}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 927855}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 futex(0x81f6590, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 928505}, NULL) = 0 futex(0x81dc730, FUTEX_WAKE, 1) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 929000}, NULL) = 0 gettimeofday({1192631586, 929153}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 futex(0x81dc730, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 929783}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 930223}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 935830}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 943200}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 945020}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 951886}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 969047}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631586, 999835}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 54535}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 108090}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 159846}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 211847}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 267845}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 319920}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 371871}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 423849}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 475852}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 527945}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) futex(0x81d2a40, FUTEX_WAKE, 1) = 0 gettimeofday({1192631587, 579879}, NULL) = 0 futex(0x81d2a40, FUTEX_WAKE, 1) = 0 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...> Process 4627 detached ---- Fail2ban seems to be very busy doing this constantly really. Anyone have an idea what might be the problem? Nils Breunese. |
From: Yaroslav H. <li...@on...> - 2007-10-18 04:09:34
|
on Debian boxes by default it uses pooling mechanism (instead of python-gamin) since when the feature of relying on gaming was implemented by Cyril, debian's python-gamin wasn't performing as it was supposed for some reason... I should double check now if that is the case even these days... pooling checks each file it monitors each second. Might it be that you monitor way too many files (may be bad patter of files names is given or smth like that)? what does it say if verbosity is set to debug level? back to your situation: how many jails on how many files are you running fail2ban? I am running it with 9 jails and watching even more files (webservers have 1 log per server) and I never see it in top... On Wed, 17 Oct 2007, Nils Breunese (Lemonbit) wrote: > Hello, > I have fail2ban running on a couple of CentOS machines and that's all running fine. I also have fail2ban > running on a Debian 4.0 machine using the fail2ban_0.8.1-1_all.deb package. On this machine top shows > fail2ban-server running continuously at ~5% cpu. I used strace to see what it is doing. Running strace on the > pid of the fail2ban-server process for a second or so gives me this output: -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] |
From: Nils B. (Lemonbit) <ni...@le...> - 2007-10-18 11:06:37
Attachments:
PGP.sig
|
Yaroslav Halchenko wrote: > on Debian boxes by default it uses pooling mechanism (instead of > python-gamin) since when the feature of relying on gaming was > implemented by Cyril, debian's python-gamin wasn't performing as it > was > supposed for some reason... I should double check now if that is the > case even these days... pooling checks each file it monitors each > second. Might it be that you monitor way too many files (may be bad > patter of files names is given or smth like that)? what does it say if > verbosity is set to debug level? I set loglevel = 4 in fail2ban.conf and I noticed this in the log after restarting fail2ban: fail2ban.filter : DEBUG Date 0 is smaller than 1192595290.0 That seems strange to me somehow, but I have no idea where to further investigate this. Then I installed python-gamin and created /etc/fail2ban/jail.local with the following: ---- [DEFAULT] backend = gamin ---- After restarting fail2ban I still see fail2ban-server running at 3-4% cpu. This is a simple office server (AMD Athlon 1 GHz, 256 MB RAM) running Samba, apache for internal use and OpenVPN. Load is never over 0.4 and there are just a handful of users. Does this kind of cpu load seem unusual to you? > back to your situation: how many jails on how many files are you > running > fail2ban? > I am running it with 9 jails and watching even more files (webservers > have 1 log per server) and I never see it in top... It's running with only the defaults, so only checking ssh and monitoring /var/log/auth.log. That's more reason why I thought 5% cpu usage was a little high. Nils Breunese. |
From: Yaroslav H. <li...@on...> - 2007-10-18 20:25:30
|
> I set loglevel = 4 in fail2ban.conf and I noticed this in the log after restarting fail2ban: > fail2ban.filter : DEBUG Date 0 is smaller than 1192595290.0 That is normal if you get such a message once per file at the beginning since the file was never checked before during the lifespan of a given fail2ban-server process ;-) BUT if keep getting such messages - then smth is weird and previousely it boiled down to some desync in time reported by the daemon and system time. > That seems strange to me somehow, but I have no idea where to further investigate this. > are just a handful of users. Does this kind of cpu load seem unusual to you? yeah... it is too high... but what do you have in your fail2ban.log after you have restarted the server with verbosity debug -- could you simply email it? -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] |
From: Nils B. (Lemonbit) <ni...@le...> - 2007-10-18 21:33:19
Attachments:
PGP.sig
|
Yaroslav Halchenko wrote: >> I set loglevel = 4 in fail2ban.conf and I noticed this in the log >> after restarting fail2ban: >> fail2ban.filter : DEBUG Date 0 is smaller than 1192595290.0 > That is normal if you get such a message once per file at the > beginning > since the file was never checked before during the lifespan of a given > fail2ban-server process ;-) > > BUT if keep getting such messages - then smth is weird and previousely > it boiled down to some desync in time reported by the daemon and > system > time. No, I don't keep getting those messages. The log looks pretty normal otherwise. >> That seems strange to me somehow, but I have no idea where to >> further investigate this. >> are just a handful of users. Does this kind of cpu load seem >> unusual to you? > yeah... it is too high... > > but what do you have in your fail2ban.log after you have restarted the > server with verbosity debug -- could you simply email it? I will send you the log file. Nils. |
From: Yaroslav H. <li...@on...> - 2007-10-18 22:22:42
|
nothing is there... lets elaborate a bit your strace use -ttt -T -fF parameters on top and dump to a file (-o). This way we have more information on timing and about all threads. On Thu, 18 Oct 2007, Nils Breunese (Lemonbit) wrote: > Yaroslav Halchenko wrote: > >BUT if keep getting such messages - then smth is weird and previousely > >it boiled down to some desync in time reported by the daemon and system > >time. > No, I don't keep getting those messages. The log looks pretty normal otherwise. -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] |
From: Nils B. (Lemonbit) <ni...@le...> - 2008-01-07 10:03:31
|
Yaroslav Halchenko wrote: > nothing is there... lets elaborate a bit your strace > use > -ttt -T -fF > parameters on top and dump to a file (-o). This way we have more > information on timing and about all threads. I upgraded from 0.8.1-1 to 0.8.1-3 (http://packages.debian.org/unstable/net/fail2ban ) and that solved the high cpu usage. Nils Breunese. |
From: Yaroslav H. <li...@on...> - 2008-01-07 18:29:09
|
Great to hear! I guess changes Cyril had done in 0.9 (and which I propagated into 0.8.1) ,-- | * Propagated patch from 0.9 upstream branch: "Replaced ssocket.py with | asyncore/asynchat implementation. Correct fix for bug #1769616. | That is supposed to resolve spontaneous 100% CPU utilization by | fail2ban-server." `--- Sad that there is no debian bug which I could close with that ;-))) On Mon, 07 Jan 2008, Nils Breunese (Lemonbit) wrote: > Yaroslav Halchenko wrote: > > nothing is there... lets elaborate a bit your strace > > use > > -ttt -T -fF > > parameters on top and dump to a file (-o). This way we have more > > information on timing and about all threads. > I upgraded from 0.8.1-1 to 0.8.1-3 (http://packages.debian.org/unstable/net/fail2ban > ) and that solved the high cpu usage. > Nils Breunese. > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2005. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Fail2ban-users mailing list > Fai...@li... > https://lists.sourceforge.net/lists/listinfo/fail2ban-users -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] |
From: Cyril J. <cyr...@fa...> - 2008-01-08 00:22:12
|
Hi all, > > Great to hear! I guess changes Cyril had done in 0.9 (and which I > > propagated into 0.8.1) > > > > ,-- > > | * Propagated patch from 0.9 upstream branch: "Replaced ssocket.py with > > | asyncore/asynchat implementation. Correct fix for bug #1769616. > > | That is supposed to resolve spontaneous 100% CPU utilization by > > | fail2ban-server." > > `--- > > Good :) The fix will be in 0.8.2 too. > > Sad that there is no debian bug which I could close with that ;-))) > > Ehehe ;) Cheers, Cyril |
From: Cyril J. <cyr...@fa...> - 2007-10-22 18:43:12
|
Hi Nils, Sorry for the late answer :( > Fail2ban seems to be very busy doing this constantly really. Anyone have > an idea what might be the problem? > What kind of architecture is it? I suspect a problem with the socket server. Could you try a nightly build? http://www.fail2ban.org/nightly/fail2ban-trunk.tar.bz2 Just stop fail2ban, unpack the archive somewhere, cd into the directory and start fail2ban with: # ./fail2ban-client -c /etc/fail2ban start I completely rewrote the communication part of the server in 0.9. Cheers, Cyril Jaquier |