#4214 Kernel Panic - out of memory

Milestone: 0.82
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2013-09-02
Created: 2013-03-23
Private: No

This morning one of my VPS servers crashed with a kernel panic (out of memory) because Webmin spawned hundreds of Perl interpreter processes that never exited.

CentOS 6.4 in a Xen virtual environment, with the Virtualmin repository enabled; a recent update had been applied.

Got lots of email to root.

Subject: Cron <root@webserver> /etc/webmin/status/monitor.pl
Message:
Error: Failed to lock file /etc/webmin/status/oldstatus after 5 minutes. Last error was :
Error
-----
Failed to lock file /etc/webmin/status/oldstatus after 5 minutes. Last error was :
-----

I haven't looked at the code, but I suggest that if the lock still can't be acquired after five minutes, the script should kill itself, perhaps with the locked object being reset somehow after a couple of hours. This would reduce the build-up of Perl processes waiting on a lock that is blocked for some reason.

Cheers,
Chris
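
The suggestion above (give up and exit when the lock can't be taken, rather than leaving the process around) could be sketched roughly as follows. This is only an illustration in Python, not Webmin's code: monitor.pl is Perl and uses its own lock files, whereas this sketch assumes an flock()-style lock. The function name acquire_lock and the timeout and poll parameters are hypothetical.

```python
import errno
import fcntl
import time


def acquire_lock(path, timeout=300.0, poll=1.0):
    """Try to take an exclusive lock on `path`.

    Returns the open file object holding the lock, or None if the lock
    could not be obtained within `timeout` seconds. Assumes flock()-style
    locking, which is NOT how Webmin itself locks files.
    """
    deadline = time.monotonic() + timeout
    fh = open(path, "a+")
    while True:
        try:
            # Non-blocking attempt, so we never hang inside the kernel.
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fh
        except OSError as exc:
            # EAGAIN/EACCES mean "someone else holds it"; anything else is a real error.
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                fh.close()
                raise
        if time.monotonic() >= deadline:
            fh.close()
            return None
        time.sleep(poll)
```

A caller would exit straight away when acquire_lock() returns None, so a cron job that fires every few minutes cannot pile up waiting processes. Closing the returned file object releases the lock.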

Discussion

  • Jamie Cameron
    2013-03-23

    How often do you have scheduled monitoring set up to run? If it is more often than once every 5 minutes, I can see how this kind of process build-up could happen.

    The real question is why /etc/webmin/status/oldstatus could not be locked. The file /etc/webmin/status/oldstatus.lock should contain the PID of the process that was holding the lock, although if your system has been rebooted that won't be very useful anymore.

     
  • I had a look through the cron log ... lots of
    INFO (Job execution of per-minute job scheduled for 00:27 delayed into subsequent minute 00:28. Skipping job run.)
    ERROR (setreuid failed): Resource temporarily unavailable
    messages there.

    The scheduled monitoring period is set to 5 minutes on my system, which I assume is the installation default as I can't recall ever changing it. I'll raise it to twenty minutes, so that if the error occurs again I should at least have more time to catch the behaviour and find what's holding the lock. Because the kernel panicked and we couldn't get a console session, I couldn't investigate without a reboot when we found the issue.

     
  • I had the same error starting last night at 00:40 (Webmin 1.650).
    oldstatus.lock contains 10577, which is the PID of
    /usr/bin/perl /usr/libexec/webmin-1.650/status/monitor.pl running under root.
    The status monitor appears to be working OK; however, the process at PID 10577 (started at 00:30) is basically stuck.
    When the status monitor runs there are two processes, hence the warning.

    I have seen this before but can't remember what cleared it; perhaps restarting Webmin.
    Any further diagnosis you need?

     
  • Jamie Cameron
    2013-08-23

    Richard - did you also see the same problem of monitor.pl using up all the memory on your system?

     
  • No, only the one stuck process; all the others terminate OK, so no significant memory use.
    FWIW, one significant thing I was doing at the time the monitor got stuck was taking my Internet connection up and down. (I have a ping monitor for the WAN address, which would have been changing; it's an ADSL dynamic IP.)

     
  • Update - the same happened again this evening when the broadband was going up and down by itself. It seems even more likely that it's the ping monitor that gets stuck.

     
  • Jamie Cameron
    2013-09-02

    Did you see the same OOM errors again when your broadband went down? Or did you just see locking errors from monitor.pl?

     
  • No, just a stuck monitor process as before.

     
  • Jamie Cameron
    2013-09-02

    • status: open --> closed-fixed
    • Group: --> 0.82
     
  • Jamie Cameron
    2013-09-02

    OK, it looks like the issue is that in some cases ping never terminates, even when run with the -w flag. I will add an additional timeout enforced by Webmin in the next release to handle this case.
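
The general idea of the fix described above, a timeout enforced by the caller rather than one trusted to the child command, might look something like this in Python. It is a sketch only and says nothing about how Webmin actually implements it (Webmin is written in Perl); run_with_timeout is a hypothetical helper name.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run `cmd` (a list of arguments) and return its exit status.

    Returns None if the command is still running after `timeout` seconds;
    in that case subprocess.run() kills the child before raising, so no
    stuck process is left behind.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode
```

For example, run_with_timeout(["ping", "-c", "1", "-w", "5", "192.0.2.1"], timeout=10) would give ping's own -w deadline a chance to work but still return after 10 seconds if ping never exits. The address and the numbers here are placeholders.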