#240 check_disk reports incorrect disk free with neg space on BSD

Release (specify)
closed-fixed
nobody
5
2007-12-08
2005-11-05
Ted Cabeen
No

With check_disk running on FreeBSD 5-STABLE, when a
disk has negative free space remaining, the plugin reports
a hugely positive amount of free space:
DISK CRITICAL - free space: /usr 36028797018963968 MB (1191472380510208%):

Here's a df from the time:
/dev/ad4s1g 3096462 2989082 -140336 105% /usr

Discussion

  • M. Sean Finney
    2005-11-07

    Logged In: YES
    user_id=226838

    hi,

    um, i just have to ask. how do you have negative free space?

    some other information that would be helpful:
    - is check_disk using the df command or internal disk space
    routines?
    - if df, what df command syntax is check_disk using?
    - what version of the plugins are you using?

    i believe that the plugin is making an assumption that the
    amount of disk space available is unsigned, because, er...
    well i'd never heard of negative disk space, anyway :)

     
  • Ted Cabeen
    2005-11-07

    Logged In: YES
    user_id=40466

    Many Unix filesystems (FreeBSD's UFS, for example) reserve a
    portion (5-10%) of the disk space for use by root only and to
    speed disk accesses. If the root user exceeds the normal disk
    space and uses some of the reserve space, the system will
    represent the amount of free space as negative.

    I don't know how check_disk is checking the disk space (df
    or internal routines). Is there an easy way to check?

    check_disk (nagios-plugins 1.4.2) 1.57 is the version I'm
    running.

     
  • M. Sean Finney
    2005-11-08

    • assigned_to: nobody --> seanius
     
  • M. Sean Finney
    2005-11-08

    Logged In: YES
    user_id=226838

    hi,

    well, chalk this up to my having been away from traditional
    unix/bsd implementations. afaict in linux such reserved
    space is still taken into calculation of total available
    space (i.e., you could get ENOSPC before the disk reached 0%).

    but anyway, i think the fix is still obvious: we should
    do all scans and assignments as signed integers instead of
    unsigned. if i don't hear any complaints from anyone else
    on the plugins team, i'll probably do this at some point
    (and hope that it doesn't break something else).

    also, having taken a look at the check_disk code, i can't
    seem to find any references to the df program... so i guess
    if you're using 1.4.2 or later, it's relying purely on the
    internal disk space routines.

     
  • Ton Voon
    2005-11-08

    Logged In: YES
    user_id=664364

    From 1.4 onwards, we use the GNU coreutils library to get df data. I don't
    know if FreeBSD uses its own routines or not, but GNU coreutils should
    support it.

    Yes, I guess signed integers should fix it. It was an assumption on our part
    that values would always be positive.

     
  • Ton Voon
    2006-07-19

    Logged In: YES
    user_id=664364

    Ted,

    Can you try the latest snapshot at http://nagiosplug.sf.net/snapshot. There
    have been major changes to check_disk to sync it with coreutils' df, so there
    shouldn't be any sign problems.

    If you still have problems, can you tell me which version of df you are using?

    Ton

     
  • Ton Voon
    2006-07-19

    • assigned_to: seanius --> tonvoon
     
  • Ton Voon
    2006-10-19

    • status: open --> pending
     
  • Ton Voon
    2006-10-19

    Logged In: YES
    user_id=664364

    There don't seem to have been any updates since my request in July. Marking
    this call as pending.

    Ton

     
  • Logged In: YES
    user_id=1312539

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
    • status: pending --> closed
     
  • Frank Altpeter
    2007-11-20

    Logged In: YES
    user_id=145970
    Originator: NO

    I would like this bug to be reopened. It still exists in nagios-plugins version 1.4.10, at least on FreeBSD 6.2-RELEASE-p5, as the following test shows:

    Filesystem Size Used Avail Capacity Mounted on
    /dev/amrd0s1e 496M 461M -4.5M 101% /tmp

    # /usr/local/libexec/nagios/check_disk /tmp
    DISK OK - free space: /tmp 17592186044411 MB (-1% inode=99%);| /tmp=460MB;;;0;495

    It would be great to have a fix soon; this is quite bad, since I can no longer trust check_disk because of it...

     
  • Holger Weiß
    2007-11-20

    • assigned_to: tonvoon --> nobody
    • status: closed --> open
     
  • Holger Weiß
    2007-11-20

    Logged In: YES
    user_id=759506
    Originator: NO

    Thanks, we'll have to look into it.

     
  • Frank Altpeter
    2007-11-20

    Logged In: YES
    user_id=145970
    Originator: NO

    A little more input, as suggested on #nagios:

    # df -h /tmp
    Filesystem Size Used Avail Capacity Mounted on
    /dev/amrd0s1e 496M 461M -4.5M 101% /tmp

    # check_disk -vvv /tmp | head -1
    For /tmp, used_pct=101 free_pct=-1 used_units=460 free_units=1.75922e+13 total_units=495 used_inodes_pct=1 free_inodes_pct=99 fsp.fsu_blocksize=2048 mult=1048576

     
  • Frank Altpeter
    2007-11-21

    Logged In: YES
    user_id=145970
    Originator: NO

    Hmm, I just found one more problem with check_disk on FreeBSD:

    root@canismajor:~ # /usr/local/libexec/nagios/check_disk -w 10% -c 5% -X devfs -X procfs -X linprocfs -X tmpfs -X union /var
    DISK WARNING - free space: /var 498 MB (5% inode=97%);| /var=9419MB;8924;9420;97;9916

    root@canismajor:~ # /usr/local/libexec/nagios/check_disk -w 10% -c 5% -X devfs -X procfs -X linprocfs -X tmpfs -X union -vvv
    DISK OK - free space: / 264 MB (53% inode=92%); /tmp 177 MB (36% inode=90%); /usr 6458 MB (65% inode=83%); /var 496 MB (5% inode=97%); /var/spool 31789 MB (69% inode=95%); /var/spool/mail 72962 MB (54% inode=87%);
    264 of 496 MB (53% inode=92%) free on /dev/amrd0s1a (type ufs mounted on /) warn:0 crit:0 warn%:10% crit%:5%
    177 of 496 MB (36% inode=90%) free on /dev/amrd0s1d (type ufs mounted on /tmp) warn:0 crit:0 warn%:0% crit%:0%
    6458 of 9916 MB (65% inode=83%) free on /dev/amrd0s1f (type ufs mounted on /usr) warn:0 crit:0 warn%:0% crit%:0%
    496 of 9916 MB (5% inode=97%) free on /dev/amrd0s1e (type ufs mounted on /var) warn:0 crit:0 warn%:0% crit%:0%
    31789 of 46096 MB (69% inode=95%) free on /dev/amrd0s1g (type ufs mounted on /var/spool) warn:0 crit:0 warn%:0% crit%:0%
    72962 of 135854 MB (54% inode=87%) free on /dev/amrd1s1d (type ufs mounted on /var/spool/mail) warn:0 crit:0 warn%:0% crit%:0%| /=231MB;445;470;92;495 /tmp=318MB;495;495;90;495 /usr=3459MB;9916;9916;83;9916 /var=9420MB;9916;9916;97;9916 /var/spool=14307MB;46096;46096;94;46096 /var/spool/mail=62891MB;135853;135853;87;135853

    That is, when a mount point is checked directly, the low free space triggers a warning, but when all mount points are checked together, the state is OK. It looks as if the warning and critical thresholds are only applied to the first mount point found...

     
  • Frank Altpeter
    2007-11-30

    Logged In: YES
    user_id=145970
    Originator: NO

    Has there been any progress on this topic yet? Can I help somehow in finding the cause? Personally, I think this should get a fairly high priority, because this bug makes check_disk more or less untrustworthy...

     
  • Logged In: YES
    user_id=375623
    Originator: NO

    Sorry about that. A few of us looked into it a while back and couldn't find the issue. I can take a second look, but it would help if you first patch check_disk with the attached check_disk.extra-debug.patch and send me the full output after running the plugin with -vvv (Please limit it to one path if possible).

    Since I don't have a BSD system to test with, the attached patch will give me what I need to simulate your system and, hopefully, reproduce the bug.
    File Added: check_disk.extra-debug.patch

     
  • Matthias Eble
    2007-12-03

    Logged In: YES
    user_id=1694341
    Originator: NO

    Hello Altpeter,

    I'm pretty sure the threshold/argument problem doesn't exist in the latest versions, including 1.4.10.
    The debug output of your command line is commented out in the current code.

    negative freespace:
    I currently can't imagine where the negative values come from, but IMO they shouldn't be there.
    However, I'll try to find some time to test.

    Matthias

     
  • Logged In: YES
    user_id=375623
    Originator: NO

    This is fixed in SVN. The root cause of the problem is in Gnulib, which is why it was so hard to track down; I implemented a simple workaround in check_disk. The credit should go to Matthias, as he was kind enough to provide me with a FreeBSD VM to test on.

    You can try the latest SVN HEAD (which will likely be released next week) or use the next daily snapshot.

    To get the HEAD:

    $ svn co http://nagiosplug.svn.sourceforge.net/svnroot/nagiosplug/nagiosplug/trunk/ nagiosplug

    Snapshots are available here (make sure the snapshot is dated Dec. 9, 2007 or later):
    http://nagiosplug.sourceforge.net/snapshot/

     
    • status: open --> closed-fixed
     
  • Frank Altpeter
    2007-12-19

    Logged In: YES
    user_id=145970
    Originator: NO

    Sorry for not having had time to look into this sooner; too much work at the office... :)

    I just checked out nagios-plugins version 1.4.11 and tested again, and so far it seems to count correctly.