
#2658 HOST-RESOURCES-MIB::hrSystemNumUsers.0 returns wrong number of users

freeBSD
open
nobody
None
5
2015-10-16
2015-08-14
No

On FreeBSD, I'm seeing the wrong number of users for HOST-RESOURCES-MIB::hrSystemNumUsers.0 with net-snmp 5.7.3:

$ snmpwalk -c MinionComm -v 1 127.0.0.1 HOST-RESOURCES-MIB::hrSystemNumUsers.0
HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 4

4 users is wrong. There is only one user on this system:

$ uptime
8:54PM up 2 days, 3:49, 1 user, load averages: 0.99, 1.07, 1.07

I have tried this on both FreeBSD 9.3 and FreeBSD 10.1 with the same results.

ref: https://forums.freebsd.org/threads/snmpd-hrsystemnumusers-is-wrong.52785/

Discussion

  • Dan Langille

    Dan Langille - 2015-08-15

    FYI, the value returned appears to go up, but never goes back down.

     
  • Bill Fenner

    Bill Fenner - 2015-08-17

    Hi Dan,

    The code that generates the number of users is at https://sourceforge.net/p/net-snmp/code/ci/master/tree/agent/mibgroup/host/hr_system.c#l665 . It has a pretty naive loop around getutent() (or getutxent() if you have utmpx), and checks for USER_PROCESS (if utmp_p has ut_type) and that the PID is running (if utmp_p has ut_pid).

    Sadly I don't currently have a good FreeBSD machine to test this on; is getutent() / getutxent() not the right way to count users?

    Bill
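
    For readers following along, the kind of loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual hr_system.c code: the function name count_logged_in_users is made up for this example, and it assumes a platform that provides utmpx with a ut_type field.

    ```c
    #include <utmpx.h>

    /* Hypothetical helper: count utmpx records that describe a login session.
     * Only records of type USER_PROCESS are counted; boot records,
     * dead-process records and the like are skipped. */
    int count_logged_in_users(void)
    {
        struct utmpx *entry;
        int total = 0;

        setutxent();                          /* rewind to the first record */
        while ((entry = getutxent()) != NULL) {
            if (entry->ut_type == USER_PROCESS)
                total++;
        }
        endutxent();                          /* close the database */
        return total;
    }
    ```

    If the ut_type check is left out, every record is counted, including stale ones, which would be consistent with a count that only ever goes up.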

     
  • Dan Langille

    Dan Langille - 2015-08-17

    I chatted with Ed Schouten about w. He suggested I try this from the command line:

    $ getent utmpx active
    [1439589750.958402 -- Fri Aug 14 22:02:30 2015] system boot
    [1439844386.475877 -- Mon Aug 17 20:46:26 2015] user process: id="7074732f30000000" pid="80395" user="dan" line="pts/0" host="dent.int.unixathome.org"
    [1439645436.408339 -- Sat Aug 15 13:30:36 2015] dead process: id="7074732f31000000" pid="7717"

    Does that help you?

    He also mentioned, and I paraphrase: it might be easier to test #ifdef USER_PROCESS instead of #ifndef UTMP_HAS_NO_TYPE, because the current #ifdef may only test whether utmp::ut_type is present, not utmpx::ut_type. Also, the kill loop is a bit suspect: if someone forgets to set ut_pid, net-snmp will garbage-collect those entries even though it should leave them alone. I don't know of a single system that implements getutent() and friends but does not implement getutxent(), so it might be worth patching it up to use utmpx exclusively.

     

    Last edit: Dan Langille 2015-08-17
    • Bill Fenner

      Bill Fenner - 2015-08-18

      UTMP_HAS_NO_TYPE is left over from the pre-autoconf configuration system. It's pretty awesome:

      freebsd10.h:#include "freebsd9.h"
      freebsd2.h:#include "freebsd.h"
      freebsd3.h:#include "freebsd.h"
      freebsd4.h:#include "freebsd.h"
      freebsd5.h:#include "freebsd4.h"
      freebsd6.h:#include "freebsd5.h"
      freebsd7.h:#include "freebsd6.h"
      freebsd8.h:#include "freebsd7.h"
      freebsd9.h:#include "freebsd8.h"
      freebsd.h:#include "bsd.h"
      

      and bsd.h, of course, has:

      #define UTMP_HAS_NO_TYPE 1
      #define UTMP_HAS_NO_PID 1
      

      since that was probably true of 4.2BSD.

      So, your problem is that net-snmp is ignoring the ut_type field.

      I agree that net-snmp shouldn't be updating utmp based on killing the process, but that's compiled out on FreeBSD since it also "has no pid" there ;-)

      Bill

       
      • Dan Langille

        Dan Langille - 2015-08-18

        Ahh, yes, the #ifdef lines 682-686.

        Are you suggesting this patch?

        ```diff
        @@ -16,5 +16,5 @@
        */
        #define ARP_SCAN_FOUR_ARGUMENTS 1

        -#define UTMP_HAS_NO_TYPE 1
        +#define UTMP_HAS_NO_TYPE 0
        #define UTMP_HAS_NO_PID 1
        ```

         
        • Bill Fenner

          Bill Fenner - 2015-08-18

          It's #ifdef, not #if, so the right fix is to #undef UTMP_HAS_NO_TYPE in freebsdN.h after the include of bsd.h, where N is the version in which ut_type was introduced. (Or the most recent version that anyone should be allowed to care about, e.g., I'd be fine with N=8.)

          When I try to think about the real fix I tend to think about autoconf'ing the world and then my brain starts to hurt :-)
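
          To illustrate the #ifdef versus #if point, here is a small sketch. The macro name comes from the thread; the three function names are invented for this example. It shows that defining the macro to 0 still satisfies #ifdef, and that only #undef makes the #ifdef test fail.

          ```c
          #define UTMP_HAS_NO_TYPE 0   /* the value proposed in the diff above */

          /* #ifdef only asks "is the macro defined?", so a value of 0 still counts. */
          int ifdef_taken_when_defined_as_zero(void)
          {
          #ifdef UTMP_HAS_NO_TYPE
              return 1;
          #else
              return 0;
          #endif
          }

          /* #if evaluates the value, so 0 selects the #else branch. */
          int if_taken_when_defined_as_zero(void)
          {
          #if UTMP_HAS_NO_TYPE
              return 1;
          #else
              return 0;
          #endif
          }

          #undef UTMP_HAS_NO_TYPE      /* what the suggested fix does instead */

          /* After #undef the macro no longer exists, so #ifdef is not taken. */
          int ifdef_taken_after_undef(void)
          {
          #ifdef UTMP_HAS_NO_TYPE
              return 1;
          #else
              return 0;
          #endif
          }
          ```

          Since the existing code tests the macro with #ifdef / #ifndef, changing the value to 0 would leave its behaviour unchanged.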

           
          • Dan Langille

            Dan Langille - 2015-08-20

            What testing can I do on my systems to help?

             
            • Bill Fenner

              Bill Fenner - 2015-08-25

              Can you rebuild with

              #undef UTMP_HAS_NO_TYPE
              

              in freebsd8.h (after it #includes freebsd7.h) and test the value reported by the daemon?

               
              • Dan Langille

                Dan Langille - 2015-08-25

                Before installing patched snmpd:

                $ snmpwalk -v3 -l authPriv -u rodvl -a SHA -A AuthPass -x AES -X PrivPass varm.int.unixathome.org HOST-RESOURCES-MIB::hrSystemNumUsers.0
                HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 4

                After installing patched snmpd:

                $ snmpwalk -v3 -l authPriv -u rodvl -a SHA -A AuthPass -x AES -X PrivPass varm.int.unixathome.org HOST-RESOURCES-MIB::hrSystemNumUsers
                HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 1

                After logging out of the target machine:

                $ snmpwalk -v3 -l authPriv -u rodvl -a SHA -A AuthPass -x AES -X PrivPass varm.int.unixathome.org HOST-RESOURCES-MIB::hrSystemNumUsers
                HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 0

                 
              • Dan Langille

                Dan Langille - 2015-08-25

                The proof is in the pudding.

                 

                Last edit: Dan Langille 2015-08-25
  • Bill Fenner

    Bill Fenner - 2015-09-11

    I've checked in a patch to the 5.7 and master branches. Would you mind pulling HEAD from one of these and building to make sure things are OK?

    I've #undef'd both NO_TYPE and NO_PID, but I've also removed the code that writes to utmp - it just skips PIDs that we can't kill.
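
    As a rough sketch of the "skip PIDs that we can't kill" idea (not the committed code), a liveness probe with signal 0 could look like the following. The function name pid_appears_alive is hypothetical, and treating EPERM as "alive" is an assumption made for this example.

    ```c
    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>

    /* Hypothetical helper: returns 1 if a process with this PID appears to
     * exist, 0 otherwise. Signal 0 performs the existence and permission
     * checks without delivering any signal. */
    int pid_appears_alive(pid_t pid)
    {
        if (pid <= 0)
            return 0;               /* unset or invalid ut_pid */
        if (kill(pid, 0) == 0)
            return 1;               /* process exists and we may signal it */
        return errno == EPERM;      /* exists, but owned by someone else */
    }
    ```

    A counting loop would then skip any utmpx entry whose ut_pid fails this check, without writing anything back to the utmp database.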

     
  • Dan Langille

    Dan Langille - 2015-10-16

    I'm going to try this now.

     
    • Dan Langille

      Dan Langille - 2015-10-16

      My plan was to find the commit, apply it to my local tree, and try it out. But I can't find the commit, or where to download a branch....

       
  • Dan Langille

    Dan Langille - 2015-10-16

    All seems well: a few minutes after upgrading, the correct number of users is being displayed.

    Bill: to be sure I applied the correct patch,

    I also think we need to check the other values, which seem high to me:

    context switches 7.5 M
    interrupts 7.5 M
    swap I/O activity 40G

    see graphs at https://twitter.com/DLangille/status/655152494671216640

     
  • Dan Langille

    Dan Langille - 2015-10-16
     
    • Bill Fenner

      Bill Fenner - 2015-10-19

      Ugh? checking size of short... Segmentation fault (core dumped). Any way to access the config.log from that run?

       
      • Dan Langille

        Dan Langille - 2015-10-19

        Attached...

         
