NMon's "reference architecture"...

Help
2013-04-27
2013-05-28
  • Denis Cerkvin
    2013-04-27

    Hi All,

    I am trying to add some InfiniBand-related code to NMon, so that I can monitor my Hadoop clusters and other "computer sets".

    This program is probably one of the best, and definitely one of the most effective, approaches to cluster monitoring. All you need is NMon and a big screen, such as an LCD TV connected to your laptop.

    I place one terminal window per cluster node, with some defaults fed in via the "NMON" environment variable, and it works like a charm: real-time, showing only what I really need (see my patches below).

    So, while going through the source code to work out where to plug in my IB code, I built this document. Since the program comes with few useful comments, I hope it may be useful to anyone who "tailors" NMon to specific needs.

    Once again, a big "Thank You!" goes to the developer of NMon, Nigel Griffiths, for clear and simple code that compiles into a very useful program :-)

    Thank you,
    The file follows right here, since I could not find a way to attach anything.

    Ping me if you need the file as-is.

    NMON Program's architecture.

    File: lmon14g_reat_DC.schema

    Produced based on the "lmon14g.c" version of the NMON program,
    with my previous patches already present in the source code.

    /* =================================================
    * Sincerely,
    * Denis Cerkvin
    *
    * The Bible for command line people.
    * http://www.read-and-think.org/kjv.html
    *
    * The Bible for people who work with the command line.
    * http://www.read-and-think.org/
    * =================================================
    */

    DEFs

    #define P_CPUINFO       0
    #define P_STAT          1
    #define P_VERSION       2
    #define P_MEMINFO       3
    #define P_UPTIME        4
    #define P_LOADAVG       5
    #define P_NFS           6
    #define P_NFSD          7
    #define P_VMSTAT        8 /* new in 13h */
    #define P_NUMBER        9 /* one more than the max */
    // CPU-related
    #define MAX_SNAPS 72
    #define MAX_SNAP_ROWS 20
    #define SNAP_OFFSET 6
    // Macros to access the previous and current versions of "database" records,
    // to calculate the diffs over the sleep time:
    #define DKDELTA(member) ( (q->dk[i].member > p->dk[i].member) ? 0 : (p->dk[i].member - q->dk[i].member))
    #define SIDELTA(member) ( (q->si.member > p->si.member)       ? 0 : (p->si.member - q->si.member))
    … some more may be defined all over the code
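
    The DELTA macros above boil down to one rule: subtract the previous counter from the current one, and report 0 if the counter went backwards. Here is a minimal sketch of that rule as plain functions. The names counter_delta and counter_rate are mine and do not appear in lmon14g.c.

```c
#include <assert.h>

/* Same rule as the DELTA macros: if the previous value is larger than the
 * current one (counter reset or wrap), report 0 instead of a bogus diff. */
unsigned long counter_delta(unsigned long prev, unsigned long cur)
{
    return (prev > cur) ? 0UL : cur - prev;
}

/* Per-second rate over the measured interval, in seconds. */
double counter_rate(unsigned long prev, unsigned long cur, double elapsed)
{
    assert(elapsed > 0.0);
    return (double)counter_delta(prev, cur) / elapsed;
}
```

    With "q" as the previous record and "p" as the current one, a call would look like counter_rate(q_value, p_value, elapsed), where elapsed is the measured time between the two samples.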

    VARs

    // The author declares variables and includes headers throughout the code, as he goes.

    int proc_cpu_done = 0;  /* Flag if we have run function proc_cpu() already in this interval */
    time_t  timer;                  /* used to work out the hour/min/second */

    /* Counts of resources */
    int     cpus = 1;       /* number of CPUs in system (let's hope it's more than zero!) */
    int old_cpus = 1;       /* Number of CPUs seen in previous interval */
    int     max_cpus = 1;   /* highest number of CPUs in DLPAR */
    int     networks = 0;   /* number of networks in system  */
    int     partitions = 0;         /* number of partitions in system  */
    int     partitions_short = 0;   /* partitions file data short form (i.e. data missing) */
    int     disks    = 0;   /* number of disks in system  */
    int     seconds  = -1;  /* pause interval; used for the "sleep" call in the main program loop */
    int     maxloops = -1;  /* stop after this number of updates */
    char    hostname;
    char    run_name;
    int     run_name_set = 0;
    char    fullhostname;
    int     loop;

    char *easy - up to 4 strings with name of OS release

    /* Global name of programme for printing it */
    char    *progname;

    STRUCTs

    // Structure to read data out of individual files in /proc FS
    // One structure per file.
    struct {
            FILE *fp;
            char *filename;
            int lines;
            char *line;
            char *buf;
            int read_this_interval; /* track updates for each update to stop  double data collection */
    } proc[P_NUMBER]; // i.e. 9 entries, see P_NUMBER above

    // Structure to keep records about individual OS process, something like "top" does.
    struct procsinfo {
            int pi_pid;
            char pi_comm;
            …
            unsigned long statm_lrs;        /* library */
            unsigned long statm_dt;         /* dirty pages */
    };

    // Structure to keep each OS process arguments
    struct {
            int pid;
            char *args;
    } arglist;

    // These structures are used to keep the processed values, read from each /proc FS files.
    // There are 2 of each - past and new, to find the diffs.
    //
    /* Main data structures for collected stats.
    * Two versions are previous and current data.
    * Often it's the difference that is printed.
    * The pointers are swapped, i.e. current becomes the previous
    * and the previous is overwritten, rather than moving data around.
    */
    struct cpu_stat - CPU
    struct dsk_stat - IO per disk
    struct mem_stat - memory
    struct vm_stat - virtual memory - i.e. paging, swapping ctxswtch etc.
    struct nfs_stat - NFS
    struct net_stat - IP network
    struct part_stat - disk partitions

    // This is the "data" structure; there are 2 of them in the array "database",
    // each pointed to by "p" or "q"
    struct data {
            struct dsk_stat *dk;
            struct cpu_stat cpu_total;
            struct cpu_stat cpuN;
            struct mem_stat mem;
            struct vm_stat vm;
            struct nfs_stat nfs;
            struct net_stat ifnets;
    #ifdef PARTITIONS
            struct part_stat parts;
    #endif /*PARTITIONS*/

            struct timeval tv;
            double time;
            struct procsinfo *procs;

            int    nprocs;
    } database[2], *p, *q; // q = previous and p = current, modified ONLY via the "switcher" procedure

    // For CPU utilization, taken at predefined "snapshots"
    struct {
            double user;
            double kernel;
            double iowait;
            double idle;
    } cpu_snap;
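
    The snapshot records above are what the averaged CPU figure is built from. Here is a minimal sketch of averaging over the snapshots taken so far. The struct and function names are hypothetical, and treating "busy" as user + kernel is my assumption rather than something read out of lmon14g.c.

```c
#include <assert.h>

#define MAX_SNAPS 72    /* same value as in the DEFs section */

struct cpu_snapshot {   /* mirrors the cpu_snap fields */
    double user;
    double kernel;
    double iowait;
    double idle;
};

/* Average busy CPU (user + kernel) over the first `count` snapshots. */
double snap_busy_average(const struct cpu_snapshot *snaps, int count)
{
    double total = 0.0;
    int i;

    assert(count >= 0 && count <= MAX_SNAPS);
    if (count == 0)
        return 0.0;     /* nothing collected yet */
    for (i = 0; i < count; i++)
        total += snaps[i].user + snaps[i].kernel;
    return total / count;
}
```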

    PROCs

    // procedure "proc_init" sets the file name in ALL "proc" structures at once:
    void proc_init()

            proc[P_CPUINFO].filename = "/proc/cpuinfo";
            proc[P_STAT].filename    = "/proc/stat";
            proc[P_VERSION].filename = "/proc/version";
            proc[P_MEMINFO].filename = "/proc/meminfo";
            proc[P_UPTIME].filename  = "/proc/uptime";
            proc[P_LOADAVG].filename = "/proc/loadavg";
            proc[P_NFS].filename     = "/proc/net/rpc/nfs";
            proc[P_NFSD].filename    = "/proc/net/rpc/nfsd";
            proc[P_VMSTAT].filename  = "/proc/vmstat";

    // procedure "proc_read" reads ONE file from the /proc FS into the "buf" member of its "proc" entry.
    // If there is only one line, it points to "buf"; otherwise it makes the "line" array members
    // point to the places in "buf" where each new line begins.
    void proc_read(int num) // "num" is one of the 0..P_NUMBER-1 defs

    - opens the file into "fp"
    - rewinds the file to its start
    - reads the whole file into "buf"
    - finds the end of each line and points the next "line" entry at the following char, the start of a new line

      //Used by these procedures to read the data from the /proc FS:
        - int read_vmstat()
          - using "get_vm_value" via "GETVM" macro, updating "data" struct in "database" array's record, pointed by "p" pointer
        - void get_cpu_cnt()
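
    The proc_read steps above (rewind, read the whole file, point "line" entries at line starts) can be sketched roughly as below. This is an illustration, not the code from lmon14g.c: the names slurp and split_lines are mine, and the growing heap buffer is my own choice for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the whole stream from its start into a NUL-terminated heap buffer.
 * Returns NULL on allocation failure; the caller frees the result. */
char *slurp(FILE *fp)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);

    assert(fp != NULL);
    if (buf == NULL)
        return NULL;
    rewind(fp);
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len + 1 >= cap) {           /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    return buf;
}

/* Cut the buffer into lines in place: each '\n' becomes '\0' and lines[i]
 * points at the first character of line i. Returns the number of lines. */
int split_lines(char *buf, char **lines, int max_lines)
{
    int count = 0;
    char *s = buf;

    while (*s != '\0' && count < max_lines) {
        lines[count++] = s;
        while (*s != '\0' && *s != '\n')
            s++;
        if (*s == '\n')
            *s++ = '\0';
    }
    return count;
}
```

    Reading until end-of-file rather than asking for the file size first matters for /proc, where files commonly report a size of 0.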

    // These procedures "parse" the results in "proc" records
    void proc_cpu()
    void proc_nfs()
    void proc_kernel() - load averages, here is my fix for number of CPU cores
    // Disk statistics, this is the really tricky part …
    void proc_disk(double elapsed) - chooses only 1 of 3 possible /proc FS sources to use:
    - void proc_disk_io(double elapsed) - looks for the line "disk_io" in "/proc/stat", which is not there
    - void proc_diskstats(double elapsed) - reads from the file "/proc/diskstats", which exists and works.
    This file gets used on Fedora/RH. Here is my addition of excluded disks.
    - void proc_partitions(double elapsed) - gets names and sizes of disk partitions, hoping to get queue sizes etc.
    // Memory stuff, uses multiple procedures. Aware of Huge Pages.
    void proc_mem() - reads memory stats using:
    - long proc_mem_search( char *s) - looks for a string in "/proc/meminfo"
    int snap_average() - collects cumulative CPU usage over all snapshots made so far.
    - void snap_clear() - used to "reset" CPU snapshot counters.
    void proc_net() - reads "/proc/net/dev". Here is my addition to ignore IF names
    // OS processes stuff
    int getprocs(int details) - reads "/proc/*" entries with
    - int proc_procsinfo(int pid, int index) - reads "/proc/*/stat"
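
    As an illustration of the "look for a string in /proc/meminfo" step used by the memory code above, here is a minimal sketch that works on lines already split out of the file. The name meminfo_value and its signature are hypothetical; only the "Key:   value kB" line layout of /proc/meminfo is taken as given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find the line that starts with `key` (for example "MemTotal") followed by
 * a colon, and return the number after the colon, or -1 if nothing matches. */
long meminfo_value(char **lines, int count, const char *key)
{
    size_t klen = strlen(key);
    int i;

    assert(lines != NULL);
    for (i = 0; i < count; i++) {
        if (strncmp(lines[i], key, klen) == 0 && lines[i][klen] == ':')
            return strtol(lines[i] + klen + 1, NULL, 10); /* skips the padding */
    }
    return -1;
}
```

    Checking for the colon keeps a short key such as "Mem" from matching "MemTotal" by accident.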

    // Procedure to "switch" the "database" from previous to current (new) snapshot,
    // re-reading all files from /proc FS.
    void switcher(void)
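
    The pointer swap at the heart of "switcher" can be sketched as below. The struct here is a hypothetical, cut-down stand-in for "struct data", and the sketch shows only the swap; the re-reading of the /proc files is left out.

```c
#include <assert.h>

struct sample {             /* hypothetical, far smaller than "struct data" */
    double time;
    unsigned long ctxt;
};

struct sample database[2];
struct sample *p = &database[0];    /* current  */
struct sample *q = &database[1];    /* previous */

/* Swap roles: the just-filled "current" record becomes "previous", and the
 * old "previous" record is reused as the next "current" to be overwritten.
 * No data is copied, only the two pointers change. */
void swap_samples(void)
{
    struct sample *tmp = p;

    p = q;
    q = tmp;
    assert(p != q);
}
```

    After each swap, new values are written through "p" and diffs are taken as p->member minus q->member.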

    // proc "find_release" - determines the OS release, populating "easy"
    void find_release()
    - read via the pipe the output of "cat /etc/*ease 2>/dev/null" command
    - populate up to 4 lines into "easy"
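
    Reading a command's output through a pipe, as find_release does, can be sketched with POSIX popen(). The function name, the buffer sizes and the newline stripping below are my assumptions for the sketch, not details taken from lmon14g.c.

```c
#define _POSIX_C_SOURCE 200809L     /* for popen/pclose */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RELEASE_LINES 4
#define RELEASE_WIDTH 256

/* Run `cmd` through the shell and keep up to RELEASE_LINES lines of its
 * output, with the trailing newline removed. Returns the lines stored. */
int read_command_lines(const char *cmd, char out[RELEASE_LINES][RELEASE_WIDTH])
{
    FILE *pipe_fp;
    int count = 0;

    assert(cmd != NULL);
    pipe_fp = popen(cmd, "r");
    if (pipe_fp == NULL)
        return 0;
    while (count < RELEASE_LINES &&
           fgets(out[count], RELEASE_WIDTH, pipe_fp) != NULL) {
        out[count][strcspn(out[count], "\n")] = '\0';
        count++;
    }
    pclose(pipe_fp);
    return count;
}
```

    For the release lookup, the call would be read_command_lines("cat /etc/*ease 2>/dev/null", lines).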

    // Procedures "args_*" do something with OS processes arguments
    void args_output() - processes "ps -p %d -o args 2>/dev/null" for each OS pid
    void args_load() - links pids and args via "ps -eo pid,args 2>/dev/null"
    char *args_lookup - finds arguments by pid

    // Procedure "linux_bbbp" reads output of any command as pipe!
    void   linux_bbbp

    // Service procedures:
    char *proc_find_sb(char * p) - finds space or left bracket
    void strip_spaces(char *s) - strip spaces from string
    #define isdigit(ch) - macro that tests whether its parameter is a digit
    int isnumbers(char *s)
    void interrupt(int signum) - handles signals SIGUSR1 or SIGUSR2 (stop nmon cleanly) and SIGWINCH (window size change)
    void child_start( - forks nmon's child, replacing its code with the required command via "execlp"
    char *status(int n) - is the process in Run or Sleep state
    char *get_state( char n) - process is Running/Sleeping/DiskSleep/Zombie/Traced/Paging
    void load_dgroup(struct dsk_stat *dk) - reads and creates entries in diskgroups. Very useful.
    void list_dgroup(struct dsk_stat *dk) - dumps diskgroup's stats into file (for Excel later on?)
    int checkinput(void) - reads "NMON" Env.Var and user keys during the program execution
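
    Feeding defaults through the "NMON" environment variable amounts to treating each character of its value as a key press. Here is a minimal sketch of that idea. The flags struct and function names are hypothetical, and only four of the key letters (c, m, d, n) are handled.

```c
#include <assert.h>
#include <stdlib.h>

struct display_flags {      /* hypothetical subset of the on/off switches */
    int show_cpu;           /* 'c' */
    int show_memory;        /* 'm' */
    int show_disks;         /* 'd' */
    int show_network;       /* 'n' */
};

/* Treat every character as if the user had pressed that key: each known
 * letter toggles its panel. Returns how many characters were recognised. */
int apply_keys(const char *keys, struct display_flags *f)
{
    int used = 0;

    assert(f != NULL);
    if (keys == NULL)       /* variable not set */
        return 0;
    for (; *keys != '\0'; keys++) {
        switch (*keys) {
        case 'c': f->show_cpu = !f->show_cpu; used++; break;
        case 'm': f->show_memory = !f->show_memory; used++; break;
        case 'd': f->show_disks = !f->show_disks; used++; break;
        case 'n': f->show_network = !f->show_network; used++; break;
        default: break;     /* unknown keys are ignored in this sketch */
        }
    }
    return used;
}

/* Apply the start-up defaults from the environment, if any. */
int apply_env_defaults(struct display_flags *f)
{
    return apply_keys(getenv("NMON"), f);
}
```

    The same apply_keys() routine can serve both the start-up defaults and keys typed while the program runs, which is why it takes a plain string.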

    // NCURSES graphical plotting procedures
    void init_pairs() - sets colours etc.

    Plotting of CPU usage per snapshot, used with "L" key
    - void plot_snap(WINDOW *pad)
    - void plot_save(
    - void save_smp(
    - void plot_smp(

    // ========  MAIN ===============

    - uses proc_init() to set up the variables
    - reads command line args via "getopt"
    - goes for the first time through each of the "proc_*" parsing procedures
    - sets variable "seconds" to the value of the command line parameter, or to the 2-second default (line 3780)
    - sets the pointers ready for the first round, via "switcher" (line 3790)
    - populates the current "p" record in the "database" array and copies some of those values into "q" too, to set names correctly etc.
    - makes another call to "switcher", setting up records for the next round of data collection in the main loop (line 3863)
    - installs signal handlers
    - starts the Curses graphical UI

    - enters the main endless loop (line 4138):
      - gets the current time into the "time" field of the current "p" version of the "database" record
      - calculates the sleep time as the difference between the "time" fields of "p" and "q"
      - works through the huge list of options corresponding to user input letters, and outputs the results
      - with Curses, runs a loop that sleeps for a second (man 3 sleep), checks whether the user has pressed a key,
        sleeps another second etc., up to the value of variable "seconds" or until the user presses a key
      - calls switcher() once again, swapping the pointers "q" and "p" and causing "/proc" to be re-read (see line 6111)
      - checks whether the maximum number of loops has been reached, and either exits or loops to the next iteration.

    // ====== End of the NMON program ========
