crashinfo

Alex Sidorenko

Crashinfo

crashinfo is a tool for gathering information directly on a customer's host and for doing some simple automated first-pass analysis. It is especially useful for hangs or performance problems.

Using crashinfo - default mode

By default, when crashinfo is invoked without any options, it prints some vital statistics and runs a number of tests. As a result of running these tests, it can print WARNING messages. These messages are usually (but not always) followed by more details. For example:

+++WARNING+++ fragmentation: 32Kb

   == Zone:Normal ==

AREA    SIZE  FREE_AREA_STRUCT     BLOCKS  PAGES
----    ----  ------------------   ------  -----
  0       4k  0x00000000c03ae0d8     8071   8071
  1       8k  0x00000000c03ae0e4     1584   3168
  2      16k  0x00000000c03ae0f0      157    628
  3      32k  0x00000000c03ae0fc        0      0
  4      64k  0x00000000c03ae108        0      0
  5     128k  0x00000000c03ae114        0      0
  6     256k  0x00000000c03ae120        0      0
  7     512k  0x00000000c03ae12c        0      0
  8    1024k  0x00000000c03ae138        0      0
  9    2048k  0x00000000c03ae144        0      0
 10    4096k  0x00000000c03ae150        0      0
+++WARNING+++ there are outstanding blk_dev requests
    <struct request 0xf7fb9900>, rq_status=0x1, rq_dev=0x6802 major=104 minor=2,
        started     23.4 s ago

You can search for the +++WARNING+++ pattern to make sure you did not miss any warnings.

Some tests rely on built-in crash commands that can potentially run for a long time. For example, to check for fragmentation we run 'kmem -f'. On one vmcore it took more than 20 minutes to complete, even though the host was fast (DL145). As in many cases we would like to obtain at least partial results quickly, there is a timeout mechanism for such commands. By default it is 2 min/command. If we time out, a warning message is printed:

+++WARNING+++ <foreach bt> failed to complete within the timeout period of 120s

In this case you can increase the timeout with the '--timeout=seconds' option. If you are in a hurry, you can use the '--fast' option, which sets the timeout to 15s and disables some potentially slow tests.
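The timeout mechanism can be sketched roughly as follows (a minimal illustration, not the tool's actual implementation; `run` stands in for whatever executes a crash command):

```python
import concurrent.futures
import time

DEFAULT_TIMEOUT = 120   # the 2 min/command default described above

def run_with_timeout(run, cmd, timeout=DEFAULT_TIMEOUT):
    """Run a potentially slow command; give up after `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run, cmd)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        print("+++WARNING+++ <%s> failed to complete within the "
              "timeout period of %ss" % (cmd, timeout))
        return None
    finally:
        pool.shutdown(wait=False)   # do not block on the stuck worker

# A command that takes too long for a 0.1-second budget prints the warning
run_with_timeout(lambda c: time.sleep(0.5), "foreach bt", timeout=0.1)
```

On timeout the partial results are simply abandoned, which is why a warning (rather than an error) is appropriate here.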

The tests check many conditions that are important when a system seems to be hung: load averages, memory fragmentation, outstanding block requests, spinlocks held, activity on networking cards and so on. Some tests are heuristic - see the advanced options for more details. New tests are added all the time, based on practical experience.

You can decrease verbosity with the '-q' (quiet) option, in which case only warnings are printed. Using '-v' (verbose) makes crashinfo print additional information.

Advanced Options

In addition to the default mode, crashinfo supports a number of options that facilitate troubleshooting. You can see them all by running

crash64> crashinfo -h
Usage: crashinfo.py [options]

Options:
  -h, --help            show this help message and exit
  -v                    verbose output
  -q                    quiet mode - print warnings only
  --fast                fast mode - set the timeout to 15s and disable some
                        potentially slow tests
  --timeout=TIMEOUT     timeout (in seconds) for potentially slow commands
  --sysctl              Print sysctl info
  --stacksummary        Print stacks (bt) categorized summary
  --findstacks=FINDSTACKS
                        Print stacks (bt) containing functions that match the
                        provided pattern
  --blkreq              Print outstanding block device requests
  --filelock            Print file locks
  --lws                 Print locks, waitqueues and semaphores
  --decodesyscalls=DECODESYSCALLS
                        Decode the arguments of system calls
  --version             Print program version and exit

Stack Summaries

Quite often we need to better understand what kind of load we have: what the commands are named, how many threads are running, and what they are doing. A traditional approach is to use 'foreach bt' and then browse the output or write AWK scripts to reformat it. Two options facilitate this process:

  --stacksummary        Print stacks (bt) categorized summary.
  --findstacks=FINDSTACKS
                        Print stacks (bt) containing functions that match the
                        provided pattern

In both cases the stacks are printed in a compact way, skipping some details.
'--stacksummary' groups similar-looking stacks together. When deciding whether two stacks look similar, we ignore the offsets and take into account only the names and order of the functions. The groups are reverse-sorted by the number of occurrences, so the most frequent stacks are printed first. For example:

------- 74 stacks like that: ----------
  #0   schedule
  #1   schedule_timeout
  #2   futex_wait
  #3   do_futex
  #4   sys_futex
  #5   system_call
    youngest=23s(pid=4482), oldest=82577s(pid=3539)

   ........................
     hpasmd                         3 times
     hpsmhd                         25 times
     httpd                          10 times
     java                           34 times
     twagent                        2 times

------- 41 stacks like that: ----------
  #0   schedule
  #1   sys_rt_sigsuspend
  #2   system_call
    youngest=23s(pid=17937), oldest=82550s(pid=4267)

   ........................
     java                           24 times
     opmn                           6 times
     rwserver                       9 times
     vcagentd                       2 times

Adding '-v' prints the list of PIDs for each stack.
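The grouping idea can be sketched like this (a simplified illustration assuming the 'foreach bt' output has already been split into per-task stacks; the real tool also tracks stack ages and PIDs):

```python
import re
from collections import Counter, defaultdict

def canonical(stack_lines):
    """Strip offsets so '#1 schedule_timeout+0x95' reduces to
    'schedule_timeout'; stacks differing only in offsets compare equal."""
    names = []
    for line in stack_lines:
        m = re.match(r"\s*#\d+\s+([\w.]+)", line)
        if m:
            names.append(m.group(1))
    return tuple(names)

def stack_summary(stacks):
    """`stacks` is a list of (command_name, stack_lines) pairs.
    Group by canonical stack and print the most frequent groups first."""
    groups = defaultdict(Counter)
    for cmd, lines in stacks:
        groups[canonical(lines)][cmd] += 1
    for funcs, cmds in sorted(groups.items(),
                              key=lambda kv: -sum(kv[1].values())):
        print("------- %d stacks like that: ----------" % sum(cmds.values()))
        for i, fn in enumerate(funcs):
            print("  #%-2d  %s" % (i, fn))
        for cmd in sorted(cmds):
            print("     %-25s %d times" % (cmd, cmds[cmd]))
```

Using function names only as the grouping key is what makes 74 futex-wait stacks from different processes collapse into one entry.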

'--findstacks' searches for stacks that contain a given pattern (patterns can be regular expressions). For example:

crash32> crashinfo --find failed
PID=36  CMD=qla2300_dpc0
  #0   context_switch+0xa4
  #1   schedule+0x2ef
  #2   __down_interruptible+0x85
  #3   __down_failed_interruptible+0x2
  #4   .text.lock.KBUILD_BASENAME+0xa2f09 , (via qla2x00_do_dpc)
  #5   kernel_thread_helper+0x3

PID=37  CMD=qla2300_dpc1
  #0   context_switch+0xa4
  #1   schedule+0x2ef
  #2   __down_interruptible+0x85
  #3   __down_failed_interruptible+0x2
  #4   .text.lock.KBUILD_BASENAME+0xa2f09 , (via qla2x00_do_dpc)
  #5   kernel_thread_helper+0x3

As the 'foreach bt' command is very expensive, we cache its results for non-live kernels, so that if you run '--find' with several different patterns, only the first invocation is slow.
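The caching idea, roughly (illustrative only; `run_crash_command` stands in for the real interface to crash):

```python
_cache = {}

def cached_foreach_bt(run_crash_command, live=False):
    """Return 'foreach bt' output. On a dump (non-live kernel) the stacks
    cannot change, so the expensive command is run only once and cached."""
    if live:
        return run_crash_command("foreach bt")  # live data may change
    if "foreach bt" not in _cache:
        _cache["foreach bt"] = run_crash_command("foreach bt")
    return _cache["foreach bt"]
```

Caching is safe precisely because a vmcore is immutable; on a live kernel the same trick would return stale stacks.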

Outstanding Block Device Requests

These requests are described by 'struct request', which differs considerably between kernels. In addition, the lists of these structures are built in very different ways. As a result, instead of trying to traverse those lists, we use a different approach:

  • obtain a list of allocated 'struct request' structures from slabcache using 'kmem -S blkdev_requests'
  • process this list discarding those structures that are obviously bogus
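The two steps above can be sketched as follows (the field names follow the 2.6-style output below; the real validation rules are more involved and kernel-specific):

```python
def outstanding_requests(slab_requests, jiffies, hz):
    """Take every allocated 'struct request' from the blkdev_requests
    slab cache and keep only the plausible in-flight ones; freed slab
    objects retain stale data, so obviously bogus entries are dropped."""
    good = []
    for req in slab_requests:
        if req.get("ref_count", 0) < 1:          # nobody holds the request
            continue
        start = req.get("start_time")
        if start is None or start > jiffies:     # stale/garbage timestamp
            continue
        req["age_s"] = (jiffies - start) / float(hz)
        good.append(req)
    return good
```

The 'started N s ago' figures in the output are computed exactly this way: the request's start timestamp against the current jiffies, scaled by HZ.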

This approach seems to work very well on all kernels from 2.4.21 to 2.6.24. The output differs between kernels; for example, on a live 2.6.24 kernel:

crash64> crashinfo --blk
 ---- Outstanding blk_dev Requests -----
    <struct request 0xffff8100787b5000>,
        disk_name=sdb major=8, in_flight=17, cmd_flags=0x10445, ref_count=1,
        started      0.5 s ago
    <struct request 0xffff8100787b5120>,
        disk_name=sdb major=8, in_flight=17, cmd_flags=0x10445, ref_count=1,
        started      0.5 s ago
    <struct request 0xffff810057912000>,
        disk_name=sdb major=8, in_flight=17, cmd_flags=0x10445, ref_count=1,
        started      0.1 s ago

For 2.4.21:

 ---- Outstanding blk_dev Requests -----
    <struct request 0xc8168f00>, rq_status=0x1, rq_dev=0x6802 major=104 minor=2,
        started  28386.3 s ago

Locks, Waitqueues and Semaphores

We can print all globally-declared structures that are in use using the following approach:

  • obtain a list of all symbols that have 'data' type
  • obtain information for each of them and check whether it has a needed type
  • print those structures that are being used at this moment
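The scan above can be sketched as follows (the symbol and structure accessors here are hypothetical stand-ins for the real pykdump API):

```python
def scan_global_structs(data_symbols, wanted_type, read_struct, in_use):
    """`data_symbols` yields (name, type_name, address) for every symbol
    of 'data' type; return the globally declared structures of
    `wanted_type` that are currently in use (e.g. a wait_queue_head
    that has waiters)."""
    found = []
    for name, type_name, addr in data_symbols:
        if type_name != wanted_type:
            continue
        s = read_struct(wanted_type, addr)   # fetch the structure contents
        if in_use(s):                        # e.g. non-empty waitqueue
            found.append((name, s))
    return found
```

The `in_use` predicate is what turns a raw symbol sweep into a useful report: an empty waitqueue or an uncontended semaphore is not worth printing.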

For example:

crash64> crashinfo --lws
 -- semaphores with sleepers > 0 --
 -- rw_semaphores with count > 0 --
 -- Non-empty wait_queue_head --
     log_wait
        PID: 6013   TASK: ffff81007d2ee7c0  CPU: 0   COMMAND: "dd"
     acpi_bus_event_queue
        PID: 5832   TASK: ffff81007d2367c0  CPU: 1   COMMAND: "acpid"
     vt_activate_queue
        PID: 6325   TASK: ffff81007b456f80  CPU: 1   COMMAND: "console-kit-dae"
        PID: 6152   TASK: ffff8100707e1740  CPU: 1   COMMAND: "console-kit-dae"
        PID: 6151   TASK: ffff8100707e0f80  CPU: 1   COMMAND: "console-kit-dae"
        PID: 6150   TASK: ffff8100707e07c0  CPU: 1   COMMAND: "console-kit-dae"
     serio_wait
        PID: 163    TASK: ffff81007be59740  CPU: 1   COMMAND: "kseriod"

Unfortunately, this approach does not find all structures of interest, as not all of them are declared as global variables (e.g. NFS maintains its own queues/semaphores inside server and client structures). More subsystem-specific structures will be added in future releases.

File Locks

The '--filelock' option produces a list of file locks currently held, e.g.

crash64> crashinfo --filelock
Active locks
<struct file_lock 0xffff810037896678> FL_POSIX
        <struct file 0xffff8100529c9a80> <struct dentry 0xffff810054f2d1a0>
        pid=7246 inode=0xffff810051e83998
------------------------------
<struct file_lock 0xffff810037896958> FL_POSIX F_RDLCK
        <struct file 0xffff810057b82840> <struct dentry 0xffff810054f275b0>
        pid=7246 inode=0xffff810051cc48e8
------------------------------

Blocked list
...

Decoding System Calls

'--decodesyscalls' lets you decode the arguments passed to system calls. As the arguments originate in userspace, it is not always possible to decode them (the pages might not be present in the vmcore). You can specify either a PID or a system call name. For example:

crash64> crashinfo --decode 1

PID=1  CMD=init
  #0   schedule+0x245
  #1   schedule_timeout+0x95
  #2   do_select+0x468
  #3   core_sys_select+0x209
  #4   __slab_free+0x1fe
  #5   release_task+0x284
  #6   do_wait+0xa25
  #7   sys_select+0x44
  #8   system_call+0x7e, 477 bytes of data
    ....... Decoding Syscall Args .......
    sys_select (7,
        (fd_set *) 0x7fff249af960,
        (fd_set *) 0x7fff249af8e0,
        (fd_set *) 0x7fff249af860,
        (struct timeval *) 0x0)
   nfds=7
   readfds [3, 5, 6]
   writefds []
   exceptfds []
   No timeout
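The readfds/writefds lines above come from interpreting the fd_set bitmasks read from userspace memory; a minimal sketch of that decoding:

```python
def decode_fdset(bits, nfds):
    """Decode an fd_set bitmask (only the low `nfds` bits matter) into
    the list of file descriptors it contains."""
    return [fd for fd in range(nfds) if bits & (1 << fd)]

# fds 3, 5 and 6 set, matching the readfds line above
print(decode_fdset(0b1101000, 7))   # -> [3, 5, 6]
```

An empty mask naturally decodes to an empty list, which is how the `writefds []` line is produced.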

crash32> crashinfo --decode nanosleep
PID=1839  CMD=klogd
  #0   schedule+0x340
  #1   schedule_timeout+0x60
  #2   sys_nanosleep+0xce
  #3   system_call+0x68, 241 bytes of data
    ....... Decoding Syscall Args .......
    sys_nanosleep ((struct timespec *) 0xbfff8e60,
        (struct timespec *) 0xbfff8e60)
         req  1sec, 0nsec
         rem  1sec, 0nsec

PID=6405  CMD=java
  #0   schedule+0x340
  #1   schedule_timeout+0x60
  #2   sys_nanosleep+0xce
  #3   system_call+0x68, 241 bytes of data
    ....... Decoding Syscall Args .......
    sys_nanosleep ((struct timespec *) 0x4cb76b60,
        (struct timespec *) 0x0)
         req  0sec, 49983000nsec
         rem  NULL

Most sysctl settings are mapped to variables; we can print them using the '--sysctl' option:

crash32> crashinfo --sysctl
abi.defhandler_coff                           117440515
abi.defhandler_elf                            0
...
vm.pagecache                                  [2, 10, 20]
vm.pagetable_cache                            [25, 50]
vm.skip_mapped_pages                          0
vm.stack_defer_threshold                      2048

Please note that some sysctl settings are really programmatic: their value is computed at get/set time rather than stored in a variable. We do not print such values.
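The distinction can be sketched like this (the entry fields are illustrative, not the kernel's actual ctl_table layout):

```python
def printable_sysctls(ctl_entries, read_variable):
    """Return (name, value) pairs only for sysctl entries backed by a
    variable; entries whose value is computed by a handler at read time
    have no data pointer and are skipped."""
    result = []
    for e in ctl_entries:
        if e.get("data_addr") is None:   # purely programmatic entry
            continue
        result.append((e["name"], read_variable(e["data_addr"])))
    return result
```

This mirrors the rule stated above: if there is no variable holding the value, there is nothing meaningful to read out of a vmcore.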

