makedumpfile 1.6.0 segfault under Xen with -X
I see the following segfault reproducibly under Xen when the -X option is used with makedumpfile 1.6.0:
makedumpfile -E -X --message-level 31 -D /proc/vmcore /kdump/tmp/xx
sadump: does not have partition header
sadump: read dump device as unknown format
sadump: unknown format
[ 2035.968876] makedumpfile[587]: segfault at 0 ip 000000000042dc71 sp 00007fffd39ae3f0 error 4 in makedumpfile[400000+57000]
[ 2035.995883] Core dump to |/usr/lib/systemd/systemd-coredump 587 0 0 11 1470765533 makedumpfile pipe failed
LOAD (0)
phys_start : 0
phys_end : 98800
virt_start : ffff880000000000
virt_end : ffff880000098800
LOAD (1)
phys_start : 100000
phys_end : 10000000
virt_start : ffff880000100000
virt_end : ffff880010000000
LOAD (2)
phys_start : 20000000
phys_end : bf760000
virt_start : ffff880020000000
virt_end : ffff8800bf760000
LOAD (3)
phys_start : 100000000
phys_end : 140000000
virt_start : ffff880100000000
virt_end : ffff880140000000
Xen kdump
page_size : 4096
SYMBOL(dom_xen): ffff82d0802f4e48
SYMBOL(dom_io): ffff82d0802f4e40
SYMBOL(domain_list): ffff82d0802dd100
SYMBOL(xen_heap_start): 0
SYMBOL(frame_table): ffff82d080235cb0
SYMBOL(alloc_bitmap): 0
SYMBOL(max_page): ffff82d0802f4e30
SYMBOL(pgd_l2): 0
SYMBOL(pgd_l3): 0
SYMBOL(pgd_l4): ffff82d0802cb000
SYMBOL(xenheap_phys_end): 0
SYMBOL(xen_pstart): 0
SYMBOL(frametable_pg_dir): 0
SIZE(page_info): 32
OFFSET(page_info.count_info): 8
OFFSET(page_info._domain): 24
SIZE(domain): 2816
OFFSET(domain.domain_id): 0
OFFSET(domain.next_in_list): 112
xen_major_version: 4
xen_minor_version: 7
xen_phys_start: bf200000
frame_table_vaddr: ffff82e000000000
xen_heap_start: 0
xen_heap_end:0
alloc_bitmap: 0
max_page: 0
num_domain: 3
0: 13d8c6: ffff83013d8c6000
32754: 13d8fa: ffff83013d8fa000
32753: 13d8f9: ffff83013d8f9000
max_mapnr : 140000
max_mapnr : 140000
There is enough free memory to be done in one cycle.
Buffer size for the cyclic mode: 327680
mem_map (0)
mem_map : 0
pfn_start : 0
pfn_end : 0
mmap() is available on the kernel.
Checking for memory holes : [100.0 %] |STEP [Checking for memory holes ] : 0.000029 seconds
Segmentation fault
Without -X, the same command succeeds. Makedumpfile also succeeds with other compression or dump level settings.
The segfault happens at this point in the code:
int
set_bitmap_buffer(struct dump_bitmap *bitmap, mdf_pfn_t pfn, int val, struct cycle *cycle)
{
        int byte, bit;
        static int warning = 0;

        if (pfn < cycle->start_pfn || cycle->end_pfn <= pfn) {
                  ^^--- here
Dump of assembler code for function set_bitmap_buffer:
0x000000000042dc70 <+0>: push %rbx
0x000000000042dc71 <+1>: mov (%rcx),%rax <=== here
0x000000000042dc74 <+4>: mov %rcx,%rbx
0x000000000042dc77 <+7>: cmp %rsi,%rax
Analysis:
exclude_xen4_user_domain() calls clear_bit_on_2nd_bitmap(pfn, NULL) to exclude domU ranges. This resolves to
set_bitmap(info->bitmap2, pfn, 0, NULL)
-> set_bitmap_buffer(info->bitmap2, pfn, 0, NULL) (because bitmap2->fd == 0)
==> segfault, because set_bitmap_buffer() cannot handle NULL as the cycle pointer.
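If the non-cyclic approach is used (always the case under Xen, AFAICS), makedumpfile needs a bitmap fd to avoid this crash. But info->flag_cyclic can change after open_dump_bitmap() has been called, so whatever open_dump_bitmap() set up may no longer match the mode that is actually used.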
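To illustrate the failure mode only, here is a minimal, self-contained sketch of a buffer-based bitmap update that rejects a NULL cycle pointer instead of dereferencing it. The struct layouts are simplified stand-ins, and set_bitmap_buffer_checked() is a hypothetical name; this is not the code from the attached patches, which address the missing bitmap fd rather than adding a guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for makedumpfile's types (field sets are assumptions). */
typedef unsigned long long mdf_pfn_t;

struct cycle {
        mdf_pfn_t start_pfn;    /* first pfn covered by the buffer */
        mdf_pfn_t end_pfn;      /* one past the last pfn covered */
};

struct dump_bitmap {
        int fd;                 /* 0 means "no bitmap file", as in this report */
        unsigned char *buf;     /* in-memory bitmap for one cycle */
};

/*
 * Hypothetical guarded variant of set_bitmap_buffer().
 * Returns 1 if the bit was updated, 0 if the request was rejected.
 */
int
set_bitmap_buffer_checked(struct dump_bitmap *bitmap, mdf_pfn_t pfn, int val,
                          const struct cycle *cycle)
{
        assert(bitmap != NULL && bitmap->buf != NULL);

        /* The crash in this report: cycle is NULL on the non-cyclic path. */
        if (cycle == NULL) {
                fprintf(stderr, "no cycle given for pfn %llx\n", pfn);
                return 0;
        }

        /* Same range test as in the excerpt above. */
        if (pfn < cycle->start_pfn || cycle->end_pfn <= pfn)
                return 0;

        /* Position of the pfn relative to the start of the cycle buffer. */
        mdf_pfn_t rel = pfn - cycle->start_pfn;
        size_t byte = (size_t)(rel >> 3);
        unsigned int bit = (unsigned int)(rel & 7);

        if (val)
                bitmap->buf[byte] |= (unsigned char)(1u << bit);
        else
                bitmap->buf[byte] &= (unsigned char)~(1u << bit);

        return 1;
}
```

With such a guard, a call like set_bitmap_buffer_checked(bitmap2, pfn, 0, NULL) would fail cleanly rather than crash, but the domU pages would still not be excluded, so the underlying mismatch between flag_cyclic and the bitmap fd would still need fixing.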
Proposed Solution
See attached patches.
I tested the patched version of makedumpfile successfully both with bare-metal Linux and Xen.
Adding one more patch that sanitizes the behavior of close_dump_bitmap(). It is not strictly necessary to solve the problem, but the logic of {open,close}_dump_bitmap() should match.
Fixed in v1.6.1