makedumpfile 1.6.0 segfault under Xen with -X
make a small dumpfile of kdump
Brought to you by:
k-hagio
I can reproduce the following segfault under Xen every time the -X option is used with makedumpfile 1.6.0:
    makedumpfile -E -X --message-level 31 -D /proc/vmcore /kdump/tmp/xx
    sadump: does not have partition [ 2035.968876] makedumpfile[587]: segfault at 0 ip 000000000042dc71 sp 00007fffd39ae3f0 error 4 in makedumpfile[400000+57000]
     header
    sadump: [ 2035.995883] Core dump to |/usr/lib/systemd/systemd-coredump 587 0 0 11 1470765533 makedumpfile pipe failed
    read dump device as unknown format
    sadump: unknown format
    LOAD (0)
      phys_start : 0
      phys_end : 98800
      virt_start : ffff880000000000
      virt_end : ffff880000098800
    LOAD (1)
      phys_start : 100000
      phys_end : 10000000
      virt_start : ffff880000100000
      virt_end : ffff880010000000
    LOAD (2)
      phys_start : 20000000
      phys_end : bf760000
      virt_start : ffff880020000000
      virt_end : ffff8800bf760000
    LOAD (3)
      phys_start : 100000000
      phys_end : 140000000
      virt_start : ffff880100000000
      virt_end : ffff880140000000
    Xen kdump
    page_size : 4096
    SYMBOL(dom_xen): ffff82d0802f4e48
    SYMBOL(dom_io): ffff82d0802f4e40
    SYMBOL(domain_list): ffff82d0802dd100
    SYMBOL(xen_heap_start): 0
    SYMBOL(frame_table): ffff82d080235cb0
    SYMBOL(alloc_bitmap): 0
    SYMBOL(max_page): ffff82d0802f4e30
    SYMBOL(pgd_l2): 0
    SYMBOL(pgd_l3): 0
    SYMBOL(pgd_l4): ffff82d0802cb000
    SYMBOL(xenheap_phys_end): 0
    SYMBOL(xen_pstart): 0
    SYMBOL(frametable_pg_dir): 0
    SIZE(page_info): 32
    OFFSET(page_info.count_info): 8
    OFFSET(page_info._domain): 24
    SIZE(domain): 2816
    OFFSET(domain.domain_id): 0
    OFFSET(domain.next_in_list): 112
    xen_major_version: 4
    xen_minor_version: 7
    xen_phys_start: bf200000
    frame_table_vaddr: ffff82e000000000
    xen_heap_start: 0
    xen_heap_end:0
    alloc_bitmap: 0
    max_page: 0
    num_domain: 3
    0: 13d8c6: ffff83013d8c6000
    32754: 13d8fa: ffff83013d8fa000
    32753: 13d8f9: ffff83013d8f9000
    max_mapnr : 140000
    max_mapnr : 140000
    There is enough free memory to be done in one cycle.
    Buffer size for the cyclic mode: 327680
    mem_map (0)
      mem_map : 0
      pfn_start : 0
      pfn_end : 0
    mmap() is available on the kernel.
    Checking for memory holes : [100.0 %] |
    STEP [Checking for memory holes ] : 0.000029 seconds
    Segmentation fault

Without -X, the same command succeeds. makedumpfile also succeeds with other compression options or dump levels.

The segfault happens at line 4114 of the 1.6.0 source, in set_bitmap_buffer():

    int
    set_bitmap_buffer(struct dump_bitmap *bitmap, mdf_pfn_t pfn, int val, struct cycle *cycle)
    {
            int byte, bit;
            static int warning = 0;

            if (pfn < cycle->start_pfn || cycle->end_pfn <= pfn) {
            ^^--- here

    Dump of assembler code for function set_bitmap_buffer:
       0x000000000042dc70 <+0>:     push   %rbx
       0x000000000042dc71 <+1>:     mov    (%rcx),%rax     <=== here
       0x000000000042dc74 <+4>:     mov    %rcx,%rbx
       0x000000000042dc77 <+7>:     cmp    %rsi,%rax

The faulting ip (000000000042dc71) is the mov (%rcx),%rax instruction. Under the x86-64 System V calling convention %rcx carries the fourth argument, cycle, so this is the load of cycle->start_pfn; with cycle == NULL it faults at address 0, which matches "segfault at 0" in the kernel message.
Analysis:
exclude_xen4_user_domain() calls clear_bit_on_2nd_bitmap(pfn, NULL) to exclude domU ranges. Because bitmap2->fd == 0, this resolves to

    set_bitmap(info->bitmap2, pfn, 0, NULL) -> set_bitmap_buffer(info->bitmap2, pfn, 0, NULL)

which segfaults: set_bitmap_buffer() cannot handle NULL as the cycle pointer.
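The dispatch described above can be modelled in a few lines. This is a simplified stand-in, not makedumpfile's code: the structs are cut down to the fields the analysis mentions, and set_bitmap_model(), set_bitmap_buffer_checked() and set_bitmap_file_stub() are hypothetical names. The NULL check in set_bitmap_buffer_checked() is a defensive guard for illustration only, not the fix that went upstream. The point is that a buffer-backed bitmap (fd == 0) needs a cycle, while the file-backed path never looks at it.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned long long mdf_pfn_t;

/* Cut-down models (assumption: only the fields this illustration needs). */
struct cycle {
	mdf_pfn_t start_pfn;  /* first member, i.e. what mov (%rcx),%rax loads */
	mdf_pfn_t end_pfn;
};

struct dump_bitmap {
	int fd;               /* 0 means no backing file: use buf */
	unsigned char *buf;   /* covers [cycle->start_pfn, cycle->end_pfn) */
};

/* Hypothetical guarded buffer setter: refuses a NULL cycle instead of
 * dereferencing it. Returns 1 on success, 0 on error. */
static int set_bitmap_buffer_checked(struct dump_bitmap *bitmap, mdf_pfn_t pfn,
				     int val, const struct cycle *cycle)
{
	if (cycle == NULL) {
		fprintf(stderr, "set_bitmap_buffer: no cycle for pfn %llu\n", pfn);
		return 0;
	}
	if (pfn < cycle->start_pfn || cycle->end_pfn <= pfn)
		return 0;  /* pfn outside the buffer's range */

	mdf_pfn_t off = pfn - cycle->start_pfn;
	unsigned char mask = (unsigned char)(1u << (off % 8));
	if (val)
		bitmap->buf[off / 8] |= mask;
	else
		bitmap->buf[off / 8] &= (unsigned char)~mask;
	return 1;
}

/* Stand-in for the file-backed path; the real code writes via bitmap->fd
 * and does not need a cycle. */
static int set_bitmap_file_stub(struct dump_bitmap *bitmap, mdf_pfn_t pfn, int val)
{
	(void)bitmap;
	(void)pfn;
	(void)val;
	return 1;
}

/* Same selection rule as in the analysis: a zero fd selects the buffer. */
static int set_bitmap_model(struct dump_bitmap *bitmap, mdf_pfn_t pfn, int val,
			    const struct cycle *cycle)
{
	if (bitmap->fd)
		return set_bitmap_file_stub(bitmap, pfn, val);
	return set_bitmap_buffer_checked(bitmap, pfn, val, cycle);
}
```

With fd == 0 and cycle == NULL, the model reports an error where 1.6.0 dereferences the pointer; with a nonzero fd the same NULL cycle is harmless.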
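If the non-cyclic approach is used (always the case under Xen, AFAICS), makedumpfile needs a bitmap fd to avoid this crash. However, info->flag_cyclic can change after open_dump_bitmap() has been called, so the decision about whether to open a bitmap file may be made with a stale value.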
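The invariant implied here (non-cyclic mode must end up with a file-backed bitmap, whenever the mode is finally decided) can be sketched as a check run after the last point where flag_cyclic may change. This is a minimal sketch under assumptions: ensure_bitmap_file() and struct dump_bitmap_model are hypothetical names, the temporary-file scheme (mkstemp() plus unlink()) is one plausible choice, and none of it is taken from the attached patches.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct dump_bitmap_model {
	int fd;               /* 0: buffer only, > 0: backing file is open */
	char file_name[64];
};

/* Hypothetical helper: call once the final value of flag_cyclic is known.
 * In non-cyclic mode the bitmap must be file-backed, because callers such
 * as the domU exclusion pass NULL for the cycle. Returns 1 on success. */
static int ensure_bitmap_file(struct dump_bitmap_model *bitmap, int flag_cyclic,
			      const char *dir)
{
	if (flag_cyclic || bitmap->fd > 0)
		return 1;  /* cyclic mode uses the buffer, or a file is already open */

	int n = snprintf(bitmap->file_name, sizeof(bitmap->file_name),
			 "%s/kdump_bitmapXXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(bitmap->file_name))
		return 0;  /* directory path too long for the template */

	int fd = mkstemp(bitmap->file_name);
	if (fd < 0) {
		perror("mkstemp");
		return 0;
	}
	unlink(bitmap->file_name);  /* the file vanishes once fd is closed */
	bitmap->fd = fd;
	return 1;
}
```

Running such a check after option processing, instead of relying only on the flag value seen inside open_dump_bitmap(), keeps the two decisions from drifting apart.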
Proposed Solution
See attached patches.
I tested the patched version of makedumpfile successfully on both bare-metal Linux and Xen.
Adding one more patch that sanitizes the behavior of close_dump_bitmap(). It is not strictly necessary to solve the problem, but the logic of {open,close}_dump_bitmap should match.
Fixed in v1.6.1
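The idea that {open,close}_dump_bitmap should mirror each other can be sketched as a pair that releases exactly what was acquired. This is an illustrative model only: open_dump_bitmap_model(), close_dump_bitmap_model() and the struct are hypothetical, the "buffer always, file only on request" split is an assumption, and this is not the patch attached to the ticket.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

struct dump_bitmap_model {
	int fd;               /* > 0 only when a backing file was opened */
	unsigned char *buf;   /* allocated in both modes */
};

/* Hypothetical open: the buffer is always allocated; a backing file is
 * created only when use_file is set. Returns 1 on success, 0 on error. */
static int open_dump_bitmap_model(struct dump_bitmap_model *bitmap,
				  size_t buf_size, int use_file)
{
	bitmap->fd = 0;
	bitmap->buf = calloc(1, buf_size);
	if (bitmap->buf == NULL)
		return 0;
	if (use_file) {
		char name[] = "/tmp/kdump_bitmapXXXXXX";
		int fd = mkstemp(name);
		if (fd < 0) {
			free(bitmap->buf);
			bitmap->buf = NULL;
			return 0;
		}
		unlink(name);  /* anonymous temp file: removed when fd is closed */
		bitmap->fd = fd;
	}
	return 1;
}

/* Hypothetical close mirroring the open above: it releases only what was
 * acquired and resets the fields, so calling it twice is harmless. */
static void close_dump_bitmap_model(struct dump_bitmap_model *bitmap)
{
	if (bitmap->fd > 0)
		close(bitmap->fd);
	bitmap->fd = 0;
	free(bitmap->buf);  /* free(NULL) is a no-op */
	bitmap->buf = NULL;
}
```

Keying close on the same state that open recorded (fd > 0, buf != NULL) rather than on flags that may change in between is what keeps the two in step.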