#18 segfault restoring level 1 backup

Status: closed-fixed
Owner: Stelian Pop
Labels: None
Priority: 5
Updated: 2005-03-31
Created: 2005-03-17
Creator: Jon Willeke
Private: No

I have some backups from dump 0.4b33 of an ext3 file system on
IA32 Red Hat Linux 9 that I'm trying to restore with 0.4b37 to an
XFS file system on AMD64 SuSE Linux 9.2. I was able to restore
the level 0 backup, but restore crashes on the level 1 with a
segmentation fault:

Verify tape and initialize maps
Input is from a local file/pipe
Input block size is 32
Dump date: Sun Feb 27 20:00:01 2005
Dumped from: Mon Jul 14 12:02:40 2003
Level 1 dump of an unlisted file system on xxx:/dev/rootvg/home-snap
Label: none
Begin incremental restore
Initialize symbol table.
Segmentation fault

I'm guessing the next step is to build a debug version of restore and
rerun this in the debugger. In the meantime, I want to make sure
that what I'm trying to do should work: mixed architecture, mixed file
systems, mixed dump versions, etc.

Discussion

  • Stelian Pop
    2005-03-18

    • assigned_to: nobody --> stelian
     
  • Stelian Pop
    2005-03-18

    Logged In: YES
    user_id=5513

    Mixing architectures, file systems, dump versions should be ok.

    The first step is to try the latest version of restore (0.4b39).

    If it doesn't work, build it using './configure
    --enable-debug' and run it under gdb.

    Stelian.

     
  • Jon Willeke
    2005-03-18

    Logged In: YES
    user_id=185468

    Here's the GDB backtrace from 0.4b39:

    #0 0x0000002a95eb4256 in _int_free () from /lib64/tls/libc.so.6
    #1 0x0000002a95eb4737 in free () from /lib64/tls/libc.so.6
    #2 0x000000000040f1ff in resizemaps (oldmax=581633, newmax=516097) at utilities.c:515
    #3 0x000000000040a402 in initsymtable (filename=0x412fd8 "./restoresymtable") at symtab.c:715
    #4 0x0000000000407747 in main (argc=Variable "argc" is not available.
    ) at main.c:468

     
  • Stelian Pop
    2005-03-18

    Logged In: YES
    user_id=5513

    I cannot see why it would segfault on that line. It's a free
    for something which isn't null since it is tested a few
    lines before...

    Stelian.

     
  • Jon Willeke
    2005-03-21

    Logged In: YES
    user_id=185468

    I'm not familiar with the code, but something doesn't seem right about
    resizemaps():

    map = calloc((unsigned)1, (unsigned)howmany(newmax, NBBY));
    if (map == NULL)
        errx(1, "no memory for file dump list");
    memcpy(map, dumpmap, howmany(oldmax, NBBY));
    free(dumpmap);

    If oldmax is greater than newmax, as shown in the backtrace, does that
    mean that we're trying to put ten pounds into a five-pound sack?

     
  • Stelian Pop
    2005-03-21

    Logged In: YES
    user_id=5513

    You are of course correct, the bug is obvious, I wonder how
    I didn't see it before...

    What if you add an explicit test at the beginning of that
    function to just return immediately if newmax is less than
    oldmax? Does this fix all the problems?

    Stelian.

     
  • Jon Willeke
    2005-03-21

    Logged In: YES
    user_id=185468

    I added the following to the beginning of resizemaps():

    if (oldmax > newmax)
        return;

    The first time I try to restore the level 1 backup, restore aborts with the
    following message:

    /sbin/restore: cannot create directory temporary /tmp//rstdir1111366800: File exists

    When I check /tmp, there is no such directory, so I think restore deleted it.
    The second time I run restore, it aborts with the following message:

    /sbin/restore: cannot create modefile /tmp//rstmode1111366800: File exists

    Again, restore seems to delete it. The third time I run restore, it crashes
    with a segmentation fault. Here is the backtrace:

    #0 myname (ep=0x0) at symtab.c:272
    #1 0x000000000040f570 in badentry (ep=0x2a969e7e40, msg=Variable "msg" is not available.
    ) at utilities.c:383
    #2 0x000000000040f6ed in removeleaf (ep=0x2a969e7e40) at utilities.c:218
    #3 0x0000000000408a35 in findunreflinks () at restore.c:590
    #4 0x000000000040777f in main (argc=Variable "argc" is not available.
    ) at main.c:473

    Here's the output leading to the segfault:

    Find unreferenced names.
    Remove leaf ./willeke/pkgs/sudo.spec
    Remove leaf ./willeke/pkgs/kernel-2.4.20-18.9jw1.src.rpm
    bad entry: removeleaf: not a leaf
    name: ./willeke/pkgs/cache
    parent name ./willeke/pkgs

     
  • Stelian Pop
    2005-03-22

    Logged In: YES
    user_id=5513

    The two 'cannot create' aborts are known bugs. They are
    caused by the fact that the previous restore stopped
    abnormally and left those files in place. This shouldn't
    happen in normal situations, only when restore previously
    segfaulted.

    As for your problem, I think the new segfault is caused by
    the same 'maxino' related code. Try to replace, in
    symtab.c, the call to resizemaps(maxino, hdr.maxino) with:

    if (hdr.maxino > maxino) {
        resizemaps(maxino, hdr.maxino);
        maxino = hdr.maxino;
    }

    Stelian.

     
  • Jon Willeke
    2005-03-22

    Logged In: YES
    user_id=185468

    Actually, the "cannot create" aborts occur immediately after a clean level 0
    restore. At least, I think it was clean; there was no segfault.

    The new segfault appears to be a bug in utilities.c, badentry():

    for (i = 0; i < DIRHASH_SIZE; i++) {
        if (ep->e_entries[i] != NULL) {
            fprintf(stderr, "next entry name: %s\n", myname(ep->e_entries[0]));
            break;
        }
    }

    The loop finds a non-NULL element in ep->e_entries, then passes
    ep->e_entries[0], which it has just determined to be NULL, to
    myname().

    Also, createleaves(), in restore.c, can call myname() with NULL if the user
    doesn't abort in the panic() call. It may be worth guarding against a NULL
    argument in myname().

    After making the above changes (attached), I get to the panic() in
    badentry():

    Find unreferenced names.
    Remove leaf ./willeke/pkgs/sudo.spec
    Remove leaf ./willeke/pkgs/kernel-2.4.20-18.9jw1.src.rpm
    bad entry: removeleaf: not a leaf
    name: ./willeke/pkgs/cache
    parent name ./willeke/pkgs
    next entry name: ./willeke/pkgs/cache/ODBCuserguide.txt
    entry type: NODE
    inode number: 144020
    flags: NIL
    abort? [yn]

    If I choose not to abort through several bad entries, I can complete the
    restore. However, I'm concerned about its correctness. I don't understand
    the meaning of all these bad entries.

     
  • Jon Willeke
    2005-03-22

    avoid some segfaults in dump 0.4b39

     
  • Stelian Pop
    2005-03-23

    Logged In: YES
    user_id=5513

    For the "cannot create" problem: it may occur after a clean
    level 0 restore, but it is the previous *level 1* restore
    which segfaulted and left those files in place.

    The two other bugs you found are bugs indeed and need to be
    fixed. However, you shouldn't have hit those sections of
    code in the first place.

    What is strange is that 'maxino' is different in the level 0
    dump and in the level 1 dump, and the only reason I see
    which could have caused this is because you resized the
    filesystem between the level 0 dump and the level 1 dump.
    And it must be this resize which makes restore fail somewhere.

    The 'maxino' variable is used for a number of structures inside
    restore, and it is probably one of those structures (or an
    algorithm using them) which doesn't take into account the
    change. I am still searching for what could be wrong...

    I did try to reproduce the bug on my side, and I was able to
    reproduce the original segfault but once I add the
    'hdr.maxino > maxino' test it completes ok.

    Stelian.

     
  • Jon Willeke
    2005-03-23

    Logged In: YES
    user_id=185468

    Yes, I forgot that I had resized the file system. It is ext3 on a logical
    volume. It was resized once with e2fsadm (which does a umount,
    lvextend, resize2fs, and mount).

    I made a new level 0 and level 1 dump, but the level 1 restore keeps
    panicking with bad TMPNAME entries, like RSTTMP0544730. When I
    piped 'yes n' to restore, it ran continuously, at one point corrupting the
    terminal font.

    If you're satisfied that the segfault is resolved, I can enter this as a new
    support request.

     
  • Stelian Pop
    2005-03-23

    Logged In: YES
    user_id=5513

    The TMPNAME issue should be corrected by this patch:
    http://cvs.sourceforge.net/viewcvs.py/dump/dump/restore/restore.c?r1=1.35&r2=1.36

    I hope the above will solve the new problem.

    However, I am not satisfied that the old problem is understood, and I
    don't know how to debug this. Is there any chance you could
    give me access to the old level-0 and level-1 dumps (in case
    they are not enormous and they do not contain sensitive data)?

     
  • Jon Willeke
    2005-03-23

    Logged In: YES
    user_id=185468

    Patching restore.c to 1.36 fixed the RSTTMP problem, so I feel much
    better, now.

    The old level 0 dump is 1.7GB (500MB compressed), and the old level 1
    dump is 1.3GB (600MB compressed). The data isn't overly sensitive.

     
  • Stelian Pop
    2005-03-31

    • status: open --> closed-fixed
     
  • Stelian Pop
    2005-03-31

    Logged In: YES
    user_id=5513

    While examining your dumps I found several inconsistencies
    in the dumps themselves, so it's normal that restore has
    trouble restoring the data.

    I am unable to say if this has been caused by a bug in (some
    old version of) dump, or if this is somehow a consequence of
    resizing the filesystem.

    I will therefore close this support request, but I'll
    encourage you to do restore tests on your backups once in a
    while and report back if the problem occurs again.

    Stelian.