
#16 Support for the musl C library

None
closed-fixed
None
5
2025-06-27
2025-02-05
No

dump currently does not build on systems (Alpine, Gentoo, Void Linux, etc.) where the libc is musl rather than glibc. Some of the issues are easy to fix, for example by replacing the __BEGIN_DECLS and __END_DECLS macros from sys/cdefs.h per https://wiki.musl-libc.org/faq.html#Q:-When-compiling-something-against-musl,-I-get-error-messages-about-%3Ccode%3Esys/cdefs.h%3C/code%3E
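
A minimal sketch of the kind of replacement the musl FAQ describes, as I understand it: spelling out the `extern "C"` guards by hand instead of relying on `<sys/cdefs.h>`. The function `example_blocks_for` is hypothetical and only stands in for a real prototype from the dump headers.

```c
#include <assert.h>

/* Portable stand-in for __BEGIN_DECLS from <sys/cdefs.h>. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration; a real header would list dump's prototypes here. */
long example_blocks_for(long bytes, long block_size);

/* Portable stand-in for __END_DECLS. */
#ifdef __cplusplus
}
#endif

/* Number of blocks needed to hold `bytes`, rounded up. */
long example_blocks_for(long bytes, long block_size)
{
    assert(bytes >= 0 && block_size > 0);
    return (bytes + block_size - 1) / block_size;
}
```

The guards expand to nothing for a C compiler, so the header no longer needs `sys/cdefs.h` at all.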

But others are less easy. fstab.h is missing, is it important? The daddr_t type is also nowhere to be found and I don't know what to do about that.

I'm not even sure if this is within the scope of the dump tool, but we have a bug report about it on Gentoo, so now you have one too :)

Discussion

  • Tim Woodall

    Tim Woodall - 2025-03-02
    • assigned_to: Tim Woodall
    • Group: -->
     
  • Tim Woodall

    Tim Woodall - 2025-03-02

    Thanks for reporting this. I'm not sure exactly how far I'll be able to get with this, but I see that Debian does package musl, so at some point I'll try to get this compiling with that and see how far that gets us. I don't have a huge amount of time to work on dump, so I almost certainly won't be setting up a new distribution to test with in the foreseeable future, but perhaps just getting it building on Debian will be enough.

     
  • Tim Woodall

    Tim Woodall - 2025-03-09

    I had a brief look into this and I'm not sure if I'm going to be able to do much for this in the foreseeable future. While installing musl-tools got me as far as running configure, it looks like at least on debian, I'm stuck on things as fundamental as ext2fs/ext2_fs.h.

    But the fstab.h includes are irrelevant; I will remove them in the next release. The same goes for the cdefs.h includes and the DECLS macros, which can be deleted.

    daddr_t is defined in sys/types.h and ultimately resolves to:
    __STD_TYPE __DADDR_T_TYPE __daddr_t; /* The type of a disk address. */
    But as it's only used in sizeof, and the underlying code uses __u32 in new_bsd_inode, I'm not completely sure why this type is used at all now. It was used in many more places back in 2001. I will change these as follows in the next release (unless I find a reason why they cannot be changed):

    diff --git a/dump/traverse.c b/dump/traverse.c
    index e8da13c9..f63c1f3c 100644
    --- a/dump/traverse.c
    +++ b/dump/traverse.c
    @@ -780,7 +780,7 @@ dumpino(struct ext2_inode_large *dp, dump_ino_t ino, int metaonly)
            nbi.di_atime.tv_sec = dp->i_atime;
            nbi.di_mtime.tv_sec = dp->i_mtime;
            nbi.di_ctime.tv_sec = dp->i_ctime;
    -       memmove(&nbi.di_db, &dp->i_block, (NDADDR + NIADDR) * sizeof(daddr_t));
    +       memmove(&nbi.di_db, &dp->i_block, (NDADDR + NIADDR) * sizeof(nbi.di_db[0]));
            nbi.di_flags = dp->i_flags;
            nbi.di_blocks = dp->i_blocks;
            nbi.di_gen = dp->i_generation;
    @@ -818,7 +818,7 @@ dumpino(struct ext2_inode_large *dp, dump_ino_t ino, int metaonly)
                     * Check for short symbolic link.
                     */
                    if (i_size > 0 &&
    -                   i_size < EXT2_N_BLOCKS * sizeof (daddr_t)) {
    +                   i_size < EXT2_N_BLOCKS * sizeof (dp->i_block[0])) {
                            spcl.c_addr[0] = 1;
                            spcl.c_count = 1;
                            writeheader(ino);
    @@ -1026,7 +1026,7 @@ dumpdirino(struct ext2_inode_large *dp, dump_ino_t ino)
            nbi.di_atime.tv_sec = dp->i_atime;
            nbi.di_mtime.tv_sec = dp->i_mtime;
            nbi.di_ctime.tv_sec = dp->i_ctime;
    -       memmove(&nbi.di_db, dp->i_block, (NDADDR + NIADDR) * sizeof(daddr_t));
    +       memmove(&nbi.di_db, dp->i_block, (NDADDR + NIADDR) * sizeof(nbi.di_db[0]));
            nbi.di_flags = dp->i_flags;
            nbi.di_blocks = dp->i_blocks;
            nbi.di_gen = dp->i_generation;
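
The idea behind the patch above, sketched in isolation: take the element size from the array being copied rather than from a separate typedef such as `daddr_t`, so the code does not depend on a type that musl lacks. The struct names and `EXAMPLE_NADDR` are hypothetical; the value 15 merely stands in for `NDADDR + NIADDR`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_NADDR 15 /* hypothetical stand-in for NDADDR + NIADDR */

struct example_src { uint32_t i_block[EXAMPLE_NADDR]; };
struct example_dst { uint32_t di_db[EXAMPLE_NADDR]; };

/* Copy the block pointers; the byte count follows the destination's
 * element type, so it stays correct if that type ever changes. */
void example_copy_blocks(struct example_dst *dst, const struct example_src *src)
{
    assert(dst != NULL && src != NULL);
    memmove(dst->di_db, src->i_block,
            EXAMPLE_NADDR * sizeof(dst->di_db[0]));
}
```

With `sizeof(daddr_t)` the copy length would silently depend on how the libc defines that type; with `sizeof(dst->di_db[0])` it depends only on the structure actually being written.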
    
     
  • Michael Orlitzky

    No problem, every little bit helps, thanks. What's the issue with the ext2fs headers? Don't they come from either e2fsprogs or the kernel headers?

     
  • Michael Orlitzky

    With a little more hacking I was able to get make to complete. Pending more serious testing,

    • In dump/unctime.c, the timelocal function can be replaced by mktime. According to the man-pages project, the latter is a POSIX replacement for the former.
    • restore/dirs.c and restore/tape.c need #include <fcntl.h> for open() and friends.
    • restore/restore.h needs #include <limits.h> for PATH_MAX.
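
The three points above can be sketched together. This is only an illustration under stated assumptions: both helper functions are hypothetical and are not taken from dump or restore; they just show `mktime` used where `timelocal` would have been, and the headers that declare `open()` and `PATH_MAX`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>   /* open(), O_RDONLY */
#include <limits.h>  /* PATH_MAX */
#include <string.h>
#include <time.h>    /* mktime() */
#include <unistd.h>  /* close(), for callers of example_open_readonly() */

/* Hypothetical helper: convert a local calendar time to time_t with
 * mktime(), the POSIX counterpart of the non-standard timelocal(). */
time_t example_local_to_epoch(int year, int month, int day,
                              int hour, int min, int sec)
{
    struct tm tm;
    memset(&tm, 0, sizeof(tm));
    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_hour = hour;
    tm.tm_min = min;
    tm.tm_sec = sec;
    tm.tm_isdst = -1; /* let the C library decide whether DST applies */
    return mktime(&tm);
}

/* Hypothetical helper: refuse paths that cannot fit in a PATH_MAX buffer,
 * then open read-only. Returns a file descriptor or -1. */
int example_open_readonly(const char *path)
{
    assert(path != NULL);
    if (strlen(path) >= PATH_MAX)
        return -1;
    return open(path, O_RDONLY);
}
```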

    I also had to comment out the rcmd call in common/dumprmt.c, which probably is not the solution you want, but maybe the remote dump feature could be compiled out or something on systems with no rcmd.

     
  • Tim Woodall

    Tim Woodall - 2025-03-12

    That is good news. I'll have a look at those changes and see what I can do. I thought there was a way to compile out the rcmd stuff but obviously not. I'm seriously tempted to add that so that it can be only rsh based. I'm not sure if anybody still uses rcmd for anything.

    FWIW there's a big stack of tests in testing/scripts - but only from git, not from the release tarball. Unfortunately, they require root to run and do many "aggressive" things like mke2fs, rm -fr so please be cautious. They work for me but I cannot guarantee that there's not some quirk in there that will be catastrophic for someone else. The entire suite can be run from ./run-all-tests.sh -h

    (One of my goals for the v0.4b50 release was to allow them to run as non-root using unshare and fuse but I've hit a couple of "hard" issues to solve that are going to take a bit of perseverance. I've collected a fair few minor fixes and cleanups locally so I'll probably create 0.4b50 soon without non-root tests)

     
  • Tim Woodall

    Tim Woodall - 2025-03-12
    diff --git a/common/dumprmt.c b/common/dumprmt.c
    index a48f88dd..d9437b81 100644
    --- a/common/dumprmt.c
    +++ b/common/dumprmt.c
    @@ -152,10 +152,12 @@ rmtgetconn(void)
            static struct passwd *pwd = NULL;
            const char *tuser;
            const char *rsh;
    +#if DUMP_USES_RCMD
            int size;
            int throughput;
            int on;
            char *rmtpeercopy;
    +#endif
    
            rsh = getenv("RSH");
    
    @@ -220,6 +222,7 @@ rmtgetconn(void)
                    }
            }
            else {
    +#if DUMP_USES_RCMD
                    /* Copy rmtpeer to rmtpeercopy to ignore the
                       return value from rcmd. I cannot figure if
                       this is this a bug in rcmd or in my code... */
    @@ -253,6 +256,10 @@ rmtgetconn(void)
                    if (setsockopt(tormtape, IPPROTO_TCP, TCP_NODELAY, &on, sizeof (on)) < 0)
                            perror("TCP_NODELAY setsockopt");
                    fromrmtape = tormtape;
    +#else
    +               msg("This version of dump does not support rcmd. Set RSH to access a remote tape.");
    +               return 0;
    +#endif
            }
            (void)fprintf(stderr, "Connection to %s established.\n", rmtpeer);
            return 1;
    diff --git a/configure.ac b/configure.ac
    index b07a0d85..3dec7349 100644
    --- a/configure.ac
    +++ b/configure.ac
    @@ -188,6 +188,22 @@ fi
     echo "Not including Mac OSX restore compatibility code by default"
     )
    
    +dnl
    +dnl Handle --enable-rcmd
    +dnl
    +AC_ARG_ENABLE([rcmd], [AS_HELP_STRING([--enable-rcmd],[include rcmd support @<:@default=yes@:>@])],
    +if test "$enableval" = "no"
    +then
    +       echo "Not including support for rcmd remote tapes"
    +else
    +       AC_DEFINE([DUMP_USES_RCMD],1,[Define this if you want to support rcmd as well as rsh for accessing remote tapes.])
    +       echo "Including support for rcmd remote tapes"
    +fi
    +,
    +AC_DEFINE([DUMP_USES_RCMD],1,[Define this if you want to support rcmd as well as rsh for accessing remote tapes.])
    +echo "Including support for rcmd remote tapes by default"
    +)
    +
     dnl
     dnl Handle --enable-selinux
     dnl
    

    ./configure --enable-rcmd=no appears to DTRT.
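
Reduced to its essentials, the pattern in the patch above looks roughly like this. The enum and function are hypothetical and much simpler than `rmtgetconn`; the only name taken from the patch is the `DUMP_USES_RCMD` macro. The sketch assumes that a non-empty `RSH` value selects the rsh path.

```c
#include <assert.h>
#include <stddef.h>

enum example_transport { EXAMPLE_NONE = 0, EXAMPLE_RSH, EXAMPLE_RCMD };

/* Callers in this sketch treat 0 as "no transport available". */
static_assert(EXAMPLE_NONE == 0, "EXAMPLE_NONE must be the failure value");

/* Pick a remote transport. `rsh` plays the role of getenv("RSH"). */
enum example_transport example_pick_transport(const char *rsh)
{
    if (rsh != NULL && rsh[0] != '\0')
        return EXAMPLE_RSH;
#if DUMP_USES_RCMD
    /* Only compiled in when configure defined DUMP_USES_RCMD. */
    return EXAMPLE_RCMD;
#else
    /* rcmd() is never referenced, so a libc without it still links. */
    return EXAMPLE_NONE;
#endif
}
```

Because an undefined macro evaluates to 0 inside `#if`, building without the define drops the rcmd branch entirely.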

    It should be #include <linux/limits.h> for PATH_MAX. Does that work for you?

     
  • Tim Woodall

    Tim Woodall - 2025-03-19

    These changes have been made in 0.4b50.

     
  • Michael Orlitzky

    Thanks a lot, it builds now with --disable-rcmd. I've been meaning to run the test suite, but I'm not brave enough to do it on my real workstation. I should be able to use an Alpine linux (they use musl) VM but my home machine is a bit weird and I haven't been able to get kvm/qemu working. The next time I'm at my office I have a nice normal amd64 machine I can use though.

     
  • Tim Woodall

    Tim Woodall - 2025-03-20

    A quick test you can do, assuming you have an ext[234] partition mounted somewhere (ro to be extra safe), is something like the following, which dumps a filesystem to stdout and pipes it into restore to verify it against the mounted filesystem.
    (You still need to be root, but it doesn't involve all the complex logic to create filesystems and clean up afterwards.)

    umount /mnt/nobackup/backup/
    mount -o ro /mnt/nobackup/backup/

    dump -0 -f - /dev/mapper/vg--dirac-backup | restore -C -D /mnt/nobackup/backup/ -f -
    DUMP: Date of this level 0 dump: Thu Mar 20 18:39:56 2025
    DUMP: Dumping /dev/mapper/vg--dirac-backup (/mnt/nobackup/backup) to standard output
    DUMP: Label: none
    DUMP: Writing 10 Kilobyte records
    DUMP: mapping (Pass I) [regular files]
    DUMP: mapping (Pass II) [directories]
    DUMP: estimated 1372620 blocks.
    DUMP: Volume 1 started with block 1 at: Thu Mar 20 18:39:57 2025
    Dump date: Thu Mar 20 18:39:56 2025
    Dumped from: the epoch
    Level 0 dump of /mnt/nobackup/backup on dirac.home.woodall.me.uk:/dev/mapper/vg--dirac-backup
    Label: none
    DUMP: dumping (Pass III) [directories]
    filesys = /mnt/nobackup/backup/
    DUMP: dumping (Pass IV) [regular files]
    DUMP: Volume 1 completed at: Thu Mar 20 18:40:04 2025
    DUMP: Volume 1 1369690 blocks (1337.59MB)
    DUMP: Volume 1 took 0:00:07
    DUMP: Volume 1 transfer rate: 195670 kB/s
    DUMP: 1369690 blocks (1337.59MB)
    DUMP: finished in 7 seconds, throughput 195670 kBytes/sec
    DUMP: Date of this level 0 dump: Thu Mar 20 18:39:56 2025
    DUMP: Date this dump completed: Thu Mar 20 18:40:04 2025
    DUMP: Average transfer rate: 195670 kB/s
    DUMP: DUMP IS DONE

     
  • Tim Woodall

    Tim Woodall - 2025-03-20

    And add -v to the restore command if you want lots of logging from restore (otherwise it only logs when there is a problem).

     
  • Michael Orlitzky

    dump seems to work OK...

    $ sudo ./dump/dump -0 -f debian.img /dev/mmcblk1p2
      DUMP: WARNING: no file `/usr/local/etc/dumpdates'
      DUMP: Date of this level 0 dump: Fri Mar 21 12:47:12 2025
      DUMP: Dumping /dev/mmcblk1p2 (/mnt/debian) to debian.img
      DUMP: Label: none
      DUMP: Writing 10 Kilobyte records
      DUMP: mapping (Pass I) [regular files]
      DUMP: mapping (Pass II) [directories]
      DUMP: estimated 4963130 blocks.
      DUMP: Volume 1 started with block 1 at: Fri Mar 21 12:47:15 2025
      DUMP: dumping (Pass III) [directories]
      DUMP: dumping (Pass IV) [regular files]
      DUMP: Closing debian.img
      DUMP: Volume 1 completed at: Fri Mar 21 12:50:05 2025
      DUMP: Volume 1 4946060 blocks (4830.14MB)
      DUMP: Volume 1 took 0:02:50
      DUMP: Volume 1 transfer rate: 29094 kB/s
      DUMP: 4946060 blocks (4830.14MB) on 1 volume(s)
      DUMP: finished in 170 seconds, throughput 29094 kBytes/sec
      DUMP: Date of this level 0 dump: Fri Mar 21 12:47:12 2025
      DUMP: Date this dump completed:  Fri Mar 21 12:50:05 2025
      DUMP: Average transfer rate: 29094 kB/s
      DUMP: DUMP IS DONE
    

    but restore hit some problem:

    $ ./restore/restore -v -C -D /mnt/debian -f debian.img 
    Password: 
    Begin compare restore
    Verify tape and initialize maps
    Input is from a local file/pipe
    Tape block size is 10
    Dump   date: Fri Mar 21 12:39:15 2025
    Dumped from: the epoch
    Level 0 dump of /mnt/debian on mertle:/dev/mmcblk1p2
    Label: none
    Incorrect block for <file removal list> at 1845 blocks
    Incorrect block for <file removal list> at 1846 blocks
    Incorrect block for <file removal list> at 1847 blocks
    ...
    Incorrect block for <file removal list> at 3068 blocks
    ./restore/restore: Cannot find file dump list
    

    Keep in mind though that in addition to using musl, this is also a RISC-V system. It has an old forked kernel and brand new compilers. The risk of conflating issues is high.

     
  • Tim Woodall

    Tim Woodall - 2025-03-21

    Oh dear, that's going wrong right at the very start of reading the tape.
    You got to restore/tape.c line 388, so it found the start of the CLRI bitmap, which will have been full of zeros for a L0 dump, but it didn't then find the BITS bitmap following it (which will have had a bit set for each inode that is going to be dumped).
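
For readers unfamiliar with these maps, here is a minimal sketch of a one-bit-per-inode bitmap. The helper names are hypothetical, and the convention that inode n lives at bit n-1 is an assumption made for the sketch rather than something stated in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mark inode `ino` in the map (assumed layout: inode n is bit n-1). */
void example_map_set(unsigned char *map, unsigned long ino)
{
    assert(map != NULL && ino >= 1);
    map[(ino - 1) / CHAR_BIT] |= (unsigned char)(1u << ((ino - 1) % CHAR_BIT));
}

/* Return 1 if inode `ino` is marked, 0 otherwise. */
int example_map_test(const unsigned char *map, unsigned long ino)
{
    assert(map != NULL && ino >= 1);
    return (map[(ino - 1) / CHAR_BIT] >> ((ino - 1) % CHAR_BIT)) & 1;
}

/* Count marked inodes in a map of `nbytes` bytes. */
size_t example_map_count(const unsigned char *map, size_t nbytes)
{
    size_t count = 0;
    for (size_t k = 0; k < nbytes; k++)
        for (unsigned bit = 0; bit < CHAR_BIT; bit++)
            count += (map[k] >> bit) & 1u;
    return count;
}
```

In these terms, the CLRI map of a level 0 dump would count zero marked inodes, while the BITS map would have one bit set per dumped inode.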

    I don't know if you've tried building it, but
    make check

    builds a few helper programs in faketape.

    (none of this needs to be done as root)

    First run ./faketape_test. It takes 30s or so to run on my system, but if it doesn't work there's no point continuing with the rest of this until that is sorted. It's a standalone program that doesn't depend on any of the code in dump/restore.

    ./faketape_test 
    ===============================================================================
    All tests passed (79946 assertions in 6 test cases)
    

    Then try converting your file image into a faketape image.
    First parameter is a NEW file that it will create, it will refuse to run if that file exists. Second parameter is your image. This will take a while even if it works. If it fails then that cannot read your tape either but it's going to be much easier to see what it's complaining about, the entire program is only around 200 lines.

    ./file-to-tape ../faketape.img ../x.img 
    file-to-tape: Starting ../x.img
    file-to-tape: Dump ../x.img written successfully
    file-to-tape: Finished
    

    Finally, if that works (and I suspect that it won't)

    ./faketape-st -f ../faketape.img rewind
    
    ./dump-info ../faketape.img |& less
    

    (note that you have to rewind again to run dump-info again. faketape emulates a non-auto rewinding tape)

    and you should get something like

    dump-info: Starting ./dump-info
    Tape has ntrec of 10
    TS_TAPE  at blockno 1 inode=0 mode=---------- UNKNOWN size=0 flags=
    TS_CLRI  at blockno 2 inode=128016 mode=---------- UNKNOWN size=16384 flags=
      16 data blocks with CLR-inode bitmap
    TS_BITS  at blockno 19 inode=2 mode=---------- UNKNOWN size=16384 flags=
      16 data blocks with HAS-inode bitmap
    TS_INODE at blockno 36 inode=2 mode=-rwxrwxrwt directory size=5632 flags=
      INODE 2 mode=-rwxrwxrwt directory size=5632 flags=
    

    at the start. You see that CLRI size=16384 which is why there are 16 blocks. Then there's the TS_BITS which is the same size and then beyond that the data stuff starts.

     
  • Tim Woodall

    Tim Woodall - 2025-03-21

    Ha!
    file-to-tape: Dump ../x.img written successfully
    that's a bug. It's not writing x.img at all, only reading it.

     
  • Tim Woodall

    Tim Woodall - 2025-03-21
    diff --git a/faketape/file-to-tape.cpp b/faketape/file-to-tape.cpp
    index 03237b91..07c88ac6 100644
    --- a/faketape/file-to-tape.cpp
    +++ b/faketape/file-to-tape.cpp
    @@ -172,7 +172,7 @@ int main(int argc, char* argv[]) {
                            }
                    }
                    close(dumpfd);
    -               warnx("Dump %s written successfully", argv[0]);
    +               warnx("Dump %s read successfully", argv[0]);
                    argv++;
            }
            tape.close();
    
     
  • Tim Woodall

    Tim Woodall - 2025-03-21

    Actually, now I remind myself about how this works, file-to-tape should work, it's dump-info that will hopefully give some clues as to what's going wrong.

     
  • Tim Woodall

    Tim Woodall - 2025-03-21

    I'm trying to guess what is most likely to be the cause. Struct padding would be my first thought, but u_spcl_size_assert is supposed to do a compile-time check on this.
    You might want to change, say, line 129 of compat/include/protocols/dumprestore.h to something like:
    case sizeof(us) == 1025 ? 1 : 0: and check that it fails to build, to make sure the musl compiler isn't doing something clever, such as not compiling an unused function, which would then hide a struct padding issue.

    In file included from dumprestore.c:44:
    ../../compat/include/protocols/dumprestore.h: In function 'u_spcl_size_assert':
    ../../compat/include/protocols/dumprestore.h:129:17: error: duplicate case value
      129 |                 case sizeof(us)  == 1025 ? 1 : 0:
          |                 ^~~~
    ../../compat/include/protocols/dumprestore.h:126:17: note: previously used here
      126 |                 case 0:
          |                 ^~~~
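
The duplicate-case trick behind that error message can be sketched on its own. Everything here is hypothetical (the union, its fields and the function name); only the technique mirrors what u_spcl_size_assert is described as doing. The C11 `static_assert` at the end is an alternative shown for comparison, not something the dump sources are known to use.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_RECORD_SIZE 1024

/* Hypothetical on-tape record padded to exactly one 1024-byte block. */
union example_record {
    char pad[EXAMPLE_RECORD_SIZE];
    struct {
        int32_t type;
        int32_t date;
        int32_t checksum;
    } hdr;
};

/* If the size is wrong, the second label evaluates to 0 and collides with
 * `case 0`, so the compiler reports "duplicate case value". */
static inline void example_record_size_assert(void)
{
    switch (0) {
    case 0:
        break;
    case sizeof(union example_record) == EXAMPLE_RECORD_SIZE:
        break;
    }
}

/* C11 equivalent with a readable diagnostic. */
static_assert(sizeof(union example_record) == EXAMPLE_RECORD_SIZE,
              "example_record must be exactly one tape block");
```

Changing `EXAMPLE_RECORD_SIZE` in the comparison to 1025 should break the build in the same way as the output above, which is the sanity check being proposed.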
    
     
    • Michael Orlitzky

      This does fail

       
  • Michael Orlitzky

    What version of the catch library are you using? I think the two that are available on Gentoo are too old / too new, but I could build one myself. Ignoring that for the moment...

    To get the rest to build, I had to add #include <sys/stat.h> to faketape/bswap_header.cpp and faketape/dump-info.cpp. Afterwards, file-to-tape does work. This is what dump-info has to say:

    dump-info: Starting ./dump-info
    Tape has ntrec of 10
    TS_TAPE  at blockno 1 inode=0 mode=---------- UNKNOWN size=0 flags=
    TS_CLRI  at blockno 2 inode=15097856 mode=---------- UNKNOWN size=1888256 flag$
      1844 data blocks with CLR-inode bitmap
    TS_BITS  at blockno 1847 inode=2 mode=---------- UNKNOWN size=1888256 flags=
      1844 data blocks with HAS-inode bitmap
    TS_INODE at blockno 3692 inode=2 mode=-rwxr-xr-x directory size=512 flags=
      INODE 2 mode=-rwxr-xr-x directory size=512 flags=
        ino=2 reclen=12 type=directory namlen=1 : .
        ino=2 reclen=12 type=directory namlen=2 : ..
        ino=11 reclen=20 type=directory namlen=10 : lost+found
        ino=12 reclen=12 type=symbolic link namlen=3 : bin
        ino=2097153 reclen=16 type=directory namlen=4 : boot
        ino=6029313 reclen=12 type=directory namlen=3 : dev
        ino=1835009 reclen=12 type=directory namlen=3 : etc
        ino=13893633 reclen=16 type=directory namlen=4 : home
        ino=13 reclen=12 type=symbolic link namlen=3 : lib
        ...
    

    and it goes on for another 33MiB before finishing with

    TS_INODE at blockno 4946049 inode=14155778 mode=-rwxr-xr-x regular file size=4$
      INODE 14155778 mode=-rwxr-xr-x regular file size=412 flags=
          Processed 4 data blocks in 1 indexes crc=a63cb4e2
    TS_END   at blockno 4946054 inode=15097856 mode=-rwxr-xr-x regular file size=4$
      volinfo[1] = 2
    TS_END   at blockno 4946055 inode=15097856 mode=-rwxr-xr-x regular file size=4$
      volinfo[1] = 2
    TS_END   at blockno 4946056 inode=15097856 mode=-rwxr-xr-x regular file size=4$
      volinfo[1] = 2
    TS_END   at blockno 4946057 inode=15097856 mode=-rwxr-xr-x regular file size=4$
      volinfo[1] = 2
    TS_END   at blockno 4946058 inode=15097856 mode=-rwxr-xr-x regular file size=4$
      volinfo[1] = 2
    TS_END   at blockno 4946059 inode=15097856 mode=-rwxr-xr-x regular file size=4$
      volinfo[1] = 2
    TS_END   at blockno 4946060 inode=15097856 mode=-rwxr-xr-x regular file size=4$
      volinfo[1] = 2
    dump-info: Finished
    
     
  • Michael Orlitzky

    Nevermind about catch, I was able to get it working with v3.7.1:

    $ ./faketape_test 
    Randomness seeded to: 462966853
    ===============================================================================
    All tests passed (79946 assertions in 6 test cases)
    
     
  • Tim Woodall

    Tim Woodall - 2025-03-22

    This looks like an off-by-one bug in dump / restore. Hopefully easily fixed in restore otherwise..

    TS_CLRI  at blockno 2 inode=15097856 mode=---------- UNKNOWN size=1888256 flag$
      1844 data blocks with CLR-inode bitmap
    

    That's 15MM inodes stored in a bitmap of 1844 blocks, but:

    scale=10
    15097856/8/1024
    1843.0000000000
    

    It fits exactly in 1843 blocks.
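
The arithmetic above as a small sketch. The function name is hypothetical, and the 1024-byte block size is taken from the divisor in the calculation.

```c
#include <assert.h>
#include <limits.h>

#define EXAMPLE_TP_BSIZE 1024UL /* bytes per tape block, as in the division above */

/* Blocks needed for a bitmap of `nbits` bits: round up to whole bytes,
 * then round up to whole blocks. */
unsigned long example_bitmap_blocks(unsigned long nbits)
{
    assert(nbits <= ULONG_MAX - 7);
    unsigned long bytes = (nbits + 7) / 8;
    return (bytes + EXAMPLE_TP_BSIZE - 1) / EXAMPLE_TP_BSIZE;
}
```

For 15097856 bits this gives 1843 blocks, and a single extra bit pushes it to 1844, which matches the size=1888256 in the header. Whether dump really sizes the map for one more bit than the inode count is not confirmed here; the sketch only shows that the numbers are consistent with that.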

    Label: none
    Incorrect block for <file removal list> at 1845 blocks
    

    I suspect that's counting from 0, and it looks like it's gone wrong on the last block of the CLRI bitmap, which (my guess) dump is writing but restore is not expecting.

    I'm more than a bit disappointed that restore didn't resync on the next block, I need to have a dig into that - which is probably where the fix will lie (as well as fixing dump to not write the extra block)

     
  • Tim Woodall

    Tim Woodall - 2025-03-22

    Hmmm, now I've looked at the code I don't know why it's gone wrong.

    converthead sets blksread to 0 which means
    Incorrect block for <file removal list> at 1845 blocks
    is reading what should be the TS_BITS header which means that something went wrong with gethead at line 1316 of restore/tape.c

    It's clearly read 1844 blocks at line 1286

    Ohhh, it's gone wrong at line 1305

     
  • Tim Woodall

    Tim Woodall - 2025-03-22

    Incorrect block for <file removal list> at 1845 blocks has read one too many blocks because it starts at 0 so it's skipped over the TS_BITS and then fails when it gets to the TS_INODE

     
  • Tim Woodall

    Tim Woodall - 2025-03-22

    I'm pretty sure you are reaching line 1305 but I cannot see how that can happen.
    You get to 1299 with size==0 and b==0 and i==1843 having read the last block of the bitmap at line 1286.

    Then that do { } while loop sets enclen to 1 so readtape(junk); should never be reached.

    You could try this fix, but I suspect it's wrong in the case where we've read a file of, say, 1K on a system with blocks of 4K. Interestingly, though, all the "non slow tests" have passed; I'm running the historical-regression tests while I write this.

    It would be interesting if you could add some logging before each of the readtape calls in this function, outputting the line number and the values of i, b and blksread. That should only output 3K lines of trace before it fails. I'd be particularly interested in what it says just before that first "Incorrect block for" message.
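
One possible shape for that logging, as a sketch only. The helper and macro names are hypothetical, and since the real types of `i`, `b` and `blksread` in restore/tape.c are not shown in this thread, the macro casts them to `long` and `long long` before printing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format one trace line into `buf`.
 * Returns the number of characters snprintf would have written. */
int example_format_trace(char *buf, size_t len, int line,
                         long i, long b, long long blksread)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "readtape at line %d: i=%ld b=%ld blksread=%lld",
                    line, i, b, blksread);
}

/* Drop this in front of each readtape() call; __LINE__ identifies the site. */
#define EXAMPLE_TRACE(i, b, blksread)                                     \
    do {                                                                  \
        char trace_[128];                                                 \
        example_format_trace(trace_, sizeof(trace_), __LINE__,            \
                             (long)(i), (long)(b), (long long)(blksread)); \
        fprintf(stderr, "%s\n", trace_);                                  \
    } while (0)
```

Writing to stderr keeps the trace separate from restore's normal output, so it can be redirected to a file and the last few lines before the first "Incorrect block for" message inspected.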

    diff --git a/restore/tape.c b/restore/tape.c
    index dd20422c..0920b966 100644
    --- a/restore/tape.c
    +++ b/restore/tape.c
    @@ -1296,7 +1296,7 @@ loop:
                                    pos += sbytes;
                                    last_write_was_hole = 1;
                            }
    -                       if ((size -= TP_BSIZE) <= 0) {
    +                       if ((size -= TP_BSIZE) < 0) {
                                    ++b;
                                    do {
                                            enclen = readmapflag?1:get_s_addr_length(spcl.c_addr + i);
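
To make the effect of changing `<= 0` to `< 0` concrete, here is a toy model of the comparison. It is not the loop from restore/tape.c, only the arithmetic of subtracting one block at a time; the function name and the `strict` flag are hypothetical.

```c
#include <assert.h>

#define EXAMPLE_TP_BSIZE 1024L

/* Count how many blocks are consumed before the end-of-data test fires.
 * strict == 0 models the original `<= 0`, strict != 0 the patched `< 0`. */
long example_blocks_consumed(long size, int strict)
{
    long blocks = 0;
    assert(size > 0);
    for (;;) {
        blocks++;                  /* one block read */
        size -= EXAMPLE_TP_BSIZE;
        if (strict ? size < 0 : size <= 0)
            return blocks;
    }
}
```

In this model the two comparisons agree whenever the size is not a multiple of the block size (1500 bytes ends after 2 blocks either way), but for an exact multiple the `< 0` variant consumes one block more (2048 bytes: 2 versus 3), which is the kind of case the caution about this patch points at.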
    

    Actually, I think
    Incorrect block for <file removal list> at 1845 blocks is correct, it's read the 1845th block after the start of the bitmap - which should be the TS_BITS but somehow gethead doesn't like it.

    Could you add trace to converthead too so we can know which particular FAIL is being triggered. I think it has to be the NFS_MAGIC case because a checksum failure should have logged.

    This is very puzzling!

     