Activity for dd_rescue

  • Kurt Garloff modified ticket #10

    static fails: missing $(OBJECTS2)

  • Kurt Garloff posted a comment on ticket #10

    Closed, fixed.

  • Kurt Garloff posted a comment on ticket #10

    Ah, you can tell that the static build is not part of my test routine. I took your patch and applied it to git, see commit 0e68146. Thanks! Let me know if you find anything else.

  • Kurt Garloff committed [0e6814] on Code

    Makefile fix: static target missed some objects.

  • Nico P created ticket #10

    static fails: missing $(OBJECTS2)

  • dd_rescue updated /README.sparse

  • dd_rescue updated /README.dd_rescue

  • dd_rescue released /dd_rescue-1.99.22.tar.bz2.asc

  • dd_rescue released /dd_rescue-1.99.22.tar.bz2

  • Kurt Garloff committed [84d253] on Code

    Remove extra -g root

  • Kurt Garloff committed [8d142c] on Code

    Use local -march setting

  • Kurt Garloff committed [1c48fd] on Code

    Avoid raising a second SIGILL.

  • Kurt Garloff committed [405962] on Code

    Handle compilers that don't support arm-v8.5-a+rng.

  • Kurt Garloff committed [868889] on Code

    Add rdrand Makefile target.

  • Kurt Garloff committed [6d431f] on Code

    Don't BSWAP32 on aarch64 ASLR init.

  • Kurt Garloff committed [754ed1] on Code

    Update documentation.

  • Kurt Garloff committed [ec0633] on Code

    Bump version to 1.99.22.

  • Kurt Garloff committed [9fd0c4] on Code

    Use CPU rng to initialize PRNG in aarch64.

  • Kurt Garloff committed [d19476] on Code

    Detect avail of aarch-v8.5-a rng rndr for random numbers.

  • Kurt Garloff committed [af959b] on Code

    Fix rdrand elif clause syntax

  • Kurt Garloff committed [373f2f] on Code

    Use SCHED_YIELD macro.

  • Kurt Garloff committed [c785c0] on Code

    Autodetect SRCDIR and add build insns.

  • Kurt Garloff committed [59fa7e] on Code

    Link addtl test scripts to build dir.

  • Kurt Garloff committed [1ce470] on Code

    Use rdrand64 on x86-64 rather than rdrand32.

  • Kurt Garloff committed [9d692d] on Code

    clang found this for me ...

  • Kurt Garloff committed [2b11c1] on Code

    Use -@ and short option for --sparse_nonslow

  • Kurt Garloff committed [0fce06] on Code

    Option --sparse_nonslow=maxreadtime: Do write 0s.

  • Kurt Garloff committed [26b7b4] on Code

    More precise comment on the floating averages.

  • Kurt Garloff committed [317fe4] on Code

    Add explanation of code and floating averages.

  • Kurt Garloff committed [e773bd] on Code

    Better startup behavior for currrate and avgrate.

  • Kurt Garloff committed [3e0bc8] on Code

    Use harmonic mean to calc current speed.
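
    As a rough illustration of why a harmonic mean can suit a speed estimate: when equally sized chunks are copied at different rates, total bytes divided by total time equals the harmonic mean of the per-chunk rates, not their arithmetic mean. The sketch below is an assumption for illustration only; the name harmonic_mean_rate is hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from dd_rescue): harmonic mean of n
 * per-chunk transfer rates, all measured over chunks of equal size.
 * Returns 0.0 if n is 0 or if any rate is not positive. */
double harmonic_mean_rate(const double *rates, size_t n)
{
    double inv_sum = 0.0;

    assert(rates != NULL || n == 0);
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++) {
        if (rates[i] <= 0.0)
            return 0.0;             /* a stalled chunk has no meaningful rate */
        inv_sum += 1.0 / rates[i];  /* time needed per unit of data */
    }
    return (double)n / inv_sum;
}
```

    For two equal chunks copied at 40 and 60 units per second, this yields 48, whereas the arithmetic mean would overstate the speed as 50.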

  • Kurt Garloff modified ticket #9

    test_sparse.sh sporadic failures

  • Kurt Garloff posted a comment on ticket #9

    Closing

  • Kurt Garloff posted a comment on ticket #9

    Great, thanks!

  • Sam James posted a comment on ticket #9

    Thanks a lot for your patience & work on this. I can't reproduce it anymore. I think we're good to close!

  • Kurt Garloff posted a comment on ticket #9

    So I believe that your setup did produce a significant amount of interrupted or short writes. (This may be specific to filesystem code or CPU or memory utilisation.) dd_rescue had code to handle it, but it had issues. These have been addressed and I have tests for it now. I'm fairly optimistic that we are in good shape here. Let me know if you have different experience, otherwise I'd like to close this ticket. Thanks for your tenacity, helping to harden the code!

  • Kurt Garloff posted a comment on ticket #9

    Hi Sam, Just inject the key found at https://github.com/garloff.keys If the bug still reproduces ... -- Kurt On 14.03.25 16:13, Sam James wrote: OK, @garloff https://sourceforge.net/u/garloff/profile/, a friend's setup a VM where they can reproduce it easily and made it available for SSH. Where should I email the credentials to? [tickets:#9] https://sourceforge.net/p/ddrescue/tickets/9/ test_sparse.sh sporadic failures Status: open Milestone: 1.0 Created: Fri Feb 14, 2025 02:41 AM UTC by Sam James...

  • Kurt Garloff committed [b1f234] on Code

    Better language on -H percent.

  • Sam James posted a comment on ticket #9

    Thank you! I won't declare victory yet but it's looking promising.

  • Kurt Garloff posted a comment on ticket #9

    I released 1.99.21 with all the fixes included. Let me know if you find further issues or if this one is not yet completely solved for you.

  • Kurt Garloff committed [668adb] on Code

    Avoid defining READ and WRITE. Used on Android.

  • Kurt Garloff committed [e36359] on Code

    Merge branch 'DD_RESCUE_1_99_BRANCH' of ssh://git.code.sf.net/p/ddrescue/code into DD_RESCUE_1_99_BRANCH

  • Kurt Garloff committed [840295] on Code

    Fix SRCDIR handling when creating .dep.

  • Kurt Garloff committed [156738] on Code

    Merge branch 'DD_RESCUE_1_99_BRANCH' of ssh://git.code.sf.net/p/ddrescue/code into DD_RESCUE_1_99_BRANCH

  • Kurt Garloff committed [b20d62] on Code

    salt is actually not very sensitive, don't warn.

  • Kurt Garloff committed [b446f8] on Code

    Avoid using C23 feature.

  • Kurt Garloff modified a comment on ticket #9

    The code on the git branch DD_RESCUE_1_99_BRANCH now should have impeccable handling for interrupted and short IO calls. Testing welcome!

  • Kurt Garloff posted a comment on ticket #9

    The code on the git branch DD_RESCUE_1_99_BRANCH now should have impeccable handling for interrupted and short IO calls. Testing welcome!

  • Kurt Garloff committed [e24440] on Code

    Clear errno if it's EAGAIN or EINTR.

  • Kurt Garloff committed [d6d6bd] on Code

    Testing and fixing short and interrupted read/write calls.

  • Sam James posted a comment on ticket #9

    OK, @garloff, a friend's set up a VM where they can reproduce it easily and made it available for SSH. Where should I email the credentials to?

  • Sam James modified a comment on ticket #9

    Ah, sorry, I thought I had but I can't find it indeed. This fails for me: wget https://www.garloff.de/kurt/linux/ddrescue/dd_rescue-1.99.20.tar.bz2 tar xvf dd_rescue-1.99.20.tar.bz2 cd dd_rescue-1.99.20 make -j all make check while true; do ./test_sparse.sh "-L ./libddr_crypt.so=AES192-CTR:weakrnd:pbkdf2:pass=ABC:" "encrypt" "decrypt" 8388612 || break; done ... but I haven't reproduced it more than once so far this morning. Bad (good) luck, I guess. A friend who can reproduce it is also going to...

  • Sam James posted a comment on ticket #9

    Ah, sorry, I thought I had but I can't find it indeed. This fails for me: wget https://www.garloff.de/kurt/linux/ddrescue/dd_rescue-1.99.20.tar.bz2 tar xvf dd_rescue-1.99.20.tar.bz2 cd dd_rescue-1.99.20 make -j all make check while true; do ./test_sparse.sh "-L ./libddr_crypt.so=AES192-CTR:weakrnd:pbkdf2:pass=ABC:" "encrypt" "decrypt" 8388612 || break; done ... but I haven't reproduced it more than once so far this morning. Bad (good) luck, I guess. A friend who can reproduce it is also going to setup...

  • Kurt Garloff posted a comment on ticket #9

    Note that the final patch will look slightly different in my current view. I will just turn the errno == 0 in the while statement into (errno == 0 || errno == EAGAIN || errno == EINTR). We do not handle short writes and interrupted writes in dd_rescue currently. That will be fixed also (and may be the reason for the issues you observe).
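
    The retry condition described above (treating EAGAIN and EINTR as retryable, and continuing after short writes) can be sketched as a small loop. This is a minimal illustration under stated assumptions, not dd_rescue's actual code: the name write_fully and the max_retries parameter are hypothetical, and it uses plain write() on a blocking descriptor rather than a positional write.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not dd_rescue's implementation: write exactly len
 * bytes to fd. Short writes are continued from where they stopped;
 * EINTR, EAGAIN and zero-byte writes are retried up to max_retries times
 * in total. Returns 0 on success, -1 with errno set on failure. */
int write_fully(int fd, const void *buf, size_t len, int max_retries)
{
    const char *p = buf;
    int retries = 0;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n > 0) {
            /* Possibly a short write: skip what was written, continue. */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno != EINTR && errno != EAGAIN)
            return -1;              /* hard error */
        if (retries++ >= max_retries) {
            if (n == 0)
                errno = EIO;        /* no progress and no error code */
            return -1;              /* retry budget exhausted */
        }
    }
    return 0;
}
```

    Note that errno is only inspected after a call has returned -1, so a stale value from an earlier interrupted call cannot influence the loop.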

  • Kurt Garloff modified a comment on ticket #9

    OK, so here's what I have been able to find: readblock() was actually not safe against a EINTR/EAGAIN followed by short read. You can add errno = 0; before the mypread() call in readblock() in dd_rescue.c:1827 and let me know if this fixes things for you.

  • Kurt Garloff posted a comment on ticket #9

    OK, so here's what I have been able to find: readblock() was actually not safe against a EINTR/EAGAIN followed by short read. You can add errno = 0; before the mypread() call in readblock() in dd_rescue.c:1827 and let me know if this fixes things for you.

  • Kurt Garloff posted a comment on ticket #9

    OK, care to give complete instructions for the reproduction? git checkout DD_RESCUE_1_99_BRANCH ./autogen.sh make -j all make check while true; do ./test_sparse.sh "-L ./libddr_crypt.so=AES192-CTR:weakrnd:pbkdf2:pass=ABC:" "encrypt" "decrypt" 8388612 || break; done is what I do without any success. Somehow I need to get this reproduced!

  • Sam James posted a comment on ticket #9

    It looks like strace can do some fault injection too. Maybe I'll try that?

  • Sam James posted a comment on ticket #9

    Here's lscpu from two machines where I can hit it. (Note this machine currently is booted with mitigations=off and I've found this has affected timing-related bugs before): $ uname -a Linux mop 6.13.6 #1 SMP PREEMPT_DYNAMIC Sat Mar 8 14:02:16 GMT 2025 x86_64 AMD Ryzen 9 3950X 16-Core Processor AuthenticAMD GNU/Linux $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 43 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Vendor ID:...

  • Sam James posted a comment on ticket #9

    Just regular configure and make (building from tarball, no package manager; I originally reproduced it in our PM, but I wanted to rule all of that out) with nothing exported in the environment or passed to configure or make. It takes a few minutes at most.

  • Kurt Garloff posted a comment on ticket #9

    OK, creating a signal handler that only logs the signal for SIGWINCH and sending lots of SIGWINCH does not seem to cause any trouble, so I may have to go the IO injection route.

  • Kurt Garloff posted a comment on ticket #9

    Thinking on this some more ... In my mind, the most likely theory is that I fail to do the right thing when IO irregularities happen. IO calls can be interrupted (-EINTR) or may return with only having done part of the work (short reads/writes). We have logic to handle this, but the logic may have bugs. Such bugs tend to go unnoticed, as interrupted and short IO happens very rarely. I will look at the code with that focus again. If I don't see anything suspicious, I will create a wrapper that injects...
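
    One way such I/O irregularities could be provoked deterministically is a thin shim around read() that fails some calls with EINTR and truncates others. The sketch below is purely illustrative and not the wrapper from the ticket; flaky_read, read_fully, flaky_calls and the chosen pattern (every third call interrupted, at most 7 bytes per call) are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Number of calls made through the shim (hypothetical test state). */
static unsigned flaky_calls = 0;

/* Hypothetical test shim: behaves like read(2), except that every third
 * call fails with EINTR and all other calls return at most 7 bytes. */
ssize_t flaky_read(int fd, void *buf, size_t len)
{
    flaky_calls++;
    if (flaky_calls % 3 == 0) {
        errno = EINTR;              /* simulated interruption */
        return -1;
    }
    if (len > 7)
        len = 7;                    /* simulated short read */
    return read(fd, buf, len);
}

/* Loop under test: read len bytes unless EOF or a hard error occurs.
 * Returns the number of bytes read, or -1 on a hard error.
 * Assumes a blocking descriptor, so EAGAIN is simply retried. */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = flaky_read(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;           /* transient: retry the same range */
            return -1;
        }
        if (n == 0)
            break;                  /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

    Comparing the data returned by such a loop with the known input makes mishandled retries visible without waiting for a rare real-world interruption.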

  • Kurt Garloff committed [d28da7] on Code

    Better help for test_aes.

  • Kurt Garloff modified a comment on ticket #9

    Nothing within four hours. Testing here is on a Zen3 CPU which supports VAES. The codepath is different if the CPU does support AES only or no crypto extensions. On what CPUs with what crypto capabilities can you reproduce the issue? (Note: If this does occur occasionally, there is also a possibility that it's kernel related, not handling all the extended vector registers correctly on a context switch -- though it's admittedly unlikely that this would go unnoticed ...)

  • Kurt Garloff posted a comment on ticket #9

    Nothing within four hours. Testing here is on a Zen3 CPU which supports VAES. The codepath is different if the CPU does support AES only or no crypto extensions. On what CPUs with what crypto capabilities can you reproduce the issue? (Note: If this does occur occasionally, there is also a possibility that it's kernel related, not handling all the extended vector registers correctly on a context switch -- though it's admittedly unlikely that this would go unnoticed ...)

  • Kurt Garloff posted a comment on ticket #9

    Running in a Gentoo VM again. The binary that you use to reproduce this: - How do you compile it? Passing any compiler flags, e.g. from the Gentoo build system? - How long does it take to reproduce the issue?

  • Kurt Garloff posted a comment on ticket #9

    Stopping after 11hrs.

  • Sam James modified a comment on ticket #9

    The qcow2 should be functionally identical for our purposes here (other than some small amount of changes in stable packages in the last week), as it's a stage3 + a kernel shoved in (more or less).

  • Sam James posted a comment on ticket #9

    The qcow2 should be identical (other than some small amount of changes in stable packages in the last week), as it's a stage3 + a kernel shoved in (more or less).

  • Kurt Garloff posted a comment on ticket #9

    Compiled with the -fsanitize options .... Nothing thus far (after 1hr), neither on tmpfs nor NFS.

  • Kurt Garloff posted a comment on ticket #9

    No luck with 3hrs on Orange Pi (aarch64, gcc-11, on NFS). x86-64 gcc-14 on an openSUSE VM on NFS has now been running for 10mins without issue. Will leave it running for a while. Will start tmpfs tests in parallel, though I would suspect tmpfs to be less likely to fail ... On Gentoo stage3 tarball: Is that significantly different from the qcow2 I downloaded a week ago?

  • Sam James modified a comment on ticket #9

    Good news: ASAN is now happy. UBSAN still has some alignment issues but I haven't looked into those. Bad news: I can still reproduce the assert (or failed file comparison, depending on luck). Assert: ./dd_rescue -a -b 16k -L ./libddr_crypt.so=AES192-CTR:weakrnd:pbkdf2:pass=ABC:encrypt testfile testfile.copy2 dd_rescue: (info): Using softbs=16.0kiB, hardbs=4.0kiB dd_rescue: (warning): crypt (-1): Don't specify sensitive data on the command line! dd_rescue: (info): expect to copy 8192.0kiB from testfile...

  • Sam James modified a comment on ticket #9

    I'd asked a friend to try reproduce, both to help get to the bottom of it, and also make sure I'm not wasting your time somehow -- he couldn't reproduce at first, but then pulled a fresh Gentoo stage3 tarball, chrooted in (just extracted to some temporary location), and could pretty quickly in a loop. I'm still trying to think of ideas, but could you try the loop on a tmpfs mount?

  • Sam James posted a comment on ticket #9

    I'd asked a friend to try reproduce, both to help get to the bottom of it, and also make sure I'm not wasting your time somehow -- he couldn't reproduce at first, but then pulled a fresh Gentoo stage3 tarball, chrooted in (just extracted to some temporary location), and could pretty quickly in a loop. I'm still trying to think of ideas, but could you try the loop on a tmpfs partition?

  • Sam James posted a comment on ticket #9

    Good news: ASAN is now happy. UBSAN still has some alignment issues but I haven't looked into those. Bad news: I can still reproduce the assert (or failed file comparison, depending on luck). Assert: ./dd_rescue -a -b 16k -L ./libddr_crypt.so=AES192-CTR:weakrnd:pbkdf2:pass=ABC:encrypt testfile testfile.copy2 dd_rescue: (info): Using softbs=16.0kiB, hardbs=4.0kiB dd_rescue: (warning): crypt (-1): Don't specify sensitive data on the command line! dd_rescue: (info): expect to copy 8192.0kiB from testfile...

  • Sam James posted a comment on ticket #9

    Nice! I'll pull that in to our packaging now and re-test with it.

  • Kurt Garloff posted a comment on ticket #9

    Notes: 1. I found no further places in the code where I may have made the same mistake. 2. The issues I saw on OrangePi5B before which I attributed to NFS are fixed by this. I have a theory here: NFS causes short (incomplete) writes with some likelihood, while many local filesystems are unlikely to do so. So that's how this got triggered. (You needed a second retry to cause stack corruption.) I plan to release 1.99.21 in a few days, so we have an official release soon with this fixed. For the time...

  • Kurt Garloff posted a comment on ticket #9

    I pushed this fix to DD_RESCUE_1_99_BRANCH. I'm still looking at a few unaligned warnings that the sanitizer uncovered. At first look, these are hard to avoid, as lzo is not designed to guarantee any alignment (larger than one byte) of compressed content.

  • Kurt Garloff committed [6d011c] on Code

    When counting retries, we would have inc a pointer.

  • Kurt Garloff committed [9cbb71] on Code

    Pass CFLAGS also to gcc linker call.

  • Kurt Garloff posted a comment on ticket #9

    I can reproduce the -fsanitizer error. Did I really get the precedence rules of C wrong? Kind of embarrassing after 30yrs ... Change !*retry++ to !(*retry)++ at the two places in real_writeblock() and try again ...
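
    The precedence issue can be shown in isolation: postfix ++ binds tighter than unary *, so !*retry++ parses as !(*(retry++)) and advances the pointer instead of the counter it points to. The two functions below are a hypothetical, minimal reproduction and not the code from real_writeblock().

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 on the first attempt (counter still 0). Intended to bump the
 * retry counter, but actually increments the local pointer copy and
 * leaves the counter untouched. */
int first_attempt_buggy(int *retry)
{
    assert(retry != NULL);
    return !*retry++;               /* parsed as !(*(retry++)) */
}

/* Same test, with parentheses so the pointed-to counter is incremented. */
int first_attempt_fixed(int *retry)
{
    assert(retry != NULL);
    return !(*retry)++;
}
```

    In this sketch the stray increment only affects a local copy of the pointer and is harmless; where the moved pointer is used again, it refers to memory next to the counter rather than the counter itself.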

  • Sam James posted a comment on ticket #9

    On one of the runs when it differs: $ diffoscope testfile testfile.copy --- testfile +++ testfile.copy @@ -524282,8 +524282,8 @@ 007fff90: 8257 553a b086 0b31 c88c 558f 5400 71bf .WU:...1..U.T.q. 007fffa0: 983e 49c9 74f1 8220 5777 b11b 119f 9000 .>I.t.. Ww...... 007fffb0: 1aed a523 8120 ab20 c94a 4e9c e0d7 aab8 ...#. . .JN..... 007fffc0: c5f1 945d 399d 0fd2 1e28 6106 e09d a777 ...]9....(a....w 007fffd0: c6bd 6382 b708 4633 c526 90c7 3443 5e7f ..c...F3.&..4C^. 007fffe0: 215c 5e10 abe8 dc1a 7be0 ae61...

  • Sam James posted a comment on ticket #9

    (Sorry for delay, I've been working on some other bits this week.) Don't stress over it more for now and I'll try to reproduce in a VM. I'm sorry about the mystery :(

  • Kurt Garloff posted a comment on ticket #9

    Compiling in and running on a Gentoo 2.17 VM (kernel 6.6.47, gcc-14.2.1) on x86-64 (Zen3) for several hours did not yield any error. local-19353 ~/dd_rescue # ./dd_rescue --version dd_rescue Version 1.99.20, kurt@garloff.de, GNU GPL v2/v3 (DD_RESCUE_1_99_20-2-g1dd2a7a) (compiled Mar 1 2025 16:43:33 by gcc (Gentoo 14.2.1_p20241221 p7) 14.2.1 20241221) (features: O_DIRECT dl/libfallocate fallocate splice fitrim xattr rdrnd sha vaes avx2) dd_rescue is free software. It's protected by the terms of GNU...

  • Kurt Garloff posted a comment on ticket #9

    OK, gentoo .qcow2 for cloud-init downloaded from https://www.gentoo.org/downloads/

  • Kurt Garloff posted a comment on ticket #9

    Are there .qcow2 images for Gentoo available for download somewhere? Maybe things reproduce in a Gentoo VM ...

  • Sam James posted a comment on ticket #9

    Bizarre. Is there anything I can do to get more information other than figuring out environments it does, and doesn't, happen in? Happy to run with custom patches or build with whatever options.

  • Kurt Garloff posted a comment on ticket #9

    Trying reproduction in many loops on many devices. Nothing. Except for one finding, which I suspect not to be a dd_rescue bug: On an ARM64 SBC (Orange Pi5B, kernel 5.10.160), I see typically two corrupted bytes (3 bytes apart or so) after running for a minute or so. This only happens when testing on NFS, not when writing to the local filesystem. Other NFS clients (x86-64, kernel 6.12.x mostly) do not show this behavior. Nor do I see it on the OrangePi5 when using a local filesystem. Weird.

  • Kurt Garloff committed [1dd2a7] on Code

    More specific warnings for passed secrets.

  • Kurt Garloff committed [d9af07] on Code

    Better support for different SRCDIR.

  • Sam James posted a comment on ticket #9

    Thanks for taking a look Kurt. I'd actually started off assuming it was either a GCC bug or at least specific to GCC 15, but then managed to reproduce with GCC 13 too and wrote off the fact I hadn't hit it before as related to how I'm just not guaranteed to hit it every time (i.e. I assumed it's not a new issue). Let me try on a few other machines and environments and get back to you. I'll first try on my other Gentoo machines then try some other distros in Docker or a chroot (I'm a Gentoo developer...

  • Kurt Garloff posted a comment on ticket #9

    Hmm, tested on two machines, (both x86-64, AMD Zen 3 and Zen 4). Ubuntu 24.04 (gcc-13.3) bare metal and openSUSE-15.6 (with a self-compiled gcc-15) in a VM. Running this for ~20mins on either machine did not yield any error ... Any hints on what may be special in your setup? Can you reproduce this on several different setups (distributions, CPUs, compilers, ... ?)

  • Kurt Garloff posted a comment on ticket #9

    Thanks for the report! It looks like it hits sporadically when using 16k blocks both during de- and encryption. Interestingly not even a 2nd plugin (like de/compression) seems to be involved, which I would have suspected, as that complicates things and I had a bit of work b/f releasing 1.99.20 to get all corner cases right there. I'll let you know what I can find.

  • Sam James posted a comment on ticket #9

    Sorry, I can't seem to edit the first post to fix the formatting. It usually takes 30s or so for the loop running on a fast machine (AMD Ryzen 3950X) to hit a failure, sometimes longer (1m, maybe up to 3m). Another example of a failure is: ./dd_rescue -a -b 16k -L ./libddr_crypt.so=AES192-CTR:weakrnd:pbkdf2:pass=ABC:decrypt testfile.copy2 testfile.copy dd_rescue: (info): Using softbs=16.0kiB, hardbs=4.0kiB dd_rescue: (warning): crypt (-1): Don't specify sensitive data on the command line! dd_rescue:...

  • Sam James created ticket #9

    test_sparse.sh sporadic failures

  • dd_rescue released /dd_rescue-1.99.20.tar.bz2.asc

  • dd_rescue released /dd_rescue-1.99.20.tar.bz2

  • dd_rescue updated /README.dd_rescue

  • dd_rescue released /README.sparse

  • Kurt Garloff committed [805725] on Code

    Fix memory corruption (!).
