
Help understanding Fix logic (find similar block before parity)?

Help
xad
2015-05-02
2015-05-05
  • xad

    xad - 2015-05-02

    Hi,

    Performing some tests I have a question regarding the fix process.
    In the "repair" function, is it so that "try to fetch the block using the known hash" is done before using the parities (related to another discussion about the non-checksum-protected parity)?
    A valid hash cannot guarantee data correctness, but an invalid hash does guarantee invalid data. Shouldn't the parity/ies then be used first, and the result validated with the hash?
    If I'm not missing something obvious (which is quite possible, and why I ask), shouldn't this dedup-hash-based recovery method be optional or possible to deactivate, especially since it is silent unless the log is used? The file stamp should make it quite rare, so it is a great option though.

    /X

    PS! Do not read this as criticism or as a reason not to use this product. I wouldn't spend time understanding it if I didn't find it the best software for my needs. (Again, thanks amadvance for all your work.)

     
  • Andrea Mazzoleni

    Hi xad,

    Yes. SnapRAID always makes the assumption that if the hash matches, the data is correct. This is in some way an intrinsic limitation of SnapRAID, and it extends also to parity computation, as the result of parity recovery is also validated with the hash.

    Essentially, even with the change you propose, SnapRAID would still rely on hashes to validate data. So you'd have the guarantee that the hash matches, but not necessarily that the data does.

    Anyway, with a 128 bit hash, I would not really worry about it ;)

    Ciao,
    Andrea

     
  • xad

    xad - 2015-05-03

    Hi,

    I agree that the 128 bit hash is safe for protecting against bit flips and other noise-induced corruption, i.e. "to prevent unintentional changes to information". But in this case it is used to uniquely identify a block. At home, protecting multimedia files and such, this is of only academic value, but I get the feeling your software is also used in some business cases.

    Determining bad parity blocks by hash and then applying your repair logic using parity first would increase the integrity even further, and would even be resilient against tailor-made Murmur3 collision files (very rarely needed, but proof of the overall high quality).
    Then, if repair is not possible, apply your smart option of "search same hash" and the use of all parities, ignoring parity-checksum failures. If this is logged with a "warning", the user can take extra action if needed to ensure that any possibly (even if very remotely) incorrect files are not overwriting their good backups. (People with remote backups are more "paranoid" than regular people, and they might be happy with the additional option to restore from the backup instead.)

    As you have so kindly published the code, I know I can always do it myself, but that would only benefit me, and I also trust your coding much more.

    /X

    PS! If anyone else would like to have this on the "should-have" or "nice-to-have" list, please post; otherwise I know this is on the "no-need" list.

     
    • Leifi Plomeros

      Leifi Plomeros - 2015-05-03

      I'm trying to understand the problem but I don't think that I do.

      Let's say that you have 10 data disks and 2 parity.

      One of the data disks is lost...

      You run snapraid fix...

      Snapraid uses blocks from 9 data disks + 1 parity disk to figure out the contents of each missing block.
      Before using the individual blocks, it also checks that the hash is correct for every one of them...

      But...

      There is still a risk that an evil user has accessed your system and silently replaced the contents of one of the data blocks (or a parity block) with a tailor-made counterfeit with the same hash as the original, thereby fooling snapraid into recreating an incorrect data block?

       
  • xad

    xad - 2015-05-03

    If I understood it correctly:

    You run snapraid fix...

    Snapraid uses blocks from 9 data disks and, if needed, the 2 parity disks to figure out the contents of each missing block.
    Before using the individual blocks, it also checks that the hash is correct for every one of them...

    After the block validation, SnapRAID will try to find a data block with the same hash among the existing valid data blocks on any of the disks.
    If found, the data will be copied.
    If not found, the parities will be used to recreate the bad data block (until the resulting data block's hash matches).

    Then the parities will be checked and if not matching they will be recalculated and updated.

    I.e. the parities are not used if a hash-matching data block exists. Only enough data and parity is used in recovery (which is OK, as the parity is not hash-protected).

    You don't have to be evil; hash collisions are about probability and statistics. Even though the chance of winning a $45M lottery is slim, there are winners. And nowadays, when people are using compressed and encrypted files, I would assume the data distribution to be much broader than text only. Still, for snapraid to fail you would both have to be the incredibly lucky "winner" of a hash collision and, at the same time, have an error in one of those blocks.

    edit If this is for improving performance, it would again be nice to have it as a "nocopy" option for the people who prefer not to trust probability and can live with the lower performance (if no duplicated files are stored there should be no performance benefit, only added risk, even though it is practically nil).

    /X

     

    Last edit: xad 2015-05-03
    • Leifi Plomeros

      Leifi Plomeros - 2015-05-03

      If you convert the content file to text you will find entries similar to this for each file:

      file D1 14583352497 1420426430.996106500 562949953421663 Folder/FilenameA
      blk 13620 2d8a4204501a9262861fa261997ae944
      blk 13621 ce7eb1e30aba2c17081662132849b84a
      blk 13622 75047d3cf4849fe53561080ddce158d8
      blk 13623 ...

      These block numbers can occur again on other data disks.

      I have always assumed that those block numbers are used to create virtual "RAID stripes".

      So I don't see why SnapRaid would use the hash as reference and grab just any hash matching block and assume it to be correct.

      It would seem much more logical for it to first at least try to grab a very specific block on a specific disk inside a specific file and then to verify that block against the hash.

      But I haven't looked on the relevant source code, so maybe you or Andrea can correct me if I have it all backwards.

       
      • xad

        xad - 2015-05-03

        I think you are confusing the block number with the block hash.
        The 16 bytes (32 hex chars) after the block number are the hash, e.g. "2d8a4204 501a9262 861fa261 997ae944" for block 13620.

        /X

         
        • Leifi Plomeros

          Leifi Plomeros - 2015-05-03

          No I don't :)

          I see it as "FileTag, DiskName, some numbers that are probably last-modified time and file size, FileLocationAndName, BlockIndexNr, Hash, BlockIndexNr, Hash, BlockIndexNr, Hash, ..."

          BlockIndexNr is reused on different data disks but never on the same disk, which makes me assume that blocks are referenced as DiskName\BlockNumber and validated using the hash.

           
      • Jessie Taylor

        Jessie Taylor - 2015-05-03

        Right. SnapRaid does not do what xad suggests.

        All SR does during fixing of a data drive is read data blocks from the remaining data drives (and verify their integrity with hashes), read the corresponding parity block(s), compute the missing data block, and finally check the hash of the restored data block. If the hash of the restored data block does not match (the most likely reason being a silent error in parity), and if there is additional parity available (e.g., if you had 2-parity but only 1 missing block), then SR will try to compute the missing data block using different parity, and then again check whether the restored block matches the hash.

        The chance of a hash collision is probably around 2^-128 (the Spooky Hash developer has only been able to verify it to 2^-73 due to hardware limitations, but it is likely better than that). Even if the chance of a collision is only 2^-73 = 1.06 x 10^-22, that is still extremely unlikely. Comparing it to the chances of winning the lottery is not apt, since the chances of winning the lottery are about 5 x 10^-9. In the worst case, the chances of a collision are about a MILLION times LESS likely than winning the lottery TWICE IN A ROW. More likely, the chance of a collision is around 2^-128 = 3 x 10^-39, which is less likely than winning the lottery 4 times in a row.

        A more useful way to look at it is that you would need to hash more than 2 x 10^19 blocks with a 128-bit hash before the chance of a hash collision is more than 50%. With 256 KiB blocks, that is more than 5 x 10^24 bytes, which is more than 5 million million Terabytes. If that data were stored on 5TB HDDs, each weighing a pound, that would be 500 million tons of HDDs.

        Think of the poor people who have to install all those HDDs. If they can install 1 HDD every 5 seconds (that is fast!), it would take 30,000 people working 80 hour weeks for 10 years to install them.

        But that is nothing compared to manufacturing those HDDs. If the entire $10 trillion economy of China were converted so that the entire country does nothing but manufacture 5TB HDDs, it would take 10 years for all of China to make that many HDDs.

         

        Last edit: Jessie Taylor 2015-05-03
        • xad

          xad - 2015-05-03

          Jessie,

          Unless you can enlighten me as to what the following code in "check.c" is doing, I start to feel this is a pissing contest, which is not why I post. (I will then just mod my copy and stop here.)

          "repair"

              n = 0;
              something_to_recover = 0; /* keep track if there is at least one block to fix */
              for (j = 0; j < failed_count; ++j) {
                  if (failed[j].is_bad) {
                      unsigned block_state = block_state_get(failed[j].block);
          
                      assert(block_state != BLOCK_STATE_DELETED); /* we cannot have bad DELETED blocks */
          
                      /* if we have the hash for it */
                      if ((block_state == BLOCK_STATE_BLK || block_state == BLOCK_STATE_REP)
                              /* try to fetch the block using the known hash */
                          && (state_import_fetch(state, rehash, failed[j].block, buffer[failed[j].index]) == 0
                              || state_search_fetch(state, rehash, failed[j].block, buffer[failed[j].index]) == 0)
                      ) {
                          /* we already have corrected it! */
                          log_tag("hash_import: Fixed entry %u\n", j);
                      } else {
                          /* otherwise try to recover it */
                          failed_map[n] = j;
                          ++n;
          
                          /* we have something to try to recover */
                          something_to_recover = 1;
                      }
                  }
              }
          
              /* if nothing to fix */
              if (!something_to_recover) {
                  log_tag("recover_sync:%u:%u: Skipped for already recovered\n", pos, n);
          
                  /* recompute only the parity */
                  raid_gen(diskmax, state->level, state->block_size, buffer);
                  return 0;
              }
          
              ret = repair_step(state, rehash, pos, diskmax, failed, failed_map, n, buffer, buffer_recov, buffer_zero);
          
           
  • Andrea Mazzoleni

    Hi,

    Please let me clarify this issue.

    What xad is saying is correct. Before recovering, SnapRAID searches for blocks using the hash.

    More specifically, it searches for a file with the same size and timestamp as the file to be recovered. If the hashes of the blocks also match, the blocks are used directly.

    This is done to allow recovery when you move files from one disk to another. Note that here we are comparing three different and independent values: hash, size and timestamp. This means about 192 bits of data that have to unluckily match. A probability of 10^-58. Something so small that it's not even conceivable ;)

    For comparison, the probability that a cosmic ray may change a specific byte of memory in a given second is about 10^-15. Enormously bigger.

    http://stackoverflow.com/questions/2580933/cosmic-rays-what-is-the-probability-they-will-affect-a-program

    Ciao,
    Andrea

     
    • Jessie Taylor

      Jessie Taylor - 2015-05-03

      Actually, what xad wrote was:

      After the block validation, SnapRAID will try to find a data block with the same hash among the existing valid data blocks on any of the disks.

      But that is not actually true, is it? SR will only search for other data blocks with the same hash if it FIRST finds a file with the same size and timestamp as the one it is trying to recover, as you just explained.

      SR is not routinely searching for blocks with the same hash during a fix, which is what xad seemed to be suggesting.

       
  • xad

    xad - 2015-05-04

    Andrea,

    Thanks, the "file with the same size and timestamp" was the missing part in my original question.
    The first hash test done is state_import_fetch(), which actually tests any block regardless of file (if I understand it right). What I missed was that it is only loaded if you specify the "test-import-content" option (using this option and an import file with blocks swapped, I got the expected result). Based on the probability argument, I will use this method as a "last" resort to recover a lost block, or to "infuse" a good block from a backup with a different timestamp if Murphy wins over my parities.

    /X

    PS! Thanks again, not only for all obvious functionalities, but also for all the "hidden" capabilities.

    edit (Test was done in v7.1, but it should be the same for v8.0 AFAIK.)

     

    Last edit: xad 2015-05-04
    • Andrea Mazzoleni

      Hi xad,

      Yes, you are correct. state_import_fetch() is used only if the undocumented option --test-import-content is specified. It's just a testing feature, not normally used.

      Ciao,
      Andrea

       
