
Decrease RAM usage with separate hash file?

  • Leifi Plomeros

    Leifi Plomeros - 2016-07-25

    Most likely this has already been considered and rejected...

    But wouldn't it make sense to move all data-block hashes into a separate hash file that can be read on demand, similar to how parity files work?

    In theory this would eliminate the need to keep hashes in RAM, which would make SnapRAID's RAM requirement very small.

    In its least complicated form it would be a matrix with a fixed location for every hash, laid out in stripes like this:
    Byte 0-15: hash for d1 block 0
    Byte 16-31: hash for d2 block 0
    Byte 32-47: hash for d3 block 0
    Byte 48-63: hash for d1 block 1
    Byte 64-79: hash for d2 block 1
    ...
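    The layout above can be sketched in a few lines. This is only an illustration of the proposed fixed-stride scheme, not SnapRAID's actual on-disk format; the function names and the 16-byte hash size are assumptions taken from the example offsets above.

    ```python
    # Hypothetical fixed-stride hash-file layout: 16-byte hashes stored
    # in stripes, one hash per disk per block, in block-major order.
    HASH_SIZE = 16

    def hash_offset(disk_index, block_index, num_disks):
        """Byte offset of the hash for a given disk/block in the hash file."""
        return (block_index * num_disks + disk_index) * HASH_SIZE

    def read_stripe(f, block_index, num_disks):
        """Fetch all per-disk hashes for one block number in a single read."""
        f.seek(hash_offset(0, block_index, num_disks))
        data = f.read(num_disks * HASH_SIZE)
        return [data[i * HASH_SIZE:(i + 1) * HASH_SIZE]
                for i in range(num_disks)]
    ```

    With three disks, `hash_offset(2, 0, 3)` gives byte 32 (d3, block 0) and `hash_offset(0, 1, 3)` gives byte 48 (d1, block 1), matching the table above.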

    During fix, scrub and check it would be very simple to predict which checksums are needed and collect them in a single disk read per stripe of blocks, either in parallel or in advance while working.

    Sync could build a new hash file in parallel with the old one, collecting all still-needed hashes from the old hash file.

    When the sync completes, the hash file could be appended (basically stored) as a separate section at the end of the content file, which would be cleaner and leave fewer files for the end-user to keep track of.

    The obvious downsides would be the time to implement it, the loss of backwards compatibility, and probably a lot of head scratching over how to optimize the layout so it doesn't have to be a complete matrix with tons of dead space for less-used disks, while still making it possible to pinpoint exactly where a specific hash is stored.

    Additionally, it might be a good idea to complement this with a hash over every stripe of hashes, to guard against corruption of the hashes themselves.
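    That per-stripe "metahash" idea could look something like this. The digest algorithm (BLAKE2b) and the function names here are purely illustrative assumptions; SnapRAID uses its own hash functions.

    ```python
    import hashlib

    HASH_SIZE = 16

    def stripe_metahash(stripe_bytes):
        """16-byte digest covering one full stripe of data-block hashes."""
        return hashlib.blake2b(stripe_bytes, digest_size=HASH_SIZE).digest()

    def verify_stripe(stripe_bytes, stored_metahash):
        """Return True if a stripe of hashes read from disk is intact."""
        return stripe_metahash(stripe_bytes) == stored_metahash
    ```

    Each stripe's metahash would be checked after reading it from the hash file, so a corrupted hash can be detected before it is trusted for verifying data blocks.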

    Personally I have very limited need for this, since I am quite happy with the balance of block size and RAM in my setup. So it is only an idea for general improvement, in case you see a need for it.

     
  • Andrea Mazzoleni

    Hi Leifi,

    Yes. Storing the hashes in that order would allow fast access, as the hash file would be read sequentially in most operations.

    I suppose that if such low memory support is ever implemented, it's likely to be something like that. Maybe with two files, a ".content" for the other information and a ".hash" only for hashes.

    The issue is that if enough memory is available, the present implementation is going to be faster anyway, so there is only a small incentive to do so.

    Ciao,
    Andrea

     
  • CybrSage

    CybrSage - 2016-07-29

    SnapRAID uses so little memory as it is that I prefer faster. Heck, I would welcome a switch or something that lets it use a LOT more memory, if available, to make it even faster... but I doubt something like that can be done (or would make any difference).

    I have 16GB RAM, so plenty to play with.

     

