Snapraid on new HW - NTFS formatting suggestion needed

Help forum · started by Ilgrank on 2015-01-12 · last post 2015-02-01
  • Ilgrank

    Ilgrank - 2015-01-12

    I'm planning a disk upgrade, but since I've run into the problem of running out of space on the parity disk several times (even though my parity disk is as big as the others), I'd like to be on the safe side this time.
    On ext2/3, making the blocks bigger should help (more fragmentation, but less overhead on the parity file), but what about NTFS?
    Thanks!

     
  • John

    John - 2015-01-12

    On ext filesystems the big waste (apart from the trivial 5% reserved for root) is the space used by WAY too many inodes by default, at least for disks in TB range storing media files (the waste is around 1.6%).
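
    A rough sanity check of that ~1.6% figure, assuming the stock mke2fs defaults of one 256-byte inode per 16384 bytes of disk (an assumption; check your mke2fs.conf):

```shell
# Default ext4: one 256-byte inode per 16384 bytes of disk space.
# Overhead in percent, scaled by 100 to keep integer math:
echo $(( 256 * 10000 / 16384 ))   # -> 156, i.e. ~1.56% of the disk
```

    For a disk holding only large files, something like `mkfs.ext4 -T largefile4 -m 0 /dev/sdX` would cut almost all of that: the `largefile4` usage type allocates one inode per 4 MiB, and `-m 0` drops the 5% root reserve. The device name is of course a placeholder.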

    I don't know much about optimizing NTFS; however, no matter the optimization, if you use equal-sized, full disks you'll most likely run out of parity space. The easy fix I would recommend is just to drop whatever data you don't want protected by parity into some folder ignored by snapraid. You only need to do this on the data drives that are as big as the parity drive.

     
  • Ilgrank

    Ilgrank - 2015-01-12

    Hi,
    my storage needs are mainly static large files.
    All the drives in my system are exactly equal, with one drive dedicated exclusively to the parity file.
    I only have data I want to protect on my drives, so excluding files/folders is not an option for me.
    I've already ended up with a full parity drive several times despite having some 5-8% free space on the other drives, and since I'm building a new array anyway, if there's something I can do to ease the problem I'd like to engineer it before putting data in the new array.

    Thanks!

     

    Last edit: Ilgrank 2015-01-12
  • John

    John - 2015-01-12

    5-8% sounds a bit much for large files.

    What's the maximum number of files you have on one data disk (and for that matter what's the size of the disk)?

     
  • Leifi Plomeros

    Leifi Plomeros - 2015-01-12
    1. Create a hidden folder on each data disk.
    2. Edit snapraid.conf and select to ignore hidden folders (or explicitly exclude the name of the hidden folder).
    3. For each data disk: put 1 GiB of junk data in the hidden folder for every 4,000 files that you plan to store on that disk.

    This way you can never run out of parity space. Later, when a data disk is completely full, you can fine-tune by removing some of the junk and checking how much free space is reported by the SnapRAID status command.

    This assumes you use the default block size of 256 KiB. After fine tuning you will most likely end up with 1 GiB of junk files for each ~8,000 data files. (Possibly even more if you format the parity disk with largefile4 or otherwise make more space available on it.)
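
    The 1 GiB per 4,000 files rule of thumb follows from each file wasting at most one partial parity block (256 KiB by default), with the average being half a block - which is also where the ~8,000-files figure after fine tuning comes from:

```shell
# Worst case: each file wastes up to one full 256 KiB parity block.
echo $(( 4000 * 256 / 1024 ))   # -> 1000 (MiB), i.e. ~1 GiB per 4000 files
# Average case: ~half a block (128 KiB) per file, so ~8000 files per GiB.
echo $(( 8000 * 128 / 1024 ))   # -> 1000 (MiB) as well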

     
    • Quaraxkad

      Quaraxkad - 2015-01-13

      It'd be much "cleaner" to resize the data drive partitions, and just drop them each by a few GB as needed.

       
  • John

    John - 2015-01-13

    This will be clean only if you already have ALL the data you want to put on the disks and you don't want to change it; otherwise you don't know how much to reserve. The original complaint was that even 5-8% was not enough, and on a 2 TB drive (for example) that is 100-160 GB (and again, even that was not enough!), in my opinion more than "a few GB". And that would be on each drive, if they are all equal to the parity.

    The whole point of the extra files was that you can easily and safely alter their size. If you already know you want to reserve 200GB, you can make a 200GB file; in fact, if the data doesn't change, just put only X-200GB of data on the disk and that's it.

    For me personally the partition option would be particularly bad, because I want to use that space: not only do I not know precisely how big the files in snapraid will be by the time the disk is almost full, I also don't know how much space I need for the files I DON'T want in snapraid, so I need flexibility on both sides. Note that you should NOT include in snapraid any file that changes unpredictably, like temporary download folders, virtual machines, scratch disks for media-editing (or any other kind of) programs, etc. It is not so much about being unable to recover those files themselves (or anything changed in them after sync was run); it is about the fact that those files are actually providing parity data for all the corresponding data on all the other disks.

     
  • Leifi Plomeros

    Leifi Plomeros - 2015-01-13

    I think it has been suggested in the past, but I couldn't find it rejected or planned in the TODO list...

    From my point of view the best solution would be if the snapraid sync (and fix) command predicted the required parity size vs. free space on the parity disks, and immediately aborted, without any changes, if there was not enough space available on the parity disk for the operation to complete.

    It would completely remove the need for the junk folder, smaller partitions, or any other strategy to avoid parity size problems.

    If it is difficult to predict, wouldn't it be relatively easy to simulate it instead?

     
  • John

    John - 2015-01-14

    That still wouldn't be much of a change compared to what we have now; the only difference would be that snapraid sync breaks before even starting, whereas now it breaks after it runs out of space. If it somehow fails "more" than that and you can't recover afterwards, then THAT would be a bug and should be fixed (I know it is hard to test all configs; we're talking about multiple OSes and filesystems, some setups with parity over a network, etc.).

    I think what would really help would be an estimate of "how much space you still need to reserve" on each data disk. We should be able to get how many snapraid blocks we can add to the parity (to the smallest disk, if we have multiple) and also how many blocks we can add on each data disk (based on how big the files already there are on average and how much free space there is). From this we can tell how much space we still need to reserve on that particular data disk.

     
  • Leifi Plomeros

    Leifi Plomeros - 2015-01-14

    I think it would be much more user friendly if it aborted before you have a problem instead of after.

    A user actively trying to avoid the problem, and possibly failing, seems much less user friendly compared to passively discovering "Oops, I need to move X GiB from disk Y before I can sync".

    An option to disable the feature would solve the possible problem of incompatible setups where free space could not be reliably calculated.

    But of course it does not get implemented on its own, so it is really up to Andrea to decide whether he thinks it is worth the time and effort to implement, or whether he prefers to focus on all the other stuff already in the TODO list.

     
  • John

    John - 2015-01-14

    It is hard to say what is "more" user friendly, never mind MUCH more - in this case it's just different.
    It still can't decide immediately whether it has enough space; it still has to spin up all the data disks, walk them, etc. Is it better to discover when you come home from work (say 9 hours later) that instead of putting in hours of work and doing 99.5% of the job, it failed at 0% after 3 minutes of gathering data because it decided it lacks 0.5% of space? I don't know. You still need to move some data off the data disks (a "very manual" process, because only you can decide what to delete), and it would be precisely the same amount (which you still don't know precisely unless you implement some stat like I suggested).

    In fact estimating how much data you can still put on each data disk or the data you need to take out from the disk would be precisely the stat I'm asking for and would be very useful (either directly in that form or as subtracted from the free space). Then you can easily decide:

    • if the sync operation would fail
    • how much you can still add to the data before it fails (or how much you need to take out in case it fails to make it work)
    • if you have enough non-snapraid data so you can't make it fail even if you fill the disk (or even more detailed how much you can take out from the non-snapraid data and still be safe, or how much to add to make it safe in case it isn't already).
     
    • Mitchell Deoudes

      Maybe a flag that just spits out the space information, but doesn't actually do the sync? I'm thinking from the perspective of an automated setup, where you could poll the remaining-space info frequently but only run the sync sporadically. Then the script could choose to execute the sync or not based on available space, the size of the data to be processed, etc.


       
  • Andrea Mazzoleni

    Hi John,

    SnapRAID already aborts "sync" at an early stage if the parity cannot grow enough. This support also requires that the filesystem used allows preallocating data, but that should be true for the most common ones, like ext4 and NTFS.

    And if "sync" aborts for this reason, it also prints a list of the files you have to remove to make it work.

    Also, the amount of data you can still add is reported in the "status" command.

    Ciao,
    Andrea

     
  • John

    John - 2015-01-15

    I feel so stupid - I've been wondering all along what "Wasted" might be... it is just what I wanted all the time, and it is even nicely commented in status.c:

    /* the maximum usable space in a disk is limited by the smaller */
    /* between the disk size and the parity size */
    /* the wasted space is the space that we have to leave */
    /* free on the data disk, when the parity is filled up */

     
  • Ilgrank

    Ilgrank - 2015-01-15

    Hi guys,
    thanks for the in-depth discussion.
    I have 4x3 TB drives:
    2 of them filled with large files (avg 8 GB each)
    1 with mixed files (more than 200k files)
    1 with parity

    Since I plan to move to 4x6 TB or 4x8 TB, I would just like to understand whether there's anything I can do to ease the parity problem.
    Fewer small files, if I understand correctly, is one option (as is storing the smaller files in a zip file or a VHD)
    ..but if there's anything else, especially at the filesystem level, I would not want to discover it once it is too late.

    Again, thanks!

     
    • Leifi Plomeros

      Leifi Plomeros - 2015-01-15

      Place all the small files on a smaller disk.

      If you have a 6 TB parity disk and a 4 TB data disk filled with a million small files, you will "only" need at least a 4.25 TB parity file to protect that disk...

      In other words, this is only an issue for data disks which are same size as parity disk.
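
      The 4.25 TB figure is a worst-case bound: a million files at up to 256 KiB of parity waste each adds roughly a quarter of a terabyte on top of the 4 TB of data:

```shell
# One million files, worst-case 256 KiB of parity waste per file:
echo $(( 1000000 * 256 / 1024 / 1024 ))   # -> 244 (GiB), i.e. ~0.25 TB of extra parity
```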

      If you are hell-bent on avoiding the problem altogether, you can change the cluster size on the data disks to 64 KiB and the SnapRAID block size to 64 KiB.

      The result would be:

      • 0 bytes parity waste which means you can fill all data disks to 100%
      • Waste of data disk space (No file can occupy less than 64 KiB disk space on data disk)
      • almost 4 times the normal memory requirement for SnapRAID (not a very realistic option given the amount of data you are describing)
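
      A sketch of what that setup could look like (drive letter and paths are placeholders; `/A:64K` is the standard NTFS allocation-unit switch, and `block_size` goes in snapraid.conf, where the default is 256):

```shell
# Windows side: format each data disk with 64 KiB clusters (destroys its data!)
#   format D: /FS:NTFS /A:64K /Q
# snapraid.conf side: match the SnapRAID block size to the cluster size
#   block_size 64
```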
       
      • Ilgrank

        Ilgrank - 2015-01-21

        Thanks Leifi
        ..so if I understand correctly, using the same cluster size and snapraid block size would solve the problem? Or am I understanding it wrong?

         
  • John

    John - 2015-01-16

    8 GB average size on a 3000 GB disk means 375 files; with 128 KiB average wastage per file (half the default block size in snapraid) that is 48 MB total wastage (pardon the inconsistent use of binary and decimal prefixes; we are only doing rough estimates here anyway). That is nothing to be concerned about - you probably can't even fill the disk up to the last 48 MB with 8 GB average-size files...

    Over 200k files is a bit worse; let's say 250,000 files.
    That is around 32 GB of wastage with the default block size of 256 KiB (maybe somewhat more if you have really small files, as the files between 0-256 KiB would be grouped more in 0-128 KiB than in 128-256 KiB; anyway it doesn't matter, too much detail...).
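
    As a cross-check of the arithmetic above (average waste of half the default 256 KiB block per file):

```shell
# 375 large files at ~128 KiB average waste each:
echo $(( 375 * 128 / 1024 ))             # -> 46 (MiB; ~48 MB in decimal units)
# 250,000 small files at the same average waste:
echo $(( 250000 * 128 / 1024 / 1024 ))   # -> 30 (GiB; ~32 GB in decimal units)
```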

    You need to reserve those GB somehow (or keep them free). Do you really not have anything to put there and exclude from snapraid? Some virtual machine? Some backup image you wouldn't otherwise keep, but now that you have space to burn, why not? Offline Wikipedia (look for Kiwix; it is about 50 GB, quite fast and nice - you'll be happy to have it in case your internet is down, or Wikipedia itself is down in protest as happened a while ago - and of course there really isn't a need to parity-protect a copy of Wikipedia...).

     
  • John

    John - 2015-01-22

    It would roughly solve the problem; maybe there would still be some small overhead left - it is hard to say which would be bigger (on the filled partition or in snapraid).

    However, it would take both NTFS and snapraid quite far from their defaults, and I don't know with what results. For sure you'll need 4x more RAM for snapraid.

     
  • Foebik

    Foebik - 2015-01-29

    Just curious, can you not just use a quota on the drive and deny write access after a certain amount is used?

    The two quota limits:

    Warning threshold
    You can configure the system to generate a system logfile entry when the disk space charged to the user exceeds this value.

    Hard quota
    You can configure the system to generate a system logfile entry when the disk space charged to the user exceeds this value. You can also configure the system to deny additional disk space to the user when the disk space charged to the user exceeds this value.

     

    Last edit: Foebik 2015-01-29
    • Leifi Plomeros

      Leifi Plomeros - 2015-01-29

      Not sure whether it is relevant when using NTFS on Linux, but in Windows 7 it is only possible to set quotas per user.

      In theory that sounds great... So you take each 3.63 TiB disk and set the quota to 3.62 TiB for the single user... But then you later discover that some of the data has been attributed to System, which results in a full disk without the quota-limited user exceeding his individual quota.

      Maybe there is a workaround, but generally speaking the quota feature doesn't seem to be intended to prevent disks from filling up, only to limit individual users, without any regard to available disk space.

       
  • Foebik

    Foebik - 2015-01-29

    Good points.

    But to be honest, what you guys are trying to accomplish is to make SnapRAID back up files that it is really not well suited for. If you have hundreds of thousands of tiny files, you're going to have waste. Either you use too much of your parity drive and run out of space, or you don't allow yourself to use the full potential of the data drives' space. And even if you can mitigate that waste, it's going to take a ton of RAM.

    Either way, unless you are backing up large files, I would look for a different solution.

    Or you can use a combination of solutions. For instance, I use regular incremental backups for my system drive and SnapRAID for all the rest of the data drives. And since the backup files are fairly large, I have them backing up to one of the data drives, which in turn gets parity.

    With this solution, my two most filled drives look like this and I still have system recovery backup:

    d1 96% used 0.1 GiB wasted (5248 files)
    d2 98% used 0.0 GiB wasted (4714 files)

     

    Last edit: Foebik 2015-01-29
    • John

      John - 2015-01-31

      Hundreds of thousands of files is absolutely nothing to write home about.
      My Linux root is on a flash drive, 8 GB with 25% or so used - and it has 237k files.
      If you use the default mkfs options you get almost 100 million inodes on a 1.5 TB drive.
      In the case of snapraid, the "waste" isn't even waste; you can just use the space for ANYTHING else.
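
      That inode count is easy to check against the default mke2fs ratio of one inode per 16384 bytes (assuming a stock mke2fs.conf):

```shell
# 1.5 TB (decimal) at the default bytes-per-inode ratio of 16384:
echo $(( 1500000000000 / 16384 ))   # -> 91552734, i.e. ~92 million inodes
```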

       
      • Foebik

        Foebik - 2015-02-01

        Agreed, hundreds of thousands isn't much; I have over 1 million files in 160 GB of space. But my point is I would never use SnapRAID to back it up, for 2 reasons:

        1) As mentioned before, SnapRAID isn't as efficient with lots of small files.
        2) These files change and are added/removed often.

        I might have missed it somewhere in this thread, and I apologize if I did, but is this an OS/system drive that the OP was talking about backing up? I have been assuming it was, but I could be wrong.

        He said this
        "I have only data I want to protect on my drives, so excluding files/folders is not an option for me."
        But the word "data" is ambiguous at best. I considered the "data" on my OS drive important as well.

        Again, I think we are back to the problem of being efficient with SnapRAID. Correct me if I am wrong (as my wife will tell you I often am), but you have 2 options:

        1) Reserve space on the disk that cannot be used for any other purpose (or at least is not parity)
        2) Play around with block size and see if you can 'make' SnapRAID more efficient at the cost of needing more RAM

        If you can get away with the 2nd option, I think you would be golden. Otherwise I'd say we are trying to squeeze a square peg into a round hole.

         
