Hello there,
First and foremost, thanks for what appears to be a fantastic piece of software; I'm excited to implement it! I've been watching the project for some time!
I'd really appreciate it if you could check a proof of concept I'm working on for my long-term home backup and media solution. My current NAS has lasted 10+ years; now it's time for the next generation.
I have a 24-bay storage server (specs below).
Please check out the diagram/info I have put together; cross-checks, fact-checks, improved ideas, and corrections are all welcome.
The diagram was made with yEd, and the source for the diagram is here: http://pastebin.ca/3779292
Cheers
Kyle
*bump*
It would be great to get feedback on my initial post #1, especially on the SnapRAID specifics. I have included the text here in rich form at the end of this post.
Here are some experiences I've had with my prototyping/research in the meantime:
I picked up a single Seagate ST5000LM000 5 TB 2.5" drive (the fat 15 mm height) to check that it fits my disk bays. It did.
I installed Proxmox VE, tried a few container setups, and ended up going with a full "stable" Debian VM from the network-install ISO.
I installed OpenMediaVault on the VM: http://www.openmediavault.org/
I was fairly impressed with OMV, and I then installed the OMV-Extras plugin repository, which provides sources for lots more plugins. I enabled:
openmediavault-luksencryption 3.0.2
openmediavault-snapraid 3.6.6
openmediavault-unionfilesystems 3.1.14
I plugged in the 5 TB drive, plus another existing drive I had, and assigned them to the VM (howto).
I then enabled LUKS on the new drive via the OMV GUI, and then created an XFS file system inside the LUKS container.
I then mounted the file systems and used the union filesystems plugin to set up mergerfs.
Everything worked as expected, with no major issues after a reboot either (I just need to unlock the LUKS disks via the OMV GUI). I haven't needed to touch the command line for anything yet.
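For reference, here is a rough sketch of what those GUI steps correspond to on the command line. The device name (/dev/sdb), mapper name, and mount points are placeholders rather than what OMV actually uses, and the mergerfs options are just one reasonable set.
# Encrypt the new data drive with LUKS (this destroys any existing data on /dev/sdb)
cryptsetup luksFormat /dev/sdb
cryptsetup open /dev/sdb data1              # unlocked at /dev/mapper/data1
# Create an XFS file system inside the LUKS container and mount it
mkfs.xfs -L data1 /dev/mapper/data1
mkdir -p /mnt/data1
mount /dev/mapper/data1 /mnt/data1
# (repeat for the second drive, mounted at /mnt/data2)
# Pool the individual data mounts into one union file system with mergerfs
mkdir -p /srv/pool
mergerfs -o defaults,allow_other,category.create=mfs /mnt/data1:/mnt/data2 /srv/pool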
I tested the SMB and rsync daemons successfully, as expected.
Read and write performance of the 5 TB drive was enough to just about saturate Gigabit LAN, which for a home storage solution is enough in my eyes... for a good while at least.
I successfully ran 800 GB transfers multiple times at 90-110 MB/s via the SMB and rsync daemons.
Soon I will order more drives, expand the storage space, and start to use SnapRAID, probably starting with 12 disks.
I look forward to getting hands on with SnapRAID!
Here is the rich form text from the image in post #1.
Use case
Media server with lots of data disks where existing files change rarely.
Backup server with lots of data disks where existing files change occasionally.
Full backups not possible.
Real-time and high-availability not required.
Physical
24-bay storage server (+2 internal), up to dual Xeon, 64 GB RAM (expandable up to 1024 GB), non-expander backplane connected to LSI HBAs.
Set-up
Proxmox VE is installed on the OS drives, providing the hypervisor.
A Debian/Ubuntu VM is installed on the OS drives to provide NAS-style functionality.
The NAS VM is provisioned with up to 24 drives.
The NAS VM uses each drive as a normal block device formatted with the XFS file system (or perhaps Btrfs).
4 drives are reserved for SnapRAID parity (for 20 data disks).
Up to 20 drives are for data and can be read/written as needed for the given chassis.
mergerfs (or similar) presents the data disks as a pooled/union file system, which can then be exported (see the configuration sketch below).
Other VMs/containers can be set up on the OS drive partitions for miscellaneous duties.
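As a rough illustration of this layout only (the mount points, drive names, and the 4+20 split below are assumptions taken from the description above, not a tested configuration), a snapraid.conf for 4 parity drives and 20 data drives might look something like this:
# /etc/snapraid.conf (sketch; all paths are hypothetical)
parity   /mnt/parity1/snapraid.parity
2-parity /mnt/parity2/snapraid.2-parity
3-parity /mnt/parity3/snapraid.3-parity
4-parity /mnt/parity4/snapraid.4-parity
# Keep at least (number of parity drives + 1) copies of the content file,
# spread across different disks
content /var/snapraid/snapraid.content
content /mnt/data1/snapraid.content
content /mnt/data2/snapraid.content
content /mnt/data3/snapraid.content
content /mnt/data4/snapraid.content
# One "disk" line per data drive, d1 through d20
disk d1 /mnt/data1/
disk d2 /mnt/data2/
# ... and so on up to ...
disk d20 /mnt/data20/
exclude /lost+found/
mergerfs would then pool only the 20 data mounts; the parity mounts stay outside the pool and outside any export.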
Backups and data integrity
The NAS VM is regularly backed up so it can be restored or migrated.
Super-critical real-time data can be backed up via cloud sync, rsync, or similar.
SnapRAID runs on a regular basis against the individual data drive file systems (not via the union), which provides the following (a scheduling sketch follows this list):
Can sustain the concurrent failure of up to 4 drives (data or parity) without data loss.
Bit-rot detection via checksumming and regular scrubbing.
Can heal bit rot/corruption.
Can recover files and/or paths on any of the drives (undelete).
Simple to add or remove drives, including parity drives.
Simple to replace drives, including parity drives.
Simple to recover data or rebuild parity after a disk failure.
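As one possible interpretation of "runs on a regular basis" (the times and scrub percentage below are arbitrary examples, not SnapRAID defaults), a simple cron schedule could look like:
# /etc/cron.d/snapraid (sketch)
# Update parity to match the current files every night at 03:00
0 3 * * * root /usr/bin/snapraid sync
# Every Sunday, scrub ~8% of the blocks not checked in the last 10 days,
# so the whole array gets re-verified over a couple of months
0 5 * * 0 root /usr/bin/snapraid scrub -p 8 -o 10
Many people wrap these calls in a script that e-mails the output and refuses to sync if an unusually large number of deletions is detected, but plain cron entries are enough to show the idea.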
Drive monitoring
S.M.A.R.T. monitoring and alerting on key predictive-failure attributes (a monitoring sketch follows this list).
SnapRAID scrubbing.
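For the S.M.A.R.T. side, a minimal smartd configuration along these lines would cover monitoring and e-mail alerting; the self-test schedule and the address are placeholders.
# /etc/smartd.conf (sketch)
# Monitor all drives with the default attribute checks, enable offline testing and
# attribute autosave, skip drives that are spun down, run a short self-test daily at
# 02:00 and a long self-test on Saturdays at 03:00, and e-mail on problems
DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././02|L/../../6/03) -m admin@example.com -M diminishing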
Disaster recovery
Can sustain the concurrent failure of up to 4 drives (data or parity) without data loss (as of the last SnapRAID sync).
Each failed drive beyond the 4th is lost, but only that drive, not all drives.
If the chassis/mainboard/etc. is damaged, the VM and data disks can be migrated to a donor system.
Each data disk can be read standalone by any system that supports the given disk's file system.
Parity disk failure is recoverable (parity can be rebuilt).
Encryption
At the block level, dm-crypt + LUKS is used on the data and parity drives.
At the file level, GPG-protected files, encrypted containers, or some analogue (see the example below).
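As a hedged example of the file-level option (the paths and file names are placeholders), an archive can be GPG-encrypted before it is written to the pool:
# Symmetric (passphrase-based) encryption of a backup archive
tar -czf - /path/to/documents | gpg --symmetric --cipher-algo AES256 -o /srv/pool/backups/documents.tar.gz.gpg
# Decrypt and unpack later
gpg --decrypt /srv/pool/backups/documents.tar.gz.gpg | tar -xzf -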
Expansion
For the hardware mentioned, expansion is no problem up to the 20-data-disk limit; beyond 20 data disks you need to build another node/pod. The nodes would be standalone.
I read from the author of SnapRAID that, because SnapRAID works on file paths, it would be possible to protect multiple network file systems; however, that is not the designed use case. One has to ask whether that is really a good use of SnapRAID.
Parity / Redundancy costs
Raw capacity = data size × 1.2 (4 parity drives per 20 data drives), i.e. roughly 83% (20 of 24 drives) of the raw storage capacity is usable for data.
Pros
Fairly simple set-up.
Flexibility to grow from n data disks up to 20.
Low parity cost: circa 93 TB usable space using 24 × 5 TB drives.
Fault tolerant.
No data lock-in or large-array weakness.
No hard dependencies; any data drive can be read/written individually, independent of the solution.
Can protect against bit rot and corruption.
Diverse data restoration and healing options.
Cons
Not a full backup.
Not a real-time backup.
Not highly available.
No incremental backups or snapshot histories (with restorable versions of a given file/state over time).
When files have been deleted and a SnapRAID sync has run afterwards, those deleted files are no longer recoverable (if not backed up elsewhere).
No parallelisation of reads or writes.
Kyle, did you mean to write ECC RAM?
I'll have more to add, after you've clarified.
--UhClem
Yes, the RAM is ECC.
Good to hear that.
It looks like you have a very good understanding of SnapRAID. One thing that you want to get a fuller appreciation of is the impact/ramification of file deletion. It has no bearing on anything you've written, or are planning, but you want to be aware of it as you start to formulate your own "Operating Procedures" for (hopefully) smooth sailing, and gotcha-avoidance. This isn't the place to get into it, and some others can probably explain it better than I, and already have. (So plug the right terms in front of "...snapraid site:sourceforge.net".)
Here are a few things that (I think) will really benefit you, in setting up your SnapRAID array. You are now at a stage you may never be able (or, find it feasible) to re-visit.
0a. Your SR data filesystems should contain only actual SR-appropriate files. Your .conf file should only have "exclude" lines for /lost+found/ .
0b. Your SR parity filesystems should contain only the parity file.
0c. Clutter and fragmentation are not your friend. Always make plans, and take steps, to avoid them.
Others might quibble with one or more of those, but .... onward ...
1. Give serious consideration to creating a small SR "Lab array" for testing, experimentation, performance tuning, etc. E.g., each of my /dev/sdX1 (for the full # [D+P] of drives) is 8GiB. You will never miss that amount of space from your Production Array; and you will be thankful to have a "safe place to play".
2. On all except your SR parity disk drives, create an equal-sized Content/Misc filesystem. I don't know what size is appropriate for your array, but I'd think 2.5-3x your expected maximum ".content" file size. Others with large arrays, and experience with recent SR versions, can assist. P+2 of these will be dedicated solely to your .content files [ref: 0c above]. The other D-P-2 will find good uses as time goes on; an idea to ponder, if SR doesn't already provide this, is to save a few copies of the previous file (as ArrayX.content.bak). There will almost certainly be files that are more SR-related (than VM/OS-related), but not SRarray-worthy, that can go in those D-P-2. The P drives won't have this filesystem, and will thus have extra capacity (to give you peace of mind) re: parity wastage. All D of these filesystems will be created at the end of each D drive. While this will result in syncs taking a (very) little bit longer [content being written to the slowest region], it will make it easier/cleaner if you should ever want/need to expand either the Content/Misc or the DATA filesystems (just a little) down the road. (See the partitioning sketch after this list.)
2a. Optionally your Primary .content file can go on a (non-array) SSD (if you have one planned already), to speed up checks/scrubs a little. [This is not a now-or-never decision though.]
3. Note that all DATA filesystems will be exactly the same size (given that all your drives are the same size). Obviously this is not required (by SR). But it can be very handy, either during some flavor of disaster recovery, or some semi-major re-org, to just do an "image copy" via ("dd bs=1M ...") rather than a noticeably slower "cp -a -r ..." (or whatever).
4. Do thorough research on the mkfs command for your chosen Array [D+P] filesystems so that you minimize wasted capacity.
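To make points 2 and 3 concrete, here is one hedged way to partition a data (D) drive with parted so that a small Content/Misc filesystem sits at the end of the disk. The 20 GiB size, the labels, and the device names are illustrative guesses, not UhClem's exact numbers.
# Sketch for one data drive /dev/sdX: a large DATA partition, then a small
# Content/Misc partition occupying the last ~20 GiB (the slowest region)
parted -s /dev/sdX mklabel gpt
parted -s /dev/sdX mkpart data xfs 1MiB -20GiB
parted -s /dev/sdX mkpart content xfs -20GiB 100%
mkfs.xfs -L d1-data    /dev/sdX1
mkfs.xfs -L d1-content /dev/sdX2
# Because every DATA partition is then exactly the same size (point 3), a drive
# can be cloned with a straight image copy rather than a slower file-level copy:
#   dd if=/dev/sdX1 of=/dev/sdY1 bs=1M status=progress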
Separately, you didn't ask, so forgive me for sticking my nose in, but ... It sounds like you are pretty much decided on that ST5000LM000 for your array drives. (I have zero experience with them, but I do have a solid understanding of current-era drives.) I did some quick research (and I was curious), and I believe they are a little too "bleeding-edge". I think that all 3 of their densities are too high for good reliability (recording density, track density, and, ergo, areal density)--even accounting for them being 5400 rpm. The ST5000LM000, and the ST[34]000LM024, have markedly different densities than the entire line of preceding ST[34]000LM0xx drives. That "000" (after the ST5000LM) means you're the guinea pig.
It's all just food for thought. (Don't get indigestion.)
--UhClem
Thanks for the great feedback; let me digest it and come back to you with some thoughts/questions/clarifications.
Pardon the necro but I have some questions along the same lines. First, I've Googled and can't find anything about what to expect for the content file size. Obviously this is going to depend on the situation, but even a ballpark mentioning file size and array size or number of files would be helpful. Are we talking megabytes, gigabytes, tens of gigabytes, or what?
Point 2 of the advice here suggests using a separate filesystem for the content files. In the Linux world, does it make sense to use LVM to manage the volumes, possibly leaving some space initially unassigned to accommodate changes down the road? Based on what was said above I'm thinking a small volume on each drive for testing purposes, another volume of TBD size on all data drives for content files, and most of the rest of the drive for a data volume, leaving maybe 10% of the drive unassigned initially. I figure this way if the content file volume is too big or too small I can adjust later. Does this make sense? Or am I overthinking it and adding complexity that isn't needed?
In case it matters, I'm building a new storage server. I have about a dozen drives currently scattered among other computers. They're all 2TB and 4TB. I have two new 8TB drives, one for data, one for parity, plus a 1TB boot drive in the new server. My intent is to install Debian, SnapRAID, mergerFS, and whatever other odds and ends I might require. I'll then start moving data from the existing drives, one by one. As I get a drive copied over I'll add that drive to the array. (All the drives are currently formatted NTFS, so they'll be reformatted after the data is copied off.) I know I will probably need to add another parity drive before I'm done but I'm going to wait until I get further into the process before worrying too much about that. The server case is big enough to handle all these drives and a couple more.
First question: ~0.5 GB of content file per 10 TB of data.
http://www.snapraid.it/faq#fcontent
I just create one large partition on my data disks. These disks also contain the content files. I see no reason to use LVM in a SnapRAID + mergerfs scenario; I think it just adds complexity. I have directions on my site for setting up SnapRAID and mergerFS if you are interested:
https://zackreed.me/setting-up-snapraid-on-ubuntu/
https://zackreed.me/mergerfs-another-good-option-to-pool-your-snapraid-disks/
@Leifi - Thanks. I overlooked that in the FAQ.
@rubylaser - I've read your posts on SnapRAID and mergerFS. In fact, they were instrumental in convincing me that was the right way to go.
I definitely understand wanting to keep things as simple as possible. There will be plenty to keep straight in a plain-vanilla 14-drive SnapRAID/mergerFS setup. The one place where I see an advantage in putting the content file in a separate partition/volume is that it allows you to ensure there will be space available for it. If you mingle data and content, and the data expands to fill the partition, you'll have to manually shuffle things around to make room for the content file. I suppose the mergerFS 'minfreespace' option could be used to the same effect.
Thanks for the replies.
Well, I'm glad those articles were helpful :) Although I understand your concern about having space for the content file(s), as you mentioned, the minfreespace option is your savior with minimal effort. Just set it to 20 GB+ and you will always have space for your ~2 GB content files.
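In mergerfs terms, that suggestion is just a mount option; a sketch fstab line (the pool path and branch glob are assumptions) would be:
# /etc/fstab (sketch): with minfreespace=20G, mergerfs stops placing new files on
# any branch with less than 20 GiB free, so the content file always has room to grow
/mnt/data* /srv/pool fuse.mergerfs defaults,allow_other,category.create=mfs,minfreespace=20G 0 0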
Your writing was also very useful as I was researching what I posted earlier. Thank you for that.