Hi,
I'm currently running a RAID5 at home and am thinking about switching to SnapRAID. I like the concept but am unsure about the sync performance. It's clear to me that the initial sync has to examine every file on the system to build the parity file. What I wasn't able to figure out is whether subsequent syncs still have to read every file, or at least examine every directory that's included in the snapshot. Or does SnapRAID use the filesystem's journal, or monitor write operations? I'd expect that to be significantly faster than doing a full directory scan.
As I'm not deep into Linux file systems, please forgive me if I used the wrong expressions. I hope you get my point.
Maybe someone can tell me how this is handled in SnapRAID, and maybe you can post the time a sync takes on your configuration, along with the combined size of the files and maybe even the file count.
Thanks very much
Ludwig
Anonymous
-
2012-11-26
First, it helps to understand what snapshot RAID is and how it differs from realtime RAID like your RAID5. Snapshot RAID does not monitor filesystem changes; it only updates its parity files when a sync operation is started.
Only the first sync is time-consuming. I have a test system with 16 x 3TB data disks and 2 x 3TB parity disks. All 16 data disks are full, with tens of thousands of files and folders, and the initial sync takes about 6 hours. Subsequent syncs are much faster because SnapRAID only needs to update parity for files and folders that have been changed, added or deleted. So no, follow-up syncs after the baseline sync do not read every file in full: for each file, it only checks the last-modified time/date and the size in bytes, and if both are the same, it assumes the file has not changed.
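To make the time/size check concrete, here is a rough Python sketch of the idea. It is not SnapRAID's actual code, just an illustration under my own assumptions: the function names `snapshot` and `diff` are made up, and the stored state is a plain dictionary rather than whatever SnapRAID keeps on disk. Note that in this sketch the directory tree is still walked, but only file metadata is read, never file contents.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to (size in bytes, mtime in ns).

    Only metadata is read via os.stat(); file contents are never opened.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff(old, new):
    """Compare two snapshots and return (added, removed, changed) path lists.

    A file counts as changed when its size or modification time differs;
    if both match, it is assumed to be unchanged.
    """
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, changed
```

You would take one snapshot at the end of a sync, keep it, and compare it against a fresh snapshot at the start of the next sync. Only the paths reported as added, removed or changed would then need any parity work.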
Whether snapshot RAID or realtime RAID is right for you depends on how often your files change. Files that rarely change, such as media files (movies, music, photos), benefit from snapshot RAID. I myself have slowly been migrating away from realtime hardware RAID for media storage, for the simple reason that striping data across multiple disks introduces unnecessary risk if you don't need more than single-disk read/write performance, which for media files you don't. Hardware (striping) RAID is only a performance and uptime multiplier, that's it.
The beauty of snapshot RAID versus hardware RAID is this: in your case with RAID5, once you lose 2 disks you've lost everything if you don't have a backup. With snapshot RAID, if you lose those same two disks, you lose at worst the contents of those two disks; all other disks retain their data. Of course, hardware RAID is nice because it does pooling and parity without you having to touch anything. Conversely, the downside to snapshot RAID is that you have to babysit it by syncing manually whenever your files change, or use task scheduling to automate it. Things to consider.