From: Mike H. <mh...@gl...> - 2009-10-13 10:48:44
On Tue, Oct 13, 2009 at 05:28:24AM -0500, Manuel Amador (Rudd-O) wrote:
> On Tuesday, 13 October 2009, Mike Hommey wrote:
> > On Mon, Oct 12, 2009 at 05:30:40AM -0700, adfas asd wrote:
> > > I actually used BTRFS a couple months ago and depended on it for a
> > > month or so, when it destroyed all my data in a power fail.
> >
> > There is absolutely no filesystem that will prevent random data
> > destruction in a power fail. The best thing you can do to prevent them
> > from happening is to have an array controller with battery, that will
> > keep caches in such cases. Failing that, there is nothing else than
> > backups.
>
> Sorry for being adamant, but that is NONSENSE. ZFS has been explicitly
> engineered to survive power failures, and has withstood over one million
> forced crashes.
>
> On ZFS, even if you lose caches due to power outages and the whole
> transaction write does not make it or makes it inconsistently across disk
> vdevs, the use of barriers on modern SATA drives will ensure that the
> transaction will be committed completely, or won't be committed at all.
> The data loss will absolutely NOT BE RANDOM, but will rather be limited to
> the very last transaction, if at all. And even if one of the devices in
> your file system gets completely TRASHED, ZFS can scrub and repair data on
> the failed device by using checksummed and verified data from the other
> device, AND let you know which files went inaccessible. I know all of this
> from experience.
>
> Really, this mindset of "battery backed RAID arrays"... it's so nineties
> that Vanilla Ice would like it back. Don't spread FUD.

You're not contradicting my thoughts, we're just not on the same tracks.
What I meant to say is that while filesystems can guarantee levels of
consistency, they won't *ever* guarantee that *everything* you write has
been written and won't ever be lost after a power failure. And in some
cases, "the very last transaction" can be very important data, too. And
that will be lost, unless the hardware prevents it by some means. And
depending on when the power loss happens, and whether the data has been put
physically to disk or not, you don't know in advance what is going to have
been destroyed. And there are various ways to lose write barriers (like
using lvm, as has been pointed out in this thread).

That said, some filesystems help prevent some disasters, but they can't do
miracles either.

Mike
From: Manuel A. (Rudd-O) <ru...@ru...> - 2009-10-13 12:40:59
> You're not contradicting my thoughts, we're just not on the same tracks.
> What I meant to say is that while filesystems can guarantee levels of
> consistency, they won't *ever* guarantee that *everything* you write has
> been written and won't ever be lost after a power failure.

We aren't talking about the fantasy world here. We are talking about
atomicity (a change goes in, or it doesn't), durability (once changes hit
the platters, you're good) and consistency (when you boot, you won't have
to check the on-disk structures). Those are the three properties that
advanced filesystems get you.

Advanced filesystems do not get you magic pony land propositions like "save
all the data, even the data that was on the MEMORY BUFFERS of the disks".
If you meant to say that, or you interpreted my words as saying that, then
DUH, we already knew that is impossible. What good filesystems and cheap
decent disks get you is a controlled, consistent, properly ordered
transactional file system that doesn't need O(n) checking on boot, even in
cases of power loss.

> And in some cases, "the very last transaction" can be very important
> data, too.

And in those cases you can procure battery-backed controlled shutdown /
expensive machinery / level 2 caches in solid state drives / synchronous
writes to disk / an astrologist to tell you what your boss will do after
the data loss of the few kilobytes of writes your system lost. But frankly,
if what got stuck in the buffers concerns you, then what got stuck in the
pipeline of your machine *before hitting disk buffers* ought to be MORE of
a concern, in which case you *already got a decent uninterruptible power
supply* with the money you saved from not making the idiotic purchase of a
battery-backed disk or array of disks.

Reliability, speed, price. Today, you can pick any three of the three, but
don't expect miracles.

> And that will be lost, unless the hardware prevents it by some means. And
> depending on when the power loss happens, and whether the data has been
> put physically to disk or not, you don't know in advance what is going to
> have been destroyed.

False. You DEFINITELY know that ONLY the last transaction or set of
contiguous transactions got destroyed, AT MOST.

> And there are various ways to lose write barriers (like using lvm, as has
> been pointed out in this thread).

We already discussed this LVM and barriers thing. Administrative
incompetence has no place in a discussion about filesystems and their
merits.

> That said, some filesystems help prevent some disasters, but they can't
> do miracles either.
>
> Mike
From: Mark P. <Mark.Phalan@Sun.COM> - 2009-10-13 11:18:19
On 10/13/09 01:04 PM, John Haxby wrote:
> On 13/10/09 11:32, Manuel Amador (Rudd-O) wrote:
...
> You probably should. Btrfs is very new; the reason it made it into the
> mainline kernel so early on is that it holds considerable promise, and
> the more exposure it gets from people willing to test it and fix bugs,
> the quicker it will become stable enough for the enterprise.
>
> There is one insuperable problem with zfs, and the reason that zfs-fuse
> exists in the first place: CDDL is incompatible with the GPL.

Or it could be argued that the GPL is incompatible with the CDDL :)

Either way it will be interesting to see what Oracle will (or won't) do
once the acquisition finally goes through.

> It's a shame, but without that limitation this entire thread probably
> wouldn't exist. In some ways I'm glad that limitation does exist though:
> it seems to me that btrfs has learned from all its predecessors, zfs and
> hardware technology changes included, and will wind up being much, much
> better as a result. Yes, being able to shrink a volume is _that_
> important.

I know this is somewhat off-topic, but I've seen claims that btrfs is
somehow better than zfs, or at least will end up being so. Any pointers on
that?

BTW: a couple of ZFS features you'll likely see pretty soon: ZFS volume
shrinking, de-duplication, built-in encryption.
(more here: http://blogs.sun.com/video/entry/kernel_conference_australia_2009_jeff)

-M

PS As should be clear from my email address, I'm a Sun employee, but I have
nothing to do with ZFS development and know nothing about Oracle's plans
for Solaris or ZFS.
From: Paul S. <pa...@up...> - 2009-10-13 11:26:31
Manuel,

I just want to make sure. Are you talking about ZFS in Solaris or zfs-fuse?

Does zfs-fuse implement barriers?
Is it possible to implement barriers on a fuse filesystem?
If it is, how would one do that?

Regards
Paul

Manuel Amador (Rudd-O) wrote:
>> My point was: there are no such guarantees in all cases, especially on
>> low-end hardware, which I guess is what "adfas asd" was using when he
>> lost data to btrfs. My point is that his loss in power fail would have
>> most probably happened with any other fs, except if he has the right
>> hardware.
>
> This is JUST NOT TRUE. Every modern SATA disk has write barriers that
> prevent write reordering (this is the *cornerstone* of transactional
> commit to disk), and ZFS does indeed make use of those write barriers.
> The "it would have most probably happened with any other FS", I can
> understand from people running ext2 or FAT32 filesystems -- not from
> those running ZFS.
>
> Go to our list and ask who has experienced catastrophic data loss with
> ZFS. We have had our share of bugs, impediments, faults, but none that
> would cause data loss so far.
From: Manuel A. (Rudd-O) <ru...@ru...> - 2009-10-13 11:38:20
On Tuesday, 13 October 2009, Paul Schutte wrote:
> Manuel,
>
> I just want to make sure. Are you talking about ZFS in Solaris or
> zfs-fuse?
>
> Does zfs-fuse implement barriers?
> Is it possible to implement barriers on a fuse filesystem?
> If it is, how would one do that?

An ioctl to the block device. It is implemented in ZFS-FUSE. Check the code.
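The flush that a barrier boils down to can also be exercised from userspace
with stock tools, which is a quick way to convince yourself that nothing
stops a FUSE process from doing the same thing. This is a rough illustration
only -- not the zfs-fuse code path itself -- and /dev/sdb is just a
placeholder device:

    # Rough illustration with stock tools; zfs-fuse does the equivalent
    # from C against the block device it has open -- see its source for
    # the exact call.

    hdparm -W /dev/sdb             # report whether the drive's volatile write cache is on
    hdparm -F /dev/sdb             # ask the drive to flush its on-board write cache
    blockdev --flushbufs /dev/sdb  # flush the kernel's buffered writes (BLKFLSBUF ioctl)

    # From a program, an fsync() on the opened block device also requests
    # a cache flush on newer kernels.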
From: Goswin v. B. <gos...@we...> - 2009-10-14 14:27:56
"Manuel Amador (Rudd-O)" <ru...@ru...> writes:

> What you want is ZFS, possibly on Solaris but doable on Linux. You will
> create one pool in the HTPC with your four disks inside that case there.
> Then you will export the four disks in your garage using NBD or iSCSI,
> and import these block devices in your HTPC. Then you will add each
> remote disk as a mirror to the corresponding local disk in your pool.
>
> this config
>
> locald1 - remoted1 MIRROR
> locald2 - remoted2 MIRROR
> locald3 - remoted3 MIRROR
> locald4 - remoted4 MIRROR
>
> This is a zpool composed of four RAID1 disks, one in your house, one
> outside it. You have redundancy, checksumming (VERY IMPORTANT), the
> ability to check the entire pool with a scrub (which is proportional to
> the size of the data you have) AND you can do backups by snapshotting.

Does ZFS have any support for this kind of setup? What I mean is that the
setup should have:

- a write-only flag for the remote disks
- delayed writes to the remote disks
- support to flag delayed regions for resync after a crash
- the ability to cope with an out-of-date mirror being made primary

Theoretically zfs could be made really good at this. You could have an
"in-sync" root that is mirrored on both disks and a "syncing" root that is
committed to the local disks but still syncing to the remote disks. An
fsync() call could block only for the "syncing" root, so speed would be the
high local disk speed. The "in-sync" root, on the other hand, would only
advance slowly (and skip many "syncing" roots) due to the slow GBit speed.

But you would need special support for this in ZFS. I don't think ZFS has
support for this. Nothing I have read hinted at it.

> :-)
>
> See you on the ZFS-FUSE list!
>
> On Thursday, 8 October 2009, adfas asd wrote:
>> I have an HTPC, which records terabytes to my RAID10 array. The data is
>> too large to back up any more, so I've decided that I want to move one
>> side of that mirror into the garage, in case of theft or fire. I plan
>> to build a little headless box which will have a small mobo, four
>> hot-swap disk drives, and GbEthernet, as my remote storage in the
>> garage.
>>
>> I'll need to communicate with it quickly and effectively, and so far it
>> looks like iSCSI is the ticket, but FUSE is so amazing that I thought
>> I'd check and see whether there are any solutions here. My main focus
>> is to make this a RAID10 array (mirrored & striped), with one side of
>> the mirror in the HTPC and the other in the garage attached by
>> GbEthernet.
>>
>> The only FUSE modules that look close are Gluster, Starfish, and Moose.
>> Any suggestions?
>>
>> Also I am looking for a *high*quality* enclosure and cage, similar to
>> this preassembled (and expensive) one:
>> http://qnap.com/pro_detail_feature.asp?p_id=110
>> Any suggestions?

I found that finding the right box to put your system in is the hardest
part. You should also consider that in the garage there might be a lot of
dirt. I would suggest an enclosure that has a dust filter. The other
problem I foresee is temperature. Are you sure it stays warm enough during
the winter season?

MfG
Goswin
From: Manuel A. (Rudd-O) <ru...@ru...> - 2009-10-15 15:12:04
> Does ZFS have any support for this kind of setup?

In Solaris, it does.

> What I mean is that the setup should have:
>
> - a write-only flag for the remote disks
> - delayed writes to the remote disks

No, no. Writes are in sync. If you want *delayed* writes, you can use AVS
or the zreplicate utilities to replicate writes to the remote devices.

> - support to flag delayed regions for resync after a crash
> - the ability to cope with an out-of-date mirror being made primary
>
> Theoretically zfs could be made really good at this. You could have an
> "in-sync" root that is mirrored on both disks and a "syncing" root that
> is committed to the local disks but still syncing to the remote disks. An
> fsync() call could block only for the "syncing" root, so speed would be
> the high local disk speed. The "in-sync" root, on the other hand, would
> only advance slowly (and skip many "syncing" roots) due to the slow GBit
> speed.

What you'd want is likely a local set of volumes in a local pool, coupled
with AVS.

> But you would need special support for this in ZFS. I don't think ZFS has
> support for this. Nothing I have read hinted at it.

AVS.
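For the plain synchronous mirror (no AVS), the layout from earlier in the
thread maps onto something like this once the garage disks show up as local
block devices. A sketch only: the device names, ports and pool name are made
up, and the NBD exports are assumed to already be running on the garage box
(iSCSI would work the same way):

    # Attach the four remote disks, one nbd device per export.
    nbd-client garage 10809 /dev/nbd0
    nbd-client garage 10810 /dev/nbd1
    nbd-client garage 10811 /dev/nbd2
    nbd-client garage 10812 /dev/nbd3

    # One pool of four two-way mirrors: each local disk paired with its
    # remote twin, as in the locald1 - remoted1 layout quoted earlier.
    zpool create tank \
        mirror /dev/sdb /dev/nbd0 \
        mirror /dev/sdc /dev/nbd1 \
        mirror /dev/sdd /dev/nbd2 \
        mirror /dev/sde /dev/nbd3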
From: Goswin v. B. <gos...@we...> - 2009-10-14 14:54:02
adfas asd <chi...@ya...> writes:

> I have an HTPC, which records terabytes to my RAID10 array. The data is
> too large to back up any more, so I've decided that I want to move one
> side of that mirror into the garage, in case of theft or fire. I plan to
> build a little headless box which will have a small mobo, four hot-swap
> disk drives, and GbEthernet, as my remote storage in the garage.
>
> I'll need to communicate with it quickly and effectively, and so far it
> looks like iSCSI is the ticket, but FUSE is so amazing that I thought I'd
> check and see whether there are any solutions here. My main focus is to
> make this a RAID10 array (mirrored & striped), with one side of the
> mirror in the HTPC and the other in the garage attached by GbEthernet.
>
> The only FUSE modules that look close are Gluster, Starfish, and Moose.
> Any suggestions?
>
> Also I am looking for a *high*quality* enclosure and cage, similar to
> this preassembled (and expensive) one:
> http://qnap.com/pro_detail_feature.asp?p_id=110
> Any suggestions?

I just thought to mention something completely different.

Why not stop mirroring the live data and instead use the storage in the
garage as backup? Is it essential that the system keeps working on a disk
failure?

What I would do is set up a 5-disk RAID5 locally with LVM (or zfs if you
like) and a 9-disk RAID5 in the garage. Then once a day snapshot the local
LVM, rsync it to the garage, and delete the snapshot again. Rsync supports
creating a daily dir with hard links to the previous day's files where
unchanged (or hourly, or weekly). You can then easily keep the tree for the
last 3 days, last Sunday and the last Sunday of each month, or whatever
pattern you like.

Doing real online mirroring will quickly reduce your speed to the GBit
link, and filesystem errors will affect both mirrors.

MfG
Goswin
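A rough sketch of that rotation, assuming LVM underneath; the volume group,
mount points, host name and retention handling are all made up:

    DAY=$(date +%Y-%m-%d)

    # Throwaway snapshot so rsync sees a consistent image of the volume.
    lvcreate --snapshot --size 5G --name data-snap /dev/vg0/data
    mount -o ro /dev/vg0/data-snap /mnt/snap

    # Unchanged files become hard links into the previous tree on the
    # garage box, so each daily directory only costs the changed data.
    rsync -a --delete --link-dest=/backup/latest \
          /mnt/snap/ garage:/backup/"$DAY"/
    ssh garage ln -sfn /backup/"$DAY" /backup/latest

    # Drop the snapshot again.
    umount /mnt/snap
    lvremove -f /dev/vg0/data-snap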
From: Manuel A. (Rudd-O) <ru...@ru...> - 2009-10-15 15:14:38
> Why not stop mirroring the live data and instead use the storage in the
> garage as backup? Is it essential that the system keeps working on a disk
> failure?

See AVS.

However, mirroring and backups are solutions for separate problems. Without
a mirror, you will have to restore from backup -- and if the disk failure
is catastrophic, you will lose a shitload of data that you will then have
to restore from backup in an hours-long procedure.

> What I would do is set up a 5-disk RAID5 locally with LVM (or zfs if you
> like) and a 9-disk RAID5 in the garage. Then once a day snapshot the
> local LVM, rsync it to the garage, and delete the snapshot again. Rsync
> supports creating a daily dir with hard links to the previous day's files
> where unchanged (or hourly, or weekly). You can then easily keep the tree
> for the last 3 days, last Sunday and the last Sunday of each month, or
> whatever pattern you like.

Ah, raidz. Reasonable, but you lose performance. I do the rsync thing from
a mirrored zpool to a non-mirrored copies=2 external zpool. I would use
AVS, but I use Linux. If it was a remote machine, I'd zfs send | zfs recv
instead -- much faster, and identical data copies.

> Doing real online mirroring will quickly reduce your speed to the GBit
> link, and filesystem errors will affect both mirrors.
>
> MfG
> Goswin
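For reference, the send/receive variant mentioned above looks roughly like
this; the pool, dataset and host names are made up, and the garage box is
assumed to accept the stream over ssh:

    # Initial full copy of a snapshot to the remote pool.
    zfs snapshot tank/media@2009-10-15
    zfs send tank/media@2009-10-15 | ssh garage zfs recv -F backup/media

    # From then on, only the differences between snapshots travel over
    # the wire.
    zfs snapshot tank/media@2009-10-16
    zfs send -i tank/media@2009-10-15 tank/media@2009-10-16 | \
        ssh garage zfs recv backup/media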
From: Goswin v. B. <gos...@we...> - 2009-10-19 08:28:51
"Manuel Amador (Rudd-O)" <ru...@ru...> writes:

>> Why not stop mirroring the live data and instead use the storage in the
>> garage as backup? Is it essential that the system keeps working on a
>> disk failure?
>
> See AVS.
>
> However, mirroring and backups are solutions for separate problems.
> Without a mirror, you will have to restore from backup -- and if the disk
> failure is catastrophic, you will lose a shitload of data that you will
> then have to restore from backup in an hours-long procedure.

Not really. I assume you can detect and remove all damaged files first.
Then you can export (read-only) the filesystem from the garage and use
unionfs-fuse to overlay that with the local filesystem (whatever remains of
it) with copy-on-write. You are instantly back to a working setup.

Then, to restore the local filesystem, open every file for writing and
close it immediately again. That will trigger the copy-on-write, making
sure the local filesystem has a copy of every file. Any changes made to
files during that time will also go to the local filesystem and won't get
lost. Call it a resync.

The drawback of this method is that you have offline time to fix the local
FS as best as possible, to start the unionfs-fuse and later to shut it down
again (if desired). The advantage is that you get far better speed locally
and have backups. With mirroring, a FS failure or accidental corruption of
files will destroy both copies, leaving you with nothing.

>> What I would do is set up a 5-disk RAID5 locally with LVM (or zfs if you
>> like) and a 9-disk RAID5 in the garage. Then once a day snapshot the
>> local LVM, rsync it to the garage, and delete the snapshot again. Rsync
>> supports creating a daily dir with hard links to the previous day's
>> files where unchanged (or hourly, or weekly). You can then easily keep
>> the tree for the last 3 days, last Sunday and the last Sunday of each
>> month, or whatever pattern you like.
>
> Ah, raidz. Reasonable, but you lose performance. I do the rsync thing
> from a mirrored zpool to a non-mirrored copies=2 external zpool. I would
> use AVS, but I use Linux. If it was a remote machine, I'd zfs send | zfs
> recv instead -- much faster, and identical data copies.

If zfs can do that efficiently then yes. I'm not familiar with all the zfs
features. I stopped reading when I saw that zfs cannot shrink.

MfG
Goswin
From: Rudd-O <ru...@ru...> - 2009-10-20 02:08:12
> Not really. I assume you can detect and remove all damaged files first.
> Then you can export (read-only) the filesystem from the garage and use
> unionfs-fuse to overlay that with the local filesystem (whatever remains
> of it) with copy-on-write. You are instantly back to a working setup.

If your filesystem of choice has checksumming and the loss in your storage
node is not catastrophic, you can do fine without backups. Otherwise... you
assume too much ;-)