From: Libor V. <li...@co...> - 2004-10-26 22:10:25
|
Hi, I'm reading the discussion about write caching and I'd like to post my VERY subjective summary:

Main problem:
- without write caching, performance is really bad
- with write caching, you can very easily corrupt your data

Data order:
- as no HDD today guarantees the order of requests written to disk (really true?), we shouldn't take care...

My very own experiences:
- I run the XFS filesystem on >300 Linux servers on SW RAID5
- administrators are doing REALLY crazy things to the systems
- XFS is VERY stable - in fact, I haven't seen any filesystem corrupted because of an incorrect shutdown - when I was using ext2/3, I saw a lot of them!
- the system MUST be using write caching (I can see a very big difference (~50%) in write performance if I use disks with a 2 or 8 MB cache) => filesystems MUST solve this write cache problem now, no matter if they have iSCSI or SATA or HW/SW RAID as the "storage backend"

Summary:
- every modern system (OS + fs) must be able to deal with the loss of data in the write cache
- but how???

Stupid question:
- imagine this setup - SW RAID5 (SATA), XFS filesystem, a file mounted as a loopback device, iSCSI using this loopback device
- in this setup there surely ALREADY is some write caching (at the XFS/SW RAID levels), isn't there?

I'm using such a system on 5 "semi-production" servers with Windows servers as clients and I haven't seen any problems with data corruption... (but I haven't done any tests focused on that!)

Thanks for any comments.

-- Libor Vanek
+-------------------------------------+
| Email: li...@co... |
| ICQ: 124529939 |
| WWW: http://www.discobolos.net |
| Tel/fax: +420 541 22 5091, 6293 |
| Mobil: +420 777 703 642 |
+-------------------------------------+
From: Carl R. <car...@gm...> - 2004-10-26 22:33:44
|
Hi,

> Main problem:
> - without write caching, performance is really bad

Only random writes are slow; sequential writes are not that much of a problem.

> - with write caching, you can very easily corrupt your data

Yes - unless you can control the caching.

> Data order:
> - as no HDD today guarantees the order of requests written to disk
> (really true?), we shouldn't take care...

Every hard disk with write caching reorders the data to adapt to the physical circumstances.

> My very own experiences:
> - I run the XFS filesystem on >300 Linux servers on SW RAID5
> - administrators are doing REALLY crazy things to the systems
> - XFS is VERY stable - in fact, I haven't seen any filesystem corrupted
> because of an incorrect shutdown - when I was using ext2/3, I saw a lot of
> them!

XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling from the start, not, like ext3, on top of an already existing filesystem. Probably the designers took the caching issues into account in the design.

> - the system MUST be using write caching (I can see a very big difference (~50%)
> in write performance if I use disks with a 2 or 8 MB cache) => filesystems
> MUST solve this write cache problem now, no matter if they have iSCSI or
> SATA or HW/SW RAID as the "storage backend"

True. And I know some commercial filesystems have solved this.

> Summary:
> - every modern system (OS + fs) must be able to deal with the loss of data in
> the write cache
> - but how???

The filesystem has to take care that all critical data is really written to disk, and the OS has to provide the necessary interface.

> Stupid question:
> - imagine this setup - SW RAID5 (SATA), XFS filesystem, a file mounted as a
> loopback device, iSCSI using this loopback device
> - in this setup there surely ALREADY is some write caching (at the XFS/SW RAID
> levels), isn't there?

AFAIK, the writes through IET go write-through into the loopback file. I don't know if XFS provides any further caching. I used LVM instead of a loopback file. But I'll try this, too.

bye
Carl
From: Libor V. <li...@co...> - 2004-10-26 22:40:14
|
>> Data order:
>> - as no HDD today guarantees the order of requests written to disk
>> (really true?), we shouldn't take care...
>
> Every hard disk with write caching reorders the data to adapt to the
> physical circumstances.

Are there any HDDs without write caching?

>> Summary:
>> - every modern system (OS + fs) must be able to deal with the loss of
>> data in the write cache
>> - but how???
>
> The filesystem has to take care that all critical data is really
> written to disk, and the OS has to provide the necessary interface.

So - what is this interface, and how does it "propagate" this "sync" (probably) action to iSCSI/SCSI/SATA/SW RAID block devices?

-- Libor Vanek
From: Carl R. <car...@gm...> - 2004-10-26 22:53:13
|
Hi,

>> Every hard disk with write caching reorders the data to adapt to the
>> physical circumstances.
> Are there any HDDs without write caching?

You can always turn off write caching. But I don't think that there are disks without a write cache out on the market.

>> The filesystem has to take care that all critical data is really written
>> to disk, and the OS has to provide the necessary interface.
> So - what is this interface, and how does it "propagate" this "sync"
> (probably) action to iSCSI/SCSI/SATA/SW RAID block devices?

The device driver must provide the necessary functions, and the filesystem must use them. As I said before, I don't know the Linux IO system very well, but for sure there's a kind of unified interface so that you don't have to cover each storage technology within the filesystem. I think there's something in the block device layer, but I have to investigate first.

bye
Carl
From: FUJITA T. <to...@ac...> - 2004-10-27 01:39:41
|
From: "Carl Rueder" <car...@gm...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 00:33:36 +0200

> > My very own experiences:
> > - I run the XFS filesystem on >300 Linux servers on SW RAID5
> > - administrators are doing REALLY crazy things to the systems
> > - XFS is VERY stable - in fact, I haven't seen any filesystem corrupted
> > because of an incorrect shutdown - when I was using ext2/3, I saw a lot of
> > them!
>
> XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling
> from the start, not, like ext3, on top of an already existing filesystem.
> Probably the designers took the caching issues into account in the design.

I quote Stephen Lord's mail from the XFS mailing list. He is an SGI XFS developer.

http://oss.sgi.com/archives/linux-xfs/2002-11/msg00057.html
--
On Tue, 2002-11-05 at 13:50, Christophe Zwecker wrote:
> Hi,
>
> I'm running gentoo with EVMS/LVM on 2.4.19 and XFS on a loop-aes
> crypted device, 430 GB.
>
> I have the write cache disabled on the 3ware controller. I copied 300 GB
> over the network, and I noticed that every 50 MB or so the controller
> stalled and didn't accept more data while writing like crazy to disk.
>
> It took me 24 h to copy 300 GB over a 100 Mbit network. After that I
> tried to turn the write cache on - hugely more performance. Hmm, well. I
> rebooted a couple of times, and suddenly I couldn't mount the XFS partition
> any more (bad superblock). I disabled the write cache again, and thank god
> I could fix the issue with xfs_repair.
>
> So, here I am, disabled write cache, ok. The performance as I stated
> above is awful. Is this how it's supposed to be? Disable the write cache
> and have terrible performance?

Unfortunately, some hardware is just designed to rely on write caching. It does seem a little odd that you lost the superblock though. This is put there by mkfs, and is always present, so how powering down the device is corrupting it I do not know. That seems like an issue with the 3ware firmware to me.
--

He meant that hardware designed to rely on write caching (write back) is broken. XFS was not able to handle it in 2002-11. And after a quick look at the code, I don't think that the current code can handle it either. So please ask the XFS developers to correct me.

If you use hardware with 'write back with reorder', it needs a battery to write out the whole disk cache in the presence of a system crash.

Please ask experts if you believe that file systems in Linux can handle 'write back with reorder', instead of repeating that I don't know the Linux I/O layer design. If you think that 'write back with reorder' is useful for IET, you need to prove that modern file systems can handle it. I showed that ext3 was not able to handle it in 2003/02 (at least) by quoting Stephen C. Tweedie's email. So please look for file system experts' opinions on this issue for yourself to correct me.
From: FUJITA T. <fuj...@la...> - 2004-10-27 06:38:10
|
From: FUJITA Tomonori <to...@ac...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 10:39:34 +0900

> > XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling
> > from the start, not, like ext3, on top of an already existing filesystem.
> > Probably the designers took the caching issues into account in the design.
>
> I quote Stephen Lord's mail from the XFS mailing list. He is an SGI XFS developer.
>
> http://oss.sgi.com/archives/linux-xfs/2002-11/msg00057.html

I found a more informative quotation from him.

http://oss.sgi.com/archives/linux-xfs/2001-07/msg00246.html
--
> I've asked about write caching before, and now I've got another
> question.
>
> If I've got write caching turned on and the power goes out, do
> I face the possibility of silent metadata corruption?
>
> With ext2, the fsck would attempt to fix metadata problems on
> bootup; on the other hand, with xfs, if write caching resulted
> in a corrupted log, would the simple log replay that xfs does on
> mount perhaps fail to catch possible metadata problems?

Yes, and the same will be true for all write-behind journalled filesystems. If the log does not make it to disk when the driver says it has, then in theory a subsequent metadata write could make it out to the media. A crash at this point would mean that part of a transaction had made it to disk, but the rest, which only existed in the disk's internal cache, would not. I would expect reiserfs, ext3 and jfs would all have the same issue.

So I would not in general recommend write caching on a device, unless you know enough about it to be satisfied that its cache makes it out to disk on power down. Using the rotational power of the spindle to generate power to move the head to a special track and flush the cache is not unheard of, for instance.
---

If you are serious about your data, don't use the 'write back with reorder' policy with a disk drive that does not have something like a battery to flush the disk cache in the presence of a system crash.
From: Carl R. <car...@gm...> - 2004-10-27 07:41:33
|
Hi,

> "So I would not in general recommend write caching on a device, unless
> you know enough about it to be satisfied that its cache makes it out
> to disk on power down. Using the rotational power of the spindle to
> generate power to move the head to a special track and flush the cache
> is not unheard of, for instance."

a) You can force unbuffered writes if you need them, and there are some systems out there doing that. I'll get some SAN logs in the next few hours.
b) A hard disk reorders the buffered data before writing it.
c) Operating write-back caching without battery backup was always a risk.
d) Everyone who claims to operate enterprise computing without backup systems is a fool.

I don't see a problem here. If some filesystems rely on ordered write-back behaviour, they are crap. You don't get ordered write-back on today's hard disks; that's a fact (ask Hitachi, Seagate, Maxtor and so on). And no professional sysadmin could operate his storage write-through; the performance penalty is just too big. To be really safe, be sure to turn off all write buffers anywhere in the IO process.

Unless IET is strictly write-through it will never play in the league of enterprise computing. You have to offer the features of typical SCSI drives, and this means offering unbuffered and buffered writes on request.

bye
Carl
From: Libor V. <li...@co...> - 2004-10-27 10:32:30
Attachments:
libor.vcf
|
> Unless IET is strictly write-through it will never play in the league of
> enterprise computing. You have to offer the features of typical SCSI drives,
> and this means offering unbuffered and buffered writes on request.

The question is, what does this "unbuffered/buffered write request" look like (which SCSI command?), and should we use buffered (Carl's approach?) or unbuffered (Fujita's approach?) by default?

-- Libor Vanek
From: Carl R. <car...@gm...> - 2004-10-27 10:47:47
|
Hi,

> The question is, what does this "unbuffered/buffered write request" look like
> (which SCSI command?), and should we use buffered (Carl's approach?) or
> unbuffered (Fujita's approach?) by default?

For an untagged-queueing SCSI device it's really simple: let the initiator choose whether it wants to write buffered (a simple SCSI WRITE), write unbuffered (a SCSI WRITE with the FUA bit set), or flush the cache (SYNCHRONIZE CACHE).

A typical access pattern found in my SAN traces is: write buffered, then synchronize a particular range (SYNCHRONIZE CACHE supports syncing only a range). This is common across the traced OSes/filesystems. As long as IET doesn't reorder the SCSI commands it receives, there's AFAIK no problem.

If you're using initiators that don't support the necessary SCSI command set, or journaling filesystems which aren't using these features, you have to set write-through; otherwise you can safely use write-back.

BTW: newer 2.6 kernels support write barriers for reiser & ext3.

bye
Carl
From: FUJITA T. <to...@ac...> - 2004-10-27 11:07:26
|
From: "Carl Rueder" <car...@gm...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 09:41:26 +0200 (MEST)

> > "So I would not in general recommend write caching on a device, unless
> > you know enough about it to be satisfied that its cache makes it out
> > to disk on power down. Using the rotational power of the spindle to
> > generate power to move the head to a special track and flush the cache
> > is not unheard of, for instance."
>
> a) You can force unbuffered writes if you need them, and there are some systems
> out there doing that. I'll get some SAN logs in the next few hours.
> b) A hard disk reorders the buffered data before writing it.
> c) Operating write-back caching without battery backup was always a risk.
> d) Everyone who claims to operate enterprise computing without backup systems
> is a fool.

With write-back caching without battery backup, you may lose some of your data. With write-back caching with reordering and without battery backup, you may lose your whole file system.

Please read Stephen Lord's mail more carefully. He said that write-back may corrupt file system metadata. Note that although write-back may corrupt file system metadata, write-back that preserves the ordering is safe for modern journaling file systems, I think.

> I don't see a problem here. If some filesystems rely on ordered write-back
> behaviour, they are crap. You don't get ordered write-back on today's
> hard disks; that's a fact (ask Hitachi, Seagate, Maxtor and so on). And no
> professional sysadmin could operate his storage write-through; the
> performance penalty is just too big.

Why is it so difficult for you to understand the fact that some journaling file systems in Linux don't support write-back caching with reordering? Please don't insult file system developers by saying that they are crap. Your theory says that ext3 was crap (I showed you Stephen C. Tweedie's mail) and XFS was crap (I showed you Stephen Lord's mail).
Note that you thought that XFS was designed to handle write-back with reorder. You don't know anything about file systems or the Linux I/O design. Why are you so rude?

And you said:

    XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling
    from the start, not, like ext3, on top of an already existing filesystem.
    Probably the designers took the caching issues into account in the design.

Again, how could you say such a thing even though you don't know anything about the XFS and ext3 implementations in Linux? I think that you have some experience and have read white papers about these file systems; however, how can you be sure that they are true? You can find the truth only in the source code. Don't insult or criticize people based only on your assumptions. As I showed, you know no truth about the file system implementations.

> Unless IET is strictly write-through it will never play in the league of
> enterprise computing. You have to offer the features of typical SCSI drives,
> and this means offering unbuffered and buffered writes on request.

You don't need to say "You have to do something." You have to do what you want for yourself. If IET is not useful for you, please go away and find another solution.
From: Carl R. <car...@gm...> - 2004-10-27 12:35:23
|
> Please read Stephen Lord's mail more carefully. He said that
> write-back may corrupt file system metadata.

Did I ever ignore this?

> Note that although write-back may corrupt file system metadata,
> write-back that preserves the ordering is safe for modern
> journaling file systems, I think.

That's the reason why some people implemented write barriers for the IO layer & filesystems (see the current 2.6 kernel): to ensure transaction isolation in IO.

> Why is it so difficult for you to understand the fact that some
> journaling file systems in Linux don't support write-back caching with
> reordering?

Why is it so difficult for you to see that there are some other systems beyond Linux doing things right? You want to provide a target, not an initiator!

What are the facts (from my point of view):
1) Some filesystems have problems with reordering
2) Some filesystems don't have any problems with reordering
3) Hard disks do reordering
4) Write-through cuts real database workload performance down to 20%

> Please don't insult file system developers by saying that they are
> crap. Your theory says that ext3 was crap (I showed you Stephen
> C. Tweedie's mail) and XFS was crap (I showed you Stephen Lord's
> mail). Note that you thought that XFS was designed to handle write-back
> with reorder.

If a filesystem has a problem with write caching, it's IMHO buggy. That's my opinion. Why was write barrier support developed? Why not just force write-through?

> You don't know anything about file systems or the Linux I/O design. Why
> are you so rude?

Why am I rude? Why are you so focused only on ext3 and Linux?

> Again, how could you say such a thing even though you don't know
> anything about the XFS and ext3 implementations in Linux?

Sure, reordering was an issue, but now there are solutions. It won't improve IET any further if we dispute past issues.
> I think that you have some experience and have read white papers about
> these file systems; however, how can you be sure that they are
> true? You can find the truth only in the source code.

Why do I have to care only about these filesystems? Why should I ignore other implementations?

> Don't insult or criticize people based only on your assumptions. As I showed,
> you know no truth about the file system implementations.

I assume I know as much about Linux filesystems as you know about commercial systems. I don't claim to know everything - do you?

> You don't need to say "You have to do something." You have to do what
> you want for yourself.

Unfortunately I don't have much programming experience at the kernel layer, but I have some experience with storage, storage area networks, databases and other commercial systems used for handling real workloads. But if you don't need such experience, I won't bother here anymore.

> If IET is not useful for you, please go away and find another solution.

Isn't your objective the development of an "iSCSI target with professional features, that works well in enterprise environments under real workload, and is scalable and versatile enough to meet the challenge of future storage needs and developments" anymore? I only wanted a constructive discussion about improvements to write caching, but your only arguments were the handicaps of some Linux filesystems. If you don't want feedback from users who have experience with real workload systems, ok, no problem. But don't expect to play in the enterprise league. Your highest scoring will be the toy club.

Just my few cents.

Bye
Carl
From: Libor V. <li...@co...> - 2004-10-27 12:56:17
Attachments:
libor.vcf
|
Hey, just calm down. Why not provide "write caching" as an option (disabled by default) for those who know that this can corrupt data?

-- Libor Vanek
From: Carl R. <car...@gm...> - 2004-10-27 13:08:13
|
> Hey, just calm down. Why not provide "write caching" as an option (disabled
> by default) for those who know that this can corrupt data?

Strictly write-through isn't a solution, and strictly write-back isn't a solution either.

bye
Carl
From: FUJITA T. <to...@ac...> - 2004-10-27 14:48:08
|
From: Ming Zhang <mi...@el...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 10:11:41 -0400

> i agree with Libor's suggestion.
>
> 1) make write-back an option here, as a patch, not to be merged.
> 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> useful for both write-through and write-back)
> 3) pursue a true write-back solution with order preservation, or an extra
> write redo log; implement a new io handler.

Tomorrow (maybe the day after tomorrow), I'll post the reasons why it is hard to implement the write-back caching that disk drives provide, though I've already said some here. And I'll also suggest a possible way to implement it.

If it is implemented properly and you guys want it, merging it is OK with me. But I don't think that it is easy to implement. And as I said, I'm not so interested in implementing this. I prefer other jobs like the new I/O design. And the new I/O design makes it easier for you to work independently. For example, you can implement the virtual tape library without modifications to the core code. So I think that I should do that first.
From: Ming Z. <mi...@el...> - 2004-10-27 15:10:10
|
On Wed, 2004-10-27 at 10:48, FUJITA Tomonori wrote:
> From: Ming Zhang <mi...@el...>
> Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
> Date: Wed, 27 Oct 2004 10:11:41 -0400
>
> > i agree with Libor's suggestion.
> >
> > 1) make write-back an option here, as a patch, not to be merged.
> > 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> > useful for both write-through and write-back)
> > 3) pursue a true write-back solution with order preservation, or an extra
> > write redo log; implement a new io handler.
>
> Tomorrow (maybe the day after tomorrow), I'll post the reasons why it is hard
> to implement the write-back caching that disk drives provide, though I've
> already said some here. And I'll also suggest a possible way to implement it.

OK, looking forward to seeing it.

> If it is implemented properly and you guys want it, merging it is OK with
> me. But I don't think that it is easy to implement. And as I said,
> I'm not so interested in implementing this.
>
> I prefer other jobs like the new I/O design. And the new I/O design
> makes it easier for you to work independently. For example, you can
> implement the virtual tape library without modifications to the core
> code. So I think that I should do that first.

I vote for this. If we can get this done first, then Libor can work on the VT without modifying other parts. I may work on a bypass interface to allow using a real system SCSI device directly.

--
--------------------------------------------------
| Ming Zhang, PhD. Student
| Dept. of Electrical & Computer Engineering
| College of Engineering
| University of Rhode Island
| Kingston RI. 02881
| e-mail: mingz at ele.uri.edu
| Tel. (401) 874-2293 | Fax. (401) 782-6422
| http://www.ele.uri.edu/~mingz/
| http://crab.ele.uri.edu/gallery/albums.php
--------------------------------------------------
From: FUJITA T. <to...@ac...> - 2004-10-28 06:46:11
|
From: FUJITA Tomonori <to...@ac...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 23:48:04 +0900

> > 1) make write-back an option here, as a patch, not to be merged.
> > 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> > useful for both write-through and write-back)
> > 3) pursue a true write-back solution with order preservation, or an extra
> > write redo log; implement a new io handler.
>
> Tomorrow (maybe the day after tomorrow), I'll post the reasons why it is hard
> to implement the write-back caching that disk drives provide, though I've
> already said some here. And I'll also suggest a possible way to implement it.

Let's see what we can do about implementing write-back caching, which commodity disk drives provide. Before the explanation, let me summarize the points of the lengthy discussion. "File systems" here refers to file systems providing metadata integrity techniques, like journaling or Soft Updates.

o File systems can work well with disk drives with write-back caching that can flush the cache in the presence of a system crash. Preserving the write ordering does not matter (because such drives work in exactly the same way as traditional disk drives without a disk cache). These disk drives are always safe, so I'm not talking about them any more.

o If disk drives with write-back caching cannot flush the cache in the presence of a system crash, file systems need them to preserve the write ordering. (Possibly some file systems cannot handle such disk drives. But I guess that most or all of them can. At least ext3 can.)

o If disk drives with write-back caching can neither flush the cache in the presence of a system crash nor preserve the write ordering, they may destroy some file systems, like XFS in Linux (if I understand the code correctly). However, some file systems like ext3 can handle them and get more performance.

I refer to the second cache policy as write-back and to the last type as write-back-reorder.

It seems that there are only disk drives supporting write-through and write-back-reorder on the market (that was useful information for me from the discussion). So I'll discuss the details in the next e-mails.

# Write-back-reorder is bad for file system developers, since it makes
# their work harder. Some of them are still sceptical about write-back
# cache, I think. For example, several developers say that some ATA
# disk drives simply ignore the command to flush the disk cache.
From: Ming Z. <mi...@el...> - 2004-10-27 14:12:05
|
Yes, please calm down.

I agree with Libor's suggestion.

1) make write-back an option here, as a patch, not to be merged.
2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is useful for both write-through and write-back)
3) pursue a true write-back solution with order preservation, or an extra write redo log; implement a new io handler.
4) ...

ming

On Wed, 2004-10-27 at 09:08, Carl Rueder wrote:
> > Hey, just calm down. Why not provide "write caching" as an option (disabled
> > by default) for those who know that this can corrupt data?
>
> Strictly write-through isn't a solution, and strictly write-back isn't a
> solution either.
>
> bye
> Carl
From: Carl R. <car...@gm...> - 2004-10-27 14:42:12
|
Why 3)?

> Yes, please calm down.
>
> I agree with Libor's suggestion.
>
> 1) make write-back an option here, as a patch, not to be merged.
> 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> useful for both write-through and write-back)
> 3) pursue a true write-back solution with order preservation, or an extra
> write redo log; implement a new io handler.
> 4) ...
>
> ming
>
> On Wed, 2004-10-27 at 09:08, Carl Rueder wrote:
> > > Hey, just calm down. Why not provide "write caching" as an option
> > > (disabled by default) for those who know that this can corrupt data?
> >
> > Strictly write-through isn't a solution, and strictly write-back isn't a
> > solution either.
> >
> > bye
> > Carl
From: Ming Z. <mi...@el...> - 2004-10-27 14:49:21
|
3) is useful for trying to recover some data after a crash - the same reason why people need journaling filesystems.

On Wed, 2004-10-27 at 10:41, Carl Rueder wrote:
> Why 3)?
>
> > Yes, please calm down.
> >
> > I agree with Libor's suggestion.
> >
> > 1) make write-back an option here, as a patch, not to be merged.
> > 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> > useful for both write-through and write-back)
> > 3) pursue a true write-back solution with order preservation, or an extra
> > write redo log; implement a new io handler.
> > 4) ...
> >
> > ming
From: FUJITA T. <to...@ac...> - 2004-10-28 08:28:06
|
From: FUJITA Tomonori <to...@ac...> Subject: Re: [Iscsitarget-devel] Write caching - subjective summary Date: Wed, 27 Oct 2004 23:48:04 +0900 > > 1) make write back as an option here, as a patch, not be merged. > > 2) try to implement support for WRITE FUA, SYNCHRONIZE_CACHE (this is > > useful for both write through and write back) > > 3) pursue a true write back solution with order preserve. or extra write > > redo log. implement a new io handler. > > Tomorrow (maybe day after tomorrow), I'll post reasons why it is to > implement write-back, which disk drives provide, though I've already > said some here. And I'll also suggest a possible way to implement it. Let's discuss write-back-reorder cache first, because it is easier than write-back. 1. fileio If we'll go with the fileio, we can implement write-back-reorder cache with one-line change to fileio_sync(), as we saw before. With the change, when IET receives and copy data to page cache, it tells an initiator that the write command finishes. And the dirty page cache will be written to disk later by someone like the bdflush daemon. And the writing order is aggressively changed. This looks like write-back-reorder cache, which disk drives provide, however, it far more harmful for file systems. I think that there are many people use the combination of write-back-reorder cache and file systems (like XFS) cannot handle it properly. But most of them have not found that their file systems were corrupted. This is because the amount of disk cache (write-back-reorder) is very small (typically several megabytes). So there is little possibility that your file systems are corrupted badly. However, IET with the change uses the huge amount of memory as disk cache. With 2.4 kernels, you can dirty almost all of page cache. The large amount of clean page cache (for reading) is useful. But if 800 MB dirty page cache is lost due to a crash, probably your file system cannot survive. 
Another problem with the change is that dirty page cache is kept for a
longer time, I think. I'm not sure exactly how disk cache works, but I
guess that dirty disk cache is not kept for 30 seconds the way the Linux
kernel keeps dirty pages.

Now you understand why the change to fileio_sync() is far more dangerous
than the write-back-reorder cache your disk drives use. The bad news is
that it is impossible to solve this problem: limiting the amount of dirty
page cache needs modifications to the Linux kernel, and that's
unacceptable. We need more freedom to control the behavior of the page
cache (i.e. the VM system) per device, a kind of QoS. Possibly such
features will be implemented in the Linux kernel, because some people
think they are important for clients (initiators) in IP-SAN.

For now we have no choice but to accept the riskier write-back-reorder
cache. I don't like it, because I feel like I'm bringing a more dangerous
product to market than similar ones already there. However, I also
understand that you are responsible for what you do, and we are not
responsible for how people use it. So if you guys like the
write-back-reorder cache, that is totally OK and it will be implemented.

Secondly, we need to implement the SCSI tag attributes, that is, ordered,
head of queue, and ACA. Right now IET ignores them. Implementing only the
ordered tag, which is important for file systems, is not difficult;
however, treating all tags correctly is not so easy, I guess. I always
feel that I have to implement this. Note that file systems, which can
handle write-back cache, use ordered tags, though IET ignores it now.
This may break your file system or not. It depends on how file systems
use ordered tags. I know that ext3 can, but I'm not sure about others.

Thirdly, we need the SYNCHRONIZE CACHE command. It is easy to implement.

Fourthly, we need tree-structured configuration files, as we discussed.
With the current format, if we add per-LUN settings, it will get messy.
Maybe we also need FUA bit support, though I'm not sure that file systems
use it.

2. blockio

I'm too tired to write more. Generally speaking, without modifications to
the Linux kernel, blockio can completely control how writes are done, but
it needs a lot of work to implement. If someone is interested in this
approach, I'll explain more. |
From: FUJITA T. <to...@ac...> - 2004-10-28 08:35:39
|
From: FUJITA Tomonori <to...@ac...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Thu, 28 Oct 2004 17:28:03 +0900

> Note that file systems, which can handle write-back cache, use ordered
> tags, though IET ignores it now. This may break your file system or
> not. It depends on how file systems use ordered tags. I know that ext3
> can, but I'm not sure about others.

Sorry for the typo. I know that ext3 is OK with IET. |
From: FUJITA T. <to...@ac...> - 2004-10-27 14:48:03
|
From: "Carl Rueder" <car...@gm...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 14:35:16 +0200 (MEST)

> > Please read Stephen Lord's mail more carefully. He said that
> > write back may corrupt file system metadata.
>
> Did I ever ignored this?

You said,

> c) Operating write back caching without battery backup was always a
> risk.

Modern journaling file systems are safe with write-back without reorder.
FYI, if you want to protect not only metadata but also file data, use a
journaling file system that logs everything.

> > Note that he said that write-back may corrupt file system metadata,
> > however, write-back preserves the ordering is safe for modern
> > journaling file systems, I think.
>
> That's the reason why some people implemented write barriers for io &
> filesystems (see the current kernel 2.6). To ensure transaction isolation on
> IO.

Have you read my mails? I said that ext3 uses the BIO barrier to work with
write-back with reordering, and that this was done this summer. I also
said that reiserfs can handle write-back with reorder. And I know how the
Linux I/O architecture has evolved; you don't need to teach me such stuff.

> > Please don't insult file system developers by saying that they are
> > crap. Your theory says that ext3 was crap (I showed you Stephen
> > C. Tweedie's mail) and XFS was crap (I showed you Stephen Lord's
> > mail). Note that you thought that XFS is designed to handle write-back
> > with reorder.
>
> If a filesystem has a problem with write caching it's imho buggy. That's my

So you think that XFS in Linux is buggy (at least, that it was buggy in
the past, judging from the developer's mail). And you thought that it was
designed for enterprise use.

> opinion. Why was write barrier support developed? Why not just forcing write
> through?

If a disk drive that uses write-back with reorder cannot flush its disk
cache in the presence of a system crash, it is evil for file systems.
However, as I said, some file system developers have started to work on
this issue. That's why write barrier support was developed. But it takes a
long time: you quoted Linus's mail (from 2001), and some file systems in
Linux have supported this only since 2004. So calling the file systems
that cannot handle the write-back issue crap is wrong.

> > You don't know anything about file systems and Linux I/O design. Why
> > are you so rude?
>
> Why am I rude? Why are you so focused on only ext3 and Linux?

What do you think about people who say that something is crap without any
knowledge about it? What do you think about people who criticize
developers without knowledge of their work? If they are not rude, who is?

And when I said that some file systems in Linux, like XFS and JFS, can't
handle "write back with reorder", you replied, "I just won't believe that
current file systems are such a misdesign." even though you don't know
anything about the file system implementations in Linux. So I looked for
the XFS developer's mail saying that write-back cache may destroy your
file system. Why did I have to do such things to correct you? You should
have done that for yourself. Isn't it rude to deny someone's opinion
without any knowledge and to waste his time correcting your
misunderstanding? Note that it is possible that the XFS developers have
since worked on this issue, so please correct me by asking the developers
if I'm wrong.

> > Don't insult or criticize people by only your assumption. As I
> > showed, you know no truth about file system implementations.
>
> I assume I know as much about linux filesystems as you know about
> commercial systems. I don't claim to know everything, do you?

I don't know anything about commercial systems, so did I criticize them by
calling them crap, like you did?

> > You don't need to say "You have to do something.". You have to do what
> > you want for yourself.
> Unfortunately I dont have too much programming experience on kernel layer
> but I have some experiences on storage, storage area networks, databases and
> other commercial systems used for handling real workload. But if you don't
> need such experience I won't bother here anymore.

Please don't bother me any more.

> > If IET is not useful for you, please go away and find another solution.
>
> Isn't your objective the development of an "iSCSI target with professional
> features, that works well in enterprise environment under real workload, and
> is scalable and versatile enough to meet the challenge of future storage
> needs and developements" anymore?
>
> I only wanted a constructive discussion about improvements to write caching

I don't think so. For example, I explained one of the reasons why it is
difficult to achieve in IET the write-back policy that disk drives
provide, and I don't think your reply was constructive. I love to listen
to users' opinions and to collaborate with people. However, I don't think
that I can work with you constructively, though I'm not sure how other
people feel. All I can say is: please go away and find another solution. |