From: Libor V. <li...@co...> - 2004-10-26 22:10:25
|
Hi, I'm reading the discussion about write caching and I'd like to post my VERY subjective summary:

Main problem:
- without write caching, performance is really bad
- with write caching, you can very easily corrupt your data

Data order:
- as no HDD today guarantees the order of requests written to disk (really true?), we shouldn't take care...

My very own experiences:
- I run the XFS filesystem on >300 Linux servers on SW RAID5
- administrators are doing REALLY crazy things to the systems
- XFS is VERY stable - in fact, I haven't seen any filesystem corrupted because of an incorrect shutdown - when I was using ext2/3, I saw a lot of them!
- the system MUST be using write caching (I can see a very big difference (~50%) in write performance if I use disks with a 2 or 8 MB cache) => filesystems MUST solve this write cache problem now, no matter if they have iSCSI or SATA or HW/SW RAID as the "storage backend"

Summary:
- every modern system (OS + fs) must be able to deal with the loss of data in the write cache
- but how???

Stupid question:
- imagine this setup - SW RAID5 (SATA), XFS filesystem, a file mounted as a loopback device, iSCSI using this loopback device
- in this setup there surely ALREADY is some write caching (at the XFS/SW RAID levels), isn't there?

I'm using such a system on 5 "semi-production" servers with Windows servers as clients and I haven't seen any problems with data corruption... (but I haven't done any tests focused on that!)

Thanks for any comments.

-- Libor Vanek
+-------------------------------------+
| Email: li...@co... |
| ICQ: 124529939 |
| WWW: http://www.discobolos.net |
| Tel/fax: +420 541 22 5091, 6293 |
| Mobil: +420 777 703 642 |
+-------------------------------------+
From: Carl R. <car...@gm...> - 2004-10-26 22:33:44
|
Hi,

> Main problem:
> - without write caching, performance is really bad

Only random writes are slow; sequential writes are not that much of a problem.

> - with write caching, you can very easily corrupt your data

Yes - unless you can control the caching.

> Data order:
> - as no HDD today guarantees the order of requests written to disk
> (really true?), we shouldn't take care...

Every hard disk with write caching reorders the data to adapt to the physical circumstances.

> My very own experiences:
> - I run the XFS filesystem on >300 Linux servers on SW RAID5
> - administrators are doing REALLY crazy things to the systems
> - XFS is VERY stable - in fact, I haven't seen any filesystem corrupted
> because of an incorrect shutdown - when I was using ext2/3, I saw a lot of
> them!

XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling from the start, not, like ext3, on top of an already existing filesystem. Probably the designers took the caching issues into account in the design.

> - the system MUST be using write caching (I can see a very big difference (~50%)
> in write performance if I use disks with a 2 or 8 MB cache) => filesystems
> MUST solve this write cache problem now, no matter if they have iSCSI or
> SATA or HW/SW RAID as the "storage backend"

True. And I know some commercial filesystems have solved this.

> Summary:
> - every modern system (OS + fs) must be able to deal with the loss of data in
> the write cache
> - but how???

The filesystem has to take care that all critical data is really written to disk, and the OS has to provide the necessary interface.

> Stupid question:
> - imagine this setup - SW RAID5 (SATA), XFS filesystem, a file mounted as a
> loopback device, iSCSI using this loopback device
> - in this setup there surely ALREADY is some write caching (at the XFS/SW RAID
> levels), isn't there?

AFAIK, the writes through IET go write-through into the loopback file. I don't know if XFS provides any further caching. I used LVM instead of a loopback file. But I'll try this, too.

bye
Carl
From: Libor V. <li...@co...> - 2004-10-26 22:40:14
|
>> Data order:
>> - as no HDD today guarantees the order of requests written to disk
>> (really true?), we shouldn't take care...
>
> Every hard disk with write caching reorders the data to adapt to the
> physical circumstances.

Are there any HDDs without write caching?

>> Summary:
>> - every modern system (OS + fs) must be able to deal with the loss of
>> data in the write cache
>> - but how???
>
> The filesystem has to take care that all critical data is really
> written to disk, and the OS has to provide the necessary interface.

So - what is this interface, and how does it "propagate" this "sync" (probably) action to iSCSI/SCSI/SATA/SW RAID block devices?

-- Libor Vanek
From: Carl R. <car...@gm...> - 2004-10-26 22:53:13
|
Hi,

>> Every hard disk with write caching reorders the data to adapt to the
>> physical circumstances.
> Are there any HDDs without write caching?

You can always turn off write caching. But I don't think that there are disks without a write cache out on the market.

>> The filesystem has to take care that all critical data is really written
>> to disk, and the OS has to provide the necessary interface.
> So - what is this interface, and how does it "propagate" this "sync"
> (probably) action to iSCSI/SCSI/SATA/SW RAID block devices?

The device driver must provide the necessary functions, and the filesystem must use them. As I said before, I don't know the Linux IO system very well, but for sure there's a kind of unified interface so that you don't have to cover each storage technology within the filesystem. I think there's something in the block device layer, but I have to investigate first.

bye
Carl
From: FUJITA T. <to...@ac...> - 2004-10-27 01:39:41
|
From: "Carl Rueder" <car...@gm...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 00:33:36 +0200

> > My very own experiences:
> > - I run the XFS filesystem on >300 Linux servers on SW RAID5
> > - administrators are doing REALLY crazy things to the systems
> > - XFS is VERY stable - in fact, I haven't seen any filesystem corrupted
> > because of an incorrect shutdown - when I was using ext2/3, I saw a lot of
> > them!
>
> XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling
> from the start, not, like ext3, on top of an already existing filesystem.
> Probably the designers took the caching issues into account in the design.

I quote Stephen Lord's mail from the XFS mailing list. He is an SGI XFS developer.

http://oss.sgi.com/archives/linux-xfs/2002-11/msg00057.html
--
On Tue, 2002-11-05 at 13:50, Christophe Zwecker wrote:
> Hi,
>
> I'm running gentoo with EVMS/LVM on 2.4.19 and XFS on a loop-aes
> crypted device, 430 GB.
>
> I have the write cache disabled on the 3ware controller. I copied 300 GB
> over the network, and I noticed that every 50 MB or so the controller
> stalled and didn't accept more data while writing like crazy to disk.
>
> It took me 24 h to copy 300 GB over a 100 Mbit network. After that I
> tried to turn the write cache on - hugely more performance. Hmm, well. I
> rebooted a couple of times, and suddenly I couldn't mount the XFS partition
> any more (bad superblock). I disabled the write cache again, and thank god
> I could fix the issue with xfs_repair.
>
> So, here I am, disabled write cache, ok. The performance as I stated
> above is awful. Is this how it's supposed to be? Disable the write cache
> and have terrible performance?

Unfortunately, some hardware is just designed to rely on write caching. It does seem a little odd that you lost the superblock though. This is put there by mkfs, and is always present, so how powering down the device is corrupting it I do not know. That seems like an issue with the 3ware firmware to me.
--

He meant that hardware designed to rely on write caching (write back) is broken. XFS was not able to handle it in 2002-11. And after a quick look at the code, I don't think that the current code can handle it either. So please ask the XFS developers to correct me.

If you use hardware with 'write back with reorder', it needs a battery to write out the whole disk cache in the presence of a system crash.

Please ask experts if you believe that file systems in Linux can handle 'write back with reorder', instead of repeating that I don't know the Linux I/O layer design. If you think that 'write back with reorder' is useful for IET, you need to prove that modern file systems can handle it. I showed that ext3 was not able to handle it in 2003/02 (at least) by quoting Stephen C. Tweedie's email. So please look for file system experts' opinions on this issue for yourself to correct me.
From: FUJITA T. <fuj...@la...> - 2004-10-27 06:38:10
|
From: FUJITA Tomonori <to...@ac...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 10:39:34 +0900

> > XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling
> > from the start, not, like ext3, on top of an already existing filesystem.
> > Probably the designers took the caching issues into account in the design.
>
> I quote Stephen Lord's mail from the XFS mailing list. He is an SGI XFS developer.
>
> http://oss.sgi.com/archives/linux-xfs/2002-11/msg00057.html

I found a more informative quotation from him.

http://oss.sgi.com/archives/linux-xfs/2001-07/msg00246.html
--
> I've asked about write caching before, and now I've got another
> question.
>
> If I've got write caching turned on and the power goes out, do
> I face the possibility of silent metadata corruption?
>
> With ext2, the fsck would attempt to fix metadata problems on
> bootup; on the other hand, with xfs, if write caching resulted
> in a corrupted log, would the simple log replay that xfs does on
> mount perhaps fail to catch possible metadata problems?

Yes, and the same will be true for all write-behind journalled filesystems. If the log does not make it to disk when the driver says it has, then in theory a subsequent metadata write could make it out to the media. A crash at this point would mean that part of a transaction had made it to disk, but the rest, which only existed in the disk's internal cache, would not. I would expect reiserfs, ext3 and jfs would all have the same issue.

So I would not in general recommend write caching on a device, unless you know enough about it to be satisfied that its cache makes it out to disk on power down. Using the rotational power of the spindle to generate power to move the head to a special track and flush the cache is not unheard of, for instance.
---

If you are serious about your data, don't use the 'write back with reorder' policy with a disk drive that does not have something like a battery to flush the disk cache in the presence of a system crash.
From: Carl R. <car...@gm...> - 2004-10-27 07:41:33
|
Hi,

> "So I would not in general recommend write caching on a device, unless
> you know enough about it to be satisfied that its cache makes it out
> to disk on power down. Using the rotational power of the spindle to
> generate power to move the head to a special track and flush the cache
> is not unheard of, for instance."

a) You can force unbuffered writes if you need them, and there are some systems out there doing that. I'll get some SAN logs in the next few hours.
b) A hard disk reorders the buffered data before writing it.
c) Operating write-back caching without battery backup was always a risk.
d) Everyone who claims to operate enterprise computing without backup systems is a fool.

I don't see a problem here. If some filesystems rely on ordered write-back behaviour, they are crap. You don't get ordered write-back on today's hard disks; that's a fact (ask Hitachi, Seagate, Maxtor and so on). And no professional sysadmin could operate his storage write-through; the performance penalty is just too big. To be really safe, be sure to turn off all write buffers anywhere in the IO process.

Unless IET is strictly write-through it will never play in the league of enterprise computing. You have to offer the features of typical SCSI drives, and this means offering unbuffered and buffered writes on request.

bye
Carl
From: Libor V. <li...@co...> - 2004-10-27 10:32:30
Attachments:
libor.vcf
|
> Unless IET is strictly write-through it will never play in the league of
> enterprise computing. You have to offer the features of typical SCSI drives,
> and this means offering unbuffered and buffered writes on request.

The question is, what does this "unbuffered/buffered write request" look like (which SCSI command?), and should we use buffered (Carl's approach?) or unbuffered (Fujita's approach?) by default?

-- Libor Vanek
From: Carl R. <car...@gm...> - 2004-10-27 10:47:47
|
Hi,

> The question is, what does this "unbuffered/buffered write request" look like
> (which SCSI command?), and should we use buffered (Carl's approach?) or
> unbuffered (Fujita's approach?) by default?

For an untagged-queueing SCSI device it's really simple: let the initiator choose whether it wants to write buffered (a simple SCSI WRITE), write unbuffered (a SCSI WRITE with the FUA bit set), or flush the cache (SYNCHRONIZE CACHE).

A typical access pattern found in my SAN traces is: write buffered, then synchronize a particular range (SYNCHRONIZE CACHE supports syncing only a range). This is common across the traced OSes/filesystems. As long as IET doesn't reorder the SCSI commands it receives, there's AFAIK no problem.

If you're using initiators that don't support the necessary SCSI command set, or journaling filesystems which aren't using these features, you have to set write-through; otherwise you can safely use write-back.

BTW: newer 2.6 kernels support write barriers for reiser & ext3.

bye
Carl
From: FUJITA T. <to...@ac...> - 2004-10-27 11:07:26
|
From: "Carl Rueder" <car...@gm...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 09:41:26 +0200 (MEST)

> > "So I would not in general recommend write caching on a device, unless
> > you know enough about it to be satisfied that its cache makes it out
> > to disk on power down. Using the rotational power of the spindle to
> > generate power to move the head to a special track and flush the cache
> > is not unheard of, for instance."
>
> a) You can force unbuffered writes if you need them, and there are some systems
> out there doing that. I'll get some SAN logs in the next few hours.
> b) A hard disk reorders the buffered data before writing it.
> c) Operating write-back caching without battery backup was always a risk.
> d) Everyone who claims to operate enterprise computing without backup systems
> is a fool.

With write-back caching without battery backup, you may lose some of your data. With write-back caching with reordering and without battery backup, you may lose your whole file system.

Please read Stephen Lord's mail more carefully. He said that write-back may corrupt file system metadata. Note that although write-back may corrupt file system metadata, write-back that preserves the ordering is safe for modern journaling file systems, I think.

> I don't see a problem here. If some filesystems rely on ordered write-back
> behaviour, they are crap. You don't get ordered write-back on today's
> hard disks; that's a fact (ask Hitachi, Seagate, Maxtor and so on). And no
> professional sysadmin could operate his storage write-through; the
> performance penalty is just too big.

Why is it so difficult for you to understand the fact that some journaling file systems in Linux don't support write-back caching with reordering? Please don't insult file system developers by saying that they are crap. Your theory says that ext3 was crap (I showed you Stephen C. Tweedie's mail) and XFS was crap (I showed you Stephen Lord's mail).
Note that you thought that XFS was designed to handle write-back with reorder. You don't know anything about file systems or the Linux I/O design. Why are you so rude?

And you said:

    XFS is more mature than ext2/ext3. AFAIK XFS was designed around journaling
    from the start, not, like ext3, on top of an already existing filesystem.
    Probably the designers took the caching issues into account in the design.

Again, how could you say such a thing even though you don't know anything about the XFS and ext3 implementations in Linux? I think that you have some experience and have read white papers about these file systems; however, how can you be sure that they are true? You can find the truth only in the source code. Don't insult or criticize people based only on your assumptions. As I showed, you know no truth about the file system implementations.

> Unless IET is strictly write-through it will never play in the league of
> enterprise computing. You have to offer the features of typical SCSI drives,
> and this means offering unbuffered and buffered writes on request.

You don't need to say "You have to do something." You have to do what you want for yourself. If IET is not useful for you, please go away and find another solution.
From: Carl R. <car...@gm...> - 2004-10-27 12:35:23
|
> Please read Stephen Lord's mail more carefully. He said that
> write-back may corrupt file system metadata.

Did I ever ignore this?

> Note that although write-back may corrupt file system metadata,
> write-back that preserves the ordering is safe for modern
> journaling file systems, I think.

That's the reason why some people implemented write barriers for the IO layer & filesystems (see the current 2.6 kernel): to ensure transaction isolation in IO.

> Why is it so difficult for you to understand the fact that some
> journaling file systems in Linux don't support write-back caching with
> reordering?

Why is it so difficult for you to see that there are some other systems beyond Linux doing things right? You want to provide a target, not an initiator!

What are the facts (from my point of view):
1) Some filesystems have problems with reordering
2) Some filesystems don't have any problems with reordering
3) Hard disks do reordering
4) Write-through cuts real database workload performance down to 20%

> Please don't insult file system developers by saying that they are
> crap. Your theory says that ext3 was crap (I showed you Stephen
> C. Tweedie's mail) and XFS was crap (I showed you Stephen Lord's
> mail). Note that you thought that XFS was designed to handle write-back
> with reorder.

If a filesystem has a problem with write caching, it's IMHO buggy. That's my opinion. Why was write barrier support developed? Why not just force write-through?

> You don't know anything about file systems or the Linux I/O design. Why
> are you so rude?

Why am I rude? Why are you so focused only on ext3 and Linux?

> Again, how could you say such a thing even though you don't know
> anything about the XFS and ext3 implementations in Linux?

Sure, reordering was an issue, but now there are solutions. It won't improve IET any further if we dispute past issues.
> I think that you have some experience and have read white papers about
> these file systems; however, how can you be sure that they are
> true? You can find the truth only in the source code.

Why do I have to care only about these filesystems? Why should I ignore other implementations?

> Don't insult or criticize people based only on your assumptions. As I showed,
> you know no truth about the file system implementations.

I assume I know as much about Linux filesystems as you know about commercial systems. I don't claim to know everything - do you?

> You don't need to say "You have to do something." You have to do what
> you want for yourself.

Unfortunately I don't have much programming experience at the kernel layer, but I have some experience with storage, storage area networks, databases and other commercial systems used for handling real workloads. But if you don't need such experience, I won't bother here anymore.

> If IET is not useful for you, please go away and find another solution.

Isn't your objective the development of an "iSCSI target with professional features, that works well in enterprise environments under real workload, and is scalable and versatile enough to meet the challenge of future storage needs and developments" anymore? I only wanted a constructive discussion about improvements to write caching, but your only arguments were the handicaps of some Linux filesystems. If you don't want feedback from users who have experience with real workload systems, ok, no problem. But don't expect to play in the enterprise league. Your highest scoring will be the toy club.

Just my few cents.

Bye
Carl
From: Libor V. <li...@co...> - 2004-10-27 12:56:17
Attachments:
libor.vcf
|
Hey, just calm down. Why not provide "write caching" as an option (disabled by default) for those who know that this can corrupt data?

-- Libor Vanek
From: Carl R. <car...@gm...> - 2004-10-27 13:08:13
|
> Hey, just calm down. Why not provide "write caching" as an option (disabled
> by default) for those who know that this can corrupt data?

Strictly write-through isn't a solution, and strictly write-back isn't a solution either.

bye
Carl
From: FUJITA T. <to...@ac...> - 2004-10-27 14:48:08
|
From: Ming Zhang <mi...@el...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 10:11:41 -0400

> i agree with Libor's suggestion.
>
> 1) make write-back an option here, as a patch, not to be merged.
> 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> useful for both write-through and write-back)
> 3) pursue a true write-back solution with order preservation, or an extra
> write redo log; implement a new io handler.

Tomorrow (maybe the day after tomorrow), I'll post the reasons why it is hard to implement the write-back caching that disk drives provide, though I've already said some here. And I'll also suggest a possible way to implement it.

If it is implemented properly and you guys want it, merging it is OK with me. But I don't think that it is easy to implement. And as I said, I'm not so interested in implementing this. I prefer other jobs like the new I/O design. And the new I/O design makes it easier for you to work independently. For example, you can implement the virtual tape library without modifications to the core code. So I think that I should do that first.
From: Ming Z. <mi...@el...> - 2004-10-27 15:10:10
|
On Wed, 2004-10-27 at 10:48, FUJITA Tomonori wrote:
> From: Ming Zhang <mi...@el...>
> Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
> Date: Wed, 27 Oct 2004 10:11:41 -0400
>
> > i agree with Libor's suggestion.
> >
> > 1) make write-back an option here, as a patch, not to be merged.
> > 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> > useful for both write-through and write-back)
> > 3) pursue a true write-back solution with order preservation, or an extra
> > write redo log; implement a new io handler.
>
> Tomorrow (maybe the day after tomorrow), I'll post the reasons why it is hard
> to implement the write-back caching that disk drives provide, though I've
> already said some here. And I'll also suggest a possible way to implement it.

OK, looking forward to seeing it.

> If it is implemented properly and you guys want it, merging it is OK with
> me. But I don't think that it is easy to implement. And as I said,
> I'm not so interested in implementing this.
>
> I prefer other jobs like the new I/O design. And the new I/O design
> makes it easier for you to work independently. For example, you can
> implement the virtual tape library without modifications to the core
> code. So I think that I should do that first.

I vote for this. If we can get this done first, then Libor can work on the VT without modifying other parts. I may work on a bypass interface to allow using a real system SCSI device directly.

--
--------------------------------------------------
| Ming Zhang, PhD. Student
| Dept. of Electrical & Computer Engineering
| College of Engineering
| University of Rhode Island
| Kingston RI. 02881
| e-mail: mingz at ele.uri.edu
| Tel. (401) 874-2293 | Fax. (401) 782-6422
| http://www.ele.uri.edu/~mingz/
| http://crab.ele.uri.edu/gallery/albums.php
--------------------------------------------------
From: FUJITA T. <to...@ac...> - 2004-10-28 06:46:11
|
From: FUJITA Tomonori <to...@ac...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 23:48:04 +0900

> > 1) make write-back an option here, as a patch, not to be merged.
> > 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> > useful for both write-through and write-back)
> > 3) pursue a true write-back solution with order preservation, or an extra
> > write redo log; implement a new io handler.
>
> Tomorrow (maybe the day after tomorrow), I'll post the reasons why it is hard
> to implement the write-back caching that disk drives provide, though I've
> already said some here. And I'll also suggest a possible way to implement it.

Let's see what we can do about implementing write-back caching, which commodity disk drives provide. Before the explanation, let me summarize the points of the lengthy discussion. "File systems" here refers to file systems providing metadata integrity techniques, like journaling or Soft Updates.

o File systems can work well with disk drives with write-back caching that can flush the cache in the presence of a system crash. Preserving the write ordering does not matter (because such drives work in exactly the same way as traditional disk drives without a disk cache). These disk drives are always safe, so I'm not talking about them any more.

o If disk drives with write-back caching cannot flush the cache in the presence of a system crash, file systems need them to preserve the write ordering. (Possibly some file systems cannot handle such disk drives. But I guess that most or all of them can. At least ext3 can.)

o If disk drives with write-back caching can neither flush the cache in the presence of a system crash nor preserve the write ordering, they may destroy some file systems, like XFS in Linux (if I understand the code correctly). However, some file systems like ext3 can handle them and get more performance.

I refer to the second cache policy as write-back and to the last type as write-back-reorder.

It seems that there are only disk drives supporting write-through and write-back-reorder on the market (that was useful information for me from the discussion). So I'll discuss the details in the next e-mails.

# Write-back-reorder is bad for file system developers, since it makes
# their work harder. Some of them are still sceptical about write-back
# cache, I think. For example, several developers say that some ATA
# disk drives simply ignore the command to flush the disk cache.
From: Ming Z. <mi...@el...> - 2004-10-27 14:12:05
|
Yes, please calm down.

I agree with Libor's suggestion.

1) make write-back an option here, as a patch, not to be merged.
2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is useful for both write-through and write-back)
3) pursue a true write-back solution with order preservation, or an extra write redo log; implement a new io handler.
4) ...

ming

On Wed, 2004-10-27 at 09:08, Carl Rueder wrote:
> > Hey, just calm down. Why not provide "write caching" as an option (disabled
> > by default) for those who know that this can corrupt data?
>
> Strictly write-through isn't a solution, and strictly write-back isn't a
> solution either.
>
> bye
> Carl
From: Carl R. <car...@gm...> - 2004-10-27 14:42:12
|
Why 3)?

> Yes, please calm down.
>
> I agree with Libor's suggestion.
>
> 1) make write-back an option here, as a patch, not to be merged.
> 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> useful for both write-through and write-back)
> 3) pursue a true write-back solution with order preservation, or an extra
> write redo log; implement a new io handler.
> 4) ...
>
> ming
>
> On Wed, 2004-10-27 at 09:08, Carl Rueder wrote:
> > > Hey, just calm down. Why not provide "write caching" as an option
> > > (disabled by default) for those who know that this can corrupt data?
> >
> > Strictly write-through isn't a solution, and strictly write-back isn't a
> > solution either.
> >
> > bye
> > Carl
From: Ming Z. <mi...@el...> - 2004-10-27 14:49:21
|
3) is useful for trying to recover some data after a crash - the same reason why people need journaling filesystems.

On Wed, 2004-10-27 at 10:41, Carl Rueder wrote:
> Why 3)?
>
> > Yes, please calm down.
> >
> > I agree with Libor's suggestion.
> >
> > 1) make write-back an option here, as a patch, not to be merged.
> > 2) try to implement support for WRITE FUA and SYNCHRONIZE CACHE (this is
> > useful for both write-through and write-back)
> > 3) pursue a true write-back solution with order preservation, or an extra
> > write redo log; implement a new io handler.
> > 4) ...
> >
> > ming
From: FUJITA T. <to...@ac...> - 2004-10-28 08:28:06
|
From: FUJITA Tomonori <to...@ac...> Subject: Re: [Iscsitarget-devel] Write caching - subjective summary Date: Wed, 27 Oct 2004 23:48:04 +0900 > > 1) make write back as an option here, as a patch, not be merged. > > 2) try to implement support for WRITE FUA, SYNCHRONIZE_CACHE (this is > > useful for both write through and write back) > > 3) pursue a true write back solution with order preserve. or extra write > > redo log. implement a new io handler. > > Tomorrow (maybe day after tomorrow), I'll post reasons why it is to > implement write-back, which disk drives provide, though I've already > said some here. And I'll also suggest a possible way to implement it. Let's discuss write-back-reorder cache first, because it is easier than write-back. 1. fileio If we'll go with the fileio, we can implement write-back-reorder cache with one-line change to fileio_sync(), as we saw before. With the change, when IET receives and copy data to page cache, it tells an initiator that the write command finishes. And the dirty page cache will be written to disk later by someone like the bdflush daemon. And the writing order is aggressively changed. This looks like write-back-reorder cache, which disk drives provide, however, it far more harmful for file systems. I think that there are many people use the combination of write-back-reorder cache and file systems (like XFS) cannot handle it properly. But most of them have not found that their file systems were corrupted. This is because the amount of disk cache (write-back-reorder) is very small (typically several megabytes). So there is little possibility that your file systems are corrupted badly. However, IET with the change uses the huge amount of memory as disk cache. With 2.4 kernels, you can dirty almost all of page cache. The large amount of clean page cache (for reading) is useful. But if 800 MB dirty page cache is lost due to a crash, probably your file system cannot survive. 
Another problem with the change is that dirty page cache is kept for a
longer time, I think. I'm not sure exactly how disk cache works, but I
guess that dirty disk cache is not kept for 30 seconds the way the Linux
kernel keeps dirty pages.

Now you understand why the change to fileio_sync() is far more dangerous
than the write-back-reorder cache your disk drives use. The bad news is
that it is impossible to solve this problem: limiting the amount of dirty
page cache needs modifications to the Linux kernel, and that's
unacceptable. We need more freedom to control the behavior of the page
cache (i.e. the VM system) per device, a kind of QoS. Possibly such
features will be implemented in the Linux kernel, because some people
think they are important for clients (initiators) in IP-SAN.

For now we have no choice but to accept the riskier write-back-reorder
cache. I don't like it, because I feel like I'm bringing a more dangerous
product to market than similar ones already there. However, I also
understand that you are responsible for what you do, and we are not
responsible for how people use it. So if you guys like the
write-back-reorder cache, that is totally OK and it will be implemented.

Secondly, we need to implement the SCSI tag attributes, that is, ordered,
head of queue, and ACA. Right now IET ignores them. Implementing only the
ordered tag, which is important for file systems, is not difficult;
however, treating all tags correctly is not so easy, I guess. I always
feel that I have to implement this. Note that file systems, which can
handle write-back cache, use ordered tags, though IET ignores it now.
This may break your file system or not. It depends on how file systems
use ordered tags. I know that ext3 can, but I'm not sure about others.

Thirdly, we need the SYNCHRONIZE CACHE command. It is easy to implement.

Fourthly, we need tree-structured configuration files, as we discussed.
With the current format, if we add per-LUN settings, it will get messy.
Maybe we also need FUA bit support, though I'm not sure that file systems
use it.

2. blockio

I'm too tired to write more. Generally speaking, without modifications to
the Linux kernel, blockio can completely control how writes are done, but
it needs a lot of work to implement. If someone is interested in this
approach, I'll explain more. |
From: FUJITA T. <to...@ac...> - 2004-10-28 08:35:39
|
From: FUJITA Tomonori <to...@ac...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Thu, 28 Oct 2004 17:28:03 +0900

> Note that file systems, which can handle write-back cache, use ordered
> tags, though IET ignores it now. This may break your file system or
> not. It depends on how file systems use ordered tags. I know that ext3
> can, but I'm not sure about others.

Sorry for the typo. I know that ext3 is OK with IET. |
From: FUJITA T. <to...@ac...> - 2004-10-27 14:48:03
|
From: "Carl Rueder" <car...@gm...>
Subject: Re: [Iscsitarget-devel] Write caching - subjective summary
Date: Wed, 27 Oct 2004 14:35:16 +0200 (MEST)

> > Please read Stephen Lord's mail more carefully. He said that
> > write back may corrupt file system metadata.
>
> Did I ever ignored this?

You said,

> c) Operating write back caching without battery backup was always a
> risk.

Modern journaling file systems are safe with write-back without reorder.
FYI, if you want to protect not only metadata but also file data, use a
journaling file system that logs everything.

> > Note that he said that write-back may corrupt file system metadata,
> > however, write-back preserves the ordering is safe for modern
> > journaling file systems, I think.
>
> That's the reason why some people implemented write barriers for io &
> filesystems (see the current kernel 2.6). To ensure transaction isolation on
> IO.

Have you read my mails? I said that ext3 uses the BIO barrier to work with
write-back with reordering, and that this was done this summer. I also
said that reiserfs can handle write-back with reorder. And I know how the
Linux I/O architecture has evolved; you don't need to teach me such stuff.

> > Please don't insult file system developers by saying that they are
> > crap. Your theory says that ext3 was crap (I showed you Stephen
> > C. Tweedie's mail) and XFS was crap (I showed you Stephen Lord's
> > mail). Note that you thought that XFS is designed to handle write-back
> > with reorder.
>
> If a filesystem has a problem with write caching it's imho buggy. That's my

So you think that XFS in Linux is buggy (at least, that it was buggy in
the past, judging from the developer's mail). And you thought that it was
designed for enterprise use.

> opinion. Why was write barrier support developed? Why not just forcing write
> through?

If a disk drive that uses write-back with reorder cannot flush its disk
cache in the presence of a system crash, it is evil for file systems.
However, as I said, some file system developers have started to work on
this issue. That's why write barrier support was developed. But it takes a
long time: you quoted Linus's mail (from 2001), and some file systems in
Linux have supported this only since 2004. So calling the file systems
that cannot handle the write-back issue crap is wrong.

> > You don't know anything about file systems and Linux I/O design. Why
> > are you so rude?
>
> Why am I rude? Why are you so focused on only ext3 and Linux?

What do you think about people who say that something is crap without any
knowledge about it? What do you think about people who criticize
developers without knowledge of their work? If they are not rude, who is?

And when I said that some file systems in Linux, like XFS and JFS, can't
handle "write back with reorder", you replied, "I just won't believe that
current file systems are such a misdesign." even though you don't know
anything about the file system implementations in Linux. So I looked for
the XFS developer's mail saying that write-back cache may destroy your
file system. Why did I have to do such things to correct you? You should
have done that for yourself. Isn't it rude to deny someone's opinion
without any knowledge and to waste his time correcting your
misunderstanding? Note that it is possible that the XFS developers have
since worked on this issue, so please correct me by asking the developers
if I'm wrong.

> > Don't insult or criticize people by only your assumption. As I
> > showed, you know no truth about file system implementations.
>
> I assume I know as much about linux filesystems as you know about
> commercial systems. I don't claim to know everything, do you?

I don't know anything about commercial systems, so did I criticize them by
calling them crap, like you did?

> > You don't need to say "You have to do something.". You have to do what
> > you want for yourself.
> Unfortunately I dont have too much programming experience on kernel layer
> but I have some experiences on storage, storage area networks, databases and
> other commercial systems used for handling real workload. But if you don't
> need such experience I won't bother here anymore.

Please don't bother me any more.

> > If IET is not useful for you, please go away and find another solution.
>
> Isn't your objective the development of an "iSCSI target with professional
> features, that works well in enterprise environment under real workload, and
> is scalable and versatile enough to meet the challenge of future storage
> needs and developements" anymore?
>
> I only wanted a constructive discussion about improvements to write caching

I don't think so. For example, I explained one of the reasons why it is
difficult to achieve in IET the write-back policy that disk drives
provide, and I don't think your reply was constructive. I love to listen
to users' opinions and to collaborate with people. However, I don't think
that I can work with you constructively, though I'm not sure how other
people feel. All I can say is: please go away and find another solution. |