|
From: Bryan M. <bry...@nd...> - 2010-06-28 01:49:29
Attachments:
target.log
|
Hello,
Hope everyone has had a good weekend.
I'm currently working on deploying our latest set of fibre
channel targets running linux 2.6.32.15 (32 bit) with SCST trunk
r1787. We are using the qla2x00tgt target driver with the latest
firmware from qlogic's ftp site. The target and initiators are
connected via a fibre channel switch. We are exporting LVM
logical volumes as vdisks with BLOCKIO enabled.
After running heavy IO to the target (mostly writes), I am
seeing the following errors on the target:
### Target Log ###
Jun 27 18:32:30 fc-iacc1-targ6-s kernel: [1833]: dev_vdisk:
blockio_endio:2570:***ERROR***: cmd f44d0f00 returned error -5
Jun 27 18:32:30 fc-iacc1-targ6-s kernel: [1833]: dev_vdisk:
blockio_endio:2570:***ERROR***: cmd f44d0f00 returned error -5
Jun 27 18:32:30 fc-iacc1-targ6-s kernel: [0]:
scst_set_cmd_error_status:793:cmd f44d0f00 already has status 2
set
### End Log ###
See the attached file for the complete log from the target.
Debugging was left enabled when SCST was compiled.
It takes about 10 minutes for the errors to appear in the logs.
The initiators see IO errors and log the following:
### Initiator Log ###
sd 6:0:0:0: SCSI error: return code = 0x08000002
sdc: Current: sense key: Medium Error
Add. Sense: Peripheral device write fault
end_request: I/O error, dev sdc, sector 88519296
raid1: Disk failure on sdc1, disabling device.
Operation continuing on 1 devices
sd 6:0:0:0: SCSI error: return code = 0x08000002
sdc: Current: sense key: Medium Error
Add. Sense: Peripheral device write fault
md: md4: sync done.
end_request: I/O error, dev sdc, sector 88519424
sd 6:0:0:0: SCSI error: return code = 0x08000002
sdc: Current: sense key: Medium Error
Add. Sense: Peripheral device write fault
### End Log ###
NOTE: The initiator uses its SAN volume in a RAID 1 array.
Any suggestions to remedy this problem are greatly
appreciated. I would be up for any testing if needed.
Thanks in advance,
Bryan
|
|
From: Vladislav B. <vs...@vl...> - 2010-06-28 18:53:55
|
Hello,
Bryan Mesich, on 06/28/2010 05:03 AM wrote:
> Hello,
>
> Hope everyone has had a good weekend.
>
> I'm currently working on deploying our latest set of fibre
> channel targets running linux 2.6.32.15 (32 bit) with SCST trunk
> r1787. We are using the qla2x00tgt target driver with the latest
> firmware from qlogic's ftp site. The target and initiators are
> connected via a fibre channel switch. We are exporting LVM
> logical volumes as vdisks with BLOCKIO enabled.
>
> After running heavy IO to the target (mostly writes), I am
> seeing the following errors on the target:
>
> ### Target Log ###
> Jun 27 18:32:30 fc-iacc1-targ6-s kernel: [1833]: dev_vdisk:
> blockio_endio:2570:***ERROR***: cmd f44d0f00 returned error -5
>
> Jun 27 18:32:30 fc-iacc1-targ6-s kernel: [1833]: dev_vdisk:
> blockio_endio:2570:***ERROR***: cmd f44d0f00 returned error -5
>
> Jun 27 18:32:30 fc-iacc1-targ6-s kernel: [0]:
> scst_set_cmd_error_status:793:cmd f44d0f00 already has status 2
> set
> ### End Log ###
I suppose this is a new problem? Was it caused by SCST update, kernel
update or both?
Looks like the kernel sometimes returns for bios EIO not for real IO
errors, but for temporary conditions, like out of memory because of too
many pending writes. Obviously, we need to distinguish such conditions
somehow. Since you are the first who reported, I guess it must be a
recently introduced problem, although, of course, it might be a recent
SCST regression as well.
I'd suggest you to try with other older and newer kernels. If it doesn't
help, I need to ask you to grep in your kernel sources in block/ for all
EIO and put near each a separate printk(), then rebuild the new kernel
and try with it. By so we would be able to see which exact place
triggers this EIO.
Thanks,
Vlad
|
|
From: Bryan M. <bry...@nd...> - 2010-07-14 19:03:33
Attachments:
linux-2.6.32.15-debug.patch
|
On Mon, Jun 28, 2010 at 10:54:14PM +0400, Vladislav Bolkhovitin wrote:
> Hello,
Hi Vlad,
> I suppose this is a new problem? Was it caused by SCST update, kernel
> update or both?
Yes, this is a new problem. The majority of our systems running
SCST are 2.6.27.42 running SCST release 1.0.1.2. I would like to
run 2.6.32.x along with a recent commit you made that disables
the qla2x00tgt driver from transitioning to "initiator" mode.
There are no kernel patches in the release version(s) of SCST
that would allow me to patch a 2.6.32 kernel, thus me running
into this problem.
As a side note, would it be possible to patch a 2.6.32 kernel
with a patch from the TRUNK, but compile a release version of
SCST against the trunk patched kernel?
> Looks like the kernel sometimes returns for bios EIO not for real IO
> errors, but for temporary conditions, like out of memory because of too
> many pending writes. Obviously, we need to distinguish such conditions
> somehow. Since you are the first who reported, I guess it must be a
> recently introduced problem, although, of course, it might be a recent
> SCST regression as well.
> I'd suggest you to try with other older and newer kernels. If it doesn't
> help, I need to ask you to grep in your kernel sources in block/ for all
> EIO and put near each a separate printk(), then rebuild the new kernel
> and try with it. By so we would be able to see which exact place
> triggers this EIO.
Attached is a patch that contains the printk() modifications I
made to the linux kernel source (2.6.32.15). I added printk()'s
to all returned EIO's I could find in block/ and also in the md
and dm layers. I was able to provoke the error, but it would
appear that the problem is not triggered in the same path I made
changes to.
The reason for my late reply was my lacking of a test bed. I'm
using slightly different hardware w/out a fibre channel switch (
target and initiator are in a FC loop topology). The production
hardware that initially produced the vdisk error was using 3Ware
9650 SATA controllers, where as the testing hardware is using
Intel IHC10 in AHCI mode. This difference to me should rule out
the device drivers as being culprit.
It would seem that I can only re-produce this error when running
RAID 5 on the target. The general layout is:
Disks --> RAID5 --> LVM --> VDISK
The problem is easy to reproduce by forcing the RAID 5 array to
re-sync its members. I usually just fail out one member and add
it back into the array. I then generate some IO using dd on the
initiator. In fact, just writing out to the partition table on
the exported block device is usually enough to provoke the error.
I tried provoking the error running with a raw disk and a RAID 1
array without success. At this point, I'm not sure where to go
looking. I'd like to chase this problem down if possible. As
always, I'm open to more testing.
Thanks,
Bryan
|
|
From: Vladislav B. <vs...@vl...> - 2010-07-15 11:41:00
|
Hi Bryan,
Bryan Mesich, on 07/14/2010 11:03 PM wrote:
> On Mon, Jun 28, 2010 at 10:54:14PM +0400, Vladislav Bolkhovitin wrote:
>> Hello,
>
> Hi Vlad,
>
>> I suppose this is a new problem? Was it caused by SCST update, kernel
>> update or both?
>
> Yes, this is a new problem. The majority of our systems running
> SCST are 2.6.27.42 running SCST release 1.0.1.2. I would like to
> run 2.6.32.x along with a recent commit you made that disables
> the qla2x00tgt driver from transitioning to "initiator" mode.
> There are no kernel patches in the release version(s) of SCST
> that would allow me to patch a 2.6.32 kernel, thus me running
> into this problem.
>
> As a side note, would it be possible to patch a 2.6.32 kernel
> with a patch from the TRUNK, but compile a release version of
> SCST against the trunk patched kernel?
Yes, sure. You can use scripts/generate-kernel-patch from the trunk for
that.
>> Looks like the kernel sometimes returns for bios EIO not for real IO
>> errors, but for temporary conditions, like out of memory because of too
>> many pending writes. Obviously, we need to distinguish such conditions
>> somehow. Since you are the first who reported, I guess it must be a
>> recently introduced problem, although, of course, it might be a recent
>> SCST regression as well.
>
>> I'd suggest you to try with other older and newer kernels. If it doesn't
>> help, I need to ask you to grep in your kernel sources in block/ for all
>> EIO and put near each a separate printk(), then rebuild the new kernel
>> and try with it. By so we would be able to see which exact place
>> triggers this EIO.
>
> Attached is a patch that contains the printk() modifications I
> made to the linux kernel source (2.6.32.15). I added printk()'s
> to all returned EIO's I could find in block/ and also in the md
> and dm layers. I was able to provoke the error, but it would
> appear that the problem is not triggered in the same path I made
> changes to.
>
> The reason for my late reply was my lacking of a test bed. I'm
> using slightly different hardware w/out a fibre channel switch (
> target and initiator are in a FC loop topology). The production
> hardware that initially produced the vdisk error was using 3Ware
> 9650 SATA controllers, where as the testing hardware is using
> Intel IHC10 in AHCI mode. This difference to me should rule out
> the device drivers as being culprit.
>
> It would seem that I can only re-produce this error when running
> RAID 5 on the target. The general layout is:
>
> Disks --> RAID5 --> LVM --> VDISK
>
> The problem is easy to reproduce by forcing the RAID 5 array to
> re-sync its members. I usually just fail out one member and add
> it back into the array. I then generate some IO using dd on the
> initiator. In fact, just writing out to the partition table on
> the exported block device is usually enough to provoke the error.
It, basically, follows my guess that shortage of some resource is
provoking the error. Re-syncing is a very resource consuming operation.
> I tried provoking the error running with a raw disk and a RAID 1
> array without success. At this point, I'm not sure where to go
> looking. I'd like to chase this problem down if possible. As
> always, I'm open to more testing.
It is very important to localize if update of the kernel or SCST is
responsible for the behavior change. So, I'd suggest you try the latest
SCST trunk (please update) with 2.6.27.42, which is known to work well.
Additionally, you should add in scst_vdisk.c::blockio_endio() just after
"PRINT_ERROR("cmd %p returned error %d", blockio_work->cmd, error);"
dump_stack() call. It will give us more info about the failed call stack.
>
> Thanks,
>
> Bryan
|
|
From: Bryan M. <bry...@nd...> - 2010-07-23 19:18:52
Attachments:
stack_dump
|
On Thu, Jul 15, 2010 at 03:40:29PM +0400, Vladislav Bolkhovitin wrote:
Hi Vlad,
[snip...]
>> As a side note, would it be possible to patch a 2.6.32 kernel
>> with a patch from the TRUNK, but compile a release version of
>> SCST against the trunk patched kernel?
>
> Yes, sure. You can use scripts/generate-kernel-patch from the trunk for
> that.
Thanks for the clarification.
[snip...]
> It is very important to localize if update of the kernel or SCST is
> responsible for the behavior change. So, I'd suggest you try the latest
> SCST trunk (please update) with 2.6.27.42, which is known to work well.
Well...I've localized the problem to be kernel related. My
testing puts the break point between 2.6.31.14 and 2.6.32. I
attempted to bisect the kernel with git, but I'm running into
problems getting SCST to compile against the bisected tree.
In particular, I'm getting implicit declaration warnings for
sync_page_range() in scst_vdisk.c (line 1927).
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32)
        res = sync_page_range(file->f_dentry->d_inode, file->f_mapping,
                loff, len);
#else
#if 0   /* For sparse files we might need to sync metadata as well */
        res = generic_write_sync(file, loff, len);
#else
        res = filemap_write_and_wait_range(file->f_mapping, loff, len);
#endif
#endif
(BTW, I've been using SCST r1849 for my recent testing)
I'm pushing my technical abilities here, but it would seem that
sync_page_range() has been replaced for kernels >= 2.6.32. I
tried hacking around the compile warnings by commenting out
sync_page_range() and forcing the use of
filemap_write_and_wait_range(), but that resulted in a stack
trace and kernel panic when trying to load SCST configuration
(Modules loaded, opening VDISK failed).
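[Editor's note, not part of the original message.] One possible explanation for the implicit declaration, offered as an assumption that the thread does not confirm: a tree reached in the middle of a bisect may still report a version code below 2.6.32 while already carrying the API change, so the guard quoted above selects the sync_page_range() branch. The userspace sketch below only reproduces the arithmetic of the kernel's KERNEL_VERSION() macro; guard_selects_sync_page_range() is a hypothetical helper named for illustration.

```c
#include <stdbool.h>

/* Same arithmetic as the kernel's KERNEL_VERSION() macro of that era. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: returns true when a guard of the form
 * "#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32)" would pick the
 * sync_page_range() branch for the given version code.
 */
bool guard_selects_sync_page_range(int version_code)
{
        return version_code < KERNEL_VERSION(2, 6, 32);
}
```

A tree that identifies itself as 2.6.31 takes the old branch regardless of which commits it already contains, which is why forcing the newer branch by hand is a reasonable workaround while bisecting.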
> Additionally, you should add in scst_vdisk.c::blockio_endio() just after
> "PRINT_ERROR("cmd %p returned error %d", blockio_work->cmd, error);"
> dump_stack() call. It will give us more info about the failed call stack.
I put the dump_stack() where you suggested, but the resulting
dump didn't seem helpful. I instead put the dump_stack() below
the spinlock just after the above referenced "PRINT_ERROR". This
resulted in a cleaner stack dump (see below).
static DEFINE_SPINLOCK(blockio_endio_lock);
static int error_count = 0;
PRINT_ERROR("cmd %p returned error %d", blockio_work->cmd,
        error);
/* To protect from several bios finishing simultaneously */
spin_lock_bh(&blockio_endio_lock);
if ( error_count++ < 5 ) {
        printk("Error %d stack trace", error_count + 1);
        dump_stack();
}
See attached for stack dump. Kernel version is 2.6.34.1 w/SCST
r1849
I think I've reached the end of my skill set for this particular
problem. If there is anything else I can do that would be
helpful, please let me know.
Thanks for the help,
Bryan
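[Editor's note, not part of the original message.] In the dump_stack() snippet above, the counter is post-incremented in the condition and then printed with another + 1, so the five traces would be labelled 2 through 6. That does not affect the diagnosis, but a minimal userspace sketch of the same "report only the first N errors" idea with 1-based labels could look like this. MAX_REPORTS and next_report_number() are hypothetical names; in kernel context the counter would still need the spinlock shown in the snippet.

```c
/* Hypothetical limit, matching the "first five errors" in the snippet. */
#define MAX_REPORTS 5

static int error_count;

/*
 * Returns the 1-based label for this error (1..MAX_REPORTS), or 0 once
 * the limit has been reached and nothing more should be reported.
 * Not thread-safe on its own; the caller is assumed to hold a lock.
 */
int next_report_number(void)
{
        if (error_count >= MAX_REPORTS)
                return 0;
        return ++error_count;
}
```

The caller would print the label and dump the stack only when the return value is non-zero, which also keeps the counter from growing without bound.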
|
|
From: Vladislav B. <vs...@vl...> - 2010-07-26 12:22:48
|
Hi Bryan,
Bryan Mesich, on 07/23/2010 11:18 PM wrote:
> On Thu, Jul 15, 2010 at 03:40:29PM +0400, Vladislav Bolkhovitin wrote:
>
> Hi Vlad,
>
> [snip...]
>>> As a side note, would it be possible to patch a 2.6.32 kernel
>>> with a patch from the TRUNK, but compile a release version of
>>> SCST against the trunk patched kernel?
>>
>> Yes, sure. You can use scripts/generate-kernel-patch from the trunk for
>> that.
>
> Thanks for the clarification.
>
> [snip...]
>> It is very important to localize if update of the kernel or SCST is
>> responsible for the behavior change. So, I'd suggest you try the latest
>> SCST trunk (please update) with 2.6.27.42, which is known to work well.
>
> Well...I've localized to the problem to be kernel related. My
> testing puts the break point between 2.6.31.14 and 2.6.32. I
> attempted to bisect the kernel with git, but I'm running into
> problems getting SCST to compile against the bisected tree.
>
> In particular, I'm getting implicit declaration warnings for
> sync_page_range() in scst_vdisk.c (line 1927).
>
> #if LINUX_VERSION_CODE< KERNEL_VERSION(2, 6, 32)
> res = sync_page_range(file->f_dentry->d_inode, file->f_mapping,
> loff, len);
> #else
> #if 0 /* For sparse files we might need to sync metadata as well */
> res = generic_write_sync(file, loff, len);
> #else
> res = filemap_write_and_wait_range(file->f_mapping, loff, len);
> #endif
> #endif
>
> (BTW, I've been using SCST r1849 for my recent testing)
>
> I'm pushing my technical abilities here, but It would seem that
> sync_page_range() has been replaced for kernels> 2.6.32. I
> tried hacking around the compile warnings by commenting out
> sync_page_range() and forcing the use of
> filemap_write_and_wait_range(), but that resulted in a stack
> trace and kernel panic when trying to load SCST configuration
> (Modules loaded, opening VDISK failed).
>
>> Additionally, you should add in scst_vdisk.c::blockio_endio() just after
>> "PRINT_ERROR("cmd %p returned error %d", blockio_work->cmd, error);"
>> dump_stack() call. It will give us more info about the failed call stack.
>
> I put the dump_stack() where you suggested, but the resulting
> dump didn't seem helpful. I instead put the dump_stack() below
> the spinlock just after the above referenced "PRINT_ERROR". This
> resulted in a cleaner stack dump (see below).
>
>
> static DEFINE_SPINLOCK(blockio_endio_lock);
> static int error_count = 0;
>
> PRINT_ERROR("cmd %p returned error %d", blockio_work->cmd,
> error);
>
> /* To protect from several bios finishing simultaneously */
> spin_lock_bh(&blockio_endio_lock);
> if ( error_count++< 5 ) {
> printk("Error %d stack trace", error_count + 1);
> dump_stack();
> }
>
> See attached for stack dump. Kernel version is 2.6.34.1 w/SCST
> r1849
>
> I think I've reached the end of my skill set for this particular
> problem. If there is anything else I can do that would be
> helpful, please let me know.
You have made very good progress! Now we know that (1) this is a
regression in the kernel and (2), most likely, raid5.c::make_request()
for some reason sometimes calls bio_endio() with not BIO_UPTODATE bios.
Now we need to find out that reason.
I'd suggest you to finish the bisecting. You can simply comment out all
the sync_page_range() related code, for this investigation it doesn't
matter.
Also please in bio.c::bio_endio() change:
else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
        error = -EIO;
to
else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
        printk("%s: not BIO_UPTODATE bio %p!\n", __func__, bio);
        error = -EIO;
}
This will allow us to see if I am correct in my analysis.
Vlad
|
|
From: Bryan M. <bry...@nd...> - 2010-07-27 22:01:18
|
Hi Vlad,
On Mon, Jul 26, 2010 at 04:22:13PM +0400, Vladislav Bolkhovitin wrote:
>
> You have made a very good progress! Now we know that (1) this is
> regression in the kernel and (2), most likey, raid5.c::make_request()
> for some reason sometimes calls bio_endio() with not BIO_UPTODATE bios.
> Now we need to find out that reason.
>
> I'd suggest you to finish the bisecting. You can simply comment out all
> the sync_page_range() related code, for this investigation it doesn't
> matter.
I finished bisecting the tree and found the following commit to
be the culprit.
commit a82afdfcb8c0df09776b6458af6b68fc58b2e87b
Author: Tejun Heo <tj...@ke...>
Date: Fri Jul 3 17:48:16 2009 +0900
block: use the same failfast bits for bio and request
bio and request use the same set of failfast bits. This patch makes
the following changes to simplify things.
* enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_*
bits coincide with __REQ_FAILFAST_* bits.
* The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
but the matching is useless anyway. init_request_from_bio() is
responsible for setting FAILFAST bits on FS requests and non-FS
requests never use BIO_RW_AHEAD. Drop the code and comment from
blk_rq_bio_prep().
* Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
simplify FAILFAST flags handling in init_request_from_bio().
Signed-off-by: Tejun Heo <tj...@ke...>
Signed-off-by: Jens Axboe <jen...@or...>
> Also please in bio.c::bio_endio() change:
>
> else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> error = -EIO;
>
> to
>
> else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
> printk("%s: not BIO_UPTODATE bio %p!\n", __func__, bio);
> error = -EIO;
> }
>
> This will allow us to see if I correct in my analyze.
I did not make the above modification as I was unsure if it was
needed after tracking down the regression. If it would be
helpful, I can still do it.
Thanks for the help,
Bryan
|
|
From: Vladislav B. <vs...@vl...> - 2010-07-28 18:16:42
|
Hello,
In recent kernels we are experiencing a problem that in our setup using SCST BLOCKIO backend some BIOs are finished, i.e. the finish callback called for them, with error -EIO. It happens quite often, much more often than one would expect to have an actual IO error. (BLOCKIO backend just converts all incoming SCSI commands to the corresponding block requests.)
After some investigation, we figured out, that, most likely, raid5.c::make_request() for some reason sometimes calls bio_endio() with not BIO_UPTODATE bios.
We bisected it to commit:
commit a82afdfcb8c0df09776b6458af6b68fc58b2e87b
Author: Tejun Heo <tj...@ke...>
Date: Fri Jul 3 17:48:16 2009 +0900
block: use the same failfast bits for bio and request
bio and request use the same set of failfast bits. This patch makes
the following changes to simplify things.
* enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_*
bits coincide with __REQ_FAILFAST_* bits.
* The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
but the matching is useless anyway. init_request_from_bio() is
responsible for setting FAILFAST bits on FS requests and non-FS
requests never use BIO_RW_AHEAD. Drop the code and comment from
blk_rq_bio_prep().
* Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
simplify FAILFAST flags handling in init_request_from_bio().
Signed-off-by: Tejun Heo <tj...@ke...>
Signed-off-by: Jens Axboe <jen...@or...>
After looking at it I can't see how it can lead to the effect we are experiencing. Could anybody comment on this, please? Is it a known problem?
The error can be only reproduced when running RAID 5. The general layout is:
Disks --> RAID5 --> LVM --> BLOCKIO VDISK
The problem is easy to reproduce by forcing the RAID 5 array to re-sync its members, e.g. just fail out one member, add it back into the array, and then generate some IO using dd. In fact, just writing out to the partition table on the exported block device is usually enough to provoke the error.
The complete thread on this topic can be found at http://sourceforge.net/mailarchive/forum.php?thread_name=20100727220110.GF31152%40atlantis.cc.ndsu.nodak.edu&forum_name=scst-devel
If any additional information is needed we would be glad to provide it.
Thanks,
Vlad
|
|
From: Tejun H. <tj...@ke...> - 2010-07-30 11:02:58
|
Hello,
On 07/28/2010 08:16 PM, Vladislav Bolkhovitin wrote:
> In recent kernels we are experiencing a problem that in our setup
> using SCST BLOCKIO backend some BIOs are finished, i.e. the finish
> callback called for them, with error -EIO. It happens quite often,
> much more often than one would expect to have an actual IO
> error. (BLOCKIO backend just converts all incoming SCSI commands to
> the corresponding block requests.)
>
> After some investigation, we figured out, that, most likely,
> raid5.c::make_request() for some reason sometimes calls bio_endio()
> with not BIO_UPTODATE bios.
>
> We bisected it to commit:
>
> commit a82afdfcb8c0df09776b6458af6b68fc58b2e87b
> Author: Tejun Heo <tj...@ke...>
> Date: Fri Jul 3 17:48:16 2009 +0900
>
> block: use the same failfast bits for bio and request
That commit doesn't (or at least isn't supposed to) make any behavior
difference. It's just repositioning flag bits. If the commit is
actually causing the problem, I think one possibility is that whatever
code could be using hard coded constants which now are mapped to
different flags. The mixed merge changes have been in mainline for
quite some time and shipping in all major distros too and this is the
first time this is reported, so I don't think it could be a widespread
problem.
Thanks.
--
tejun
|
|
From: Neil B. <ne...@su...> - 2010-08-02 00:42:46
|
On Fri, 30 Jul 2010 12:29:30 +0200
Tejun Heo <tj...@ke...> wrote:
> Hello,
>
> On 07/28/2010 08:16 PM, Vladislav Bolkhovitin wrote:
> > In recent kernels we are experiencing a problem that in our setup
> > using SCST BLOCKIO backend some BIOs are finished, i.e. the finish
> > callback called for them, with error -EIO. It happens quite often,
> > much more often than one would expect to have an actual IO
> > error. (BLOCKIO backend just converts all incoming SCSI commands to
> > the corresponding block requests.)
> >
> > After some investigation, we figured out, that, most likely,
> > raid5.c::make_request() for some reason sometimes calls bio_endio()
> > with not BIO_UPTODATE bios.
> >
> > We bisected it to commit:
> >
> > commit a82afdfcb8c0df09776b6458af6b68fc58b2e87b
> > Author: Tejun Heo <tj...@ke...>
> > Date: Fri Jul 3 17:48:16 2009 +0900
> >
> > block: use the same failfast bits for bio and request
>
> That commit doesn't (or at least isn't supposed to) make any behavior
> difference. It's just repositioning flag bits. If the commit is
> actually causing the problem, I think one possibility is that whatever
> code could be using hard coded constants which now are mapped to
> different flags. The mixed merge changes have been in mainline for
> quite some time and shipping in all major distros too and this is the
> first time this is reported, so I don't think it could be a widespread
> problem.
>
> Thanks.
>
The problem is that md/raid5 tests bio->bi_rw against RWA_MASK, which used to
align with BIO_RW_AHEAD, and now doesn't.
However the definition of bio_rw() in fs.h seems to justify that RWA_MASK
should align with BIO_RW_AHEAD, as does the definition of READA.
Given the current definitions, any WRITE request with BIO_RW_FAILFAST_DEV
set is going to confuse a number of drivers which test
bio_rw(bio) == WRITE
I guess RWA_MASK needs to be changed to (1<<BIO_RW_AHEAD), and READA needs to
be changed to that value too.
Can I leave that to you Tejun?
Thanks,
NeilBrown
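[Editor's note, not part of the original message.] The mismatch described above can be reproduced with plain integer arithmetic. The sketch below is a userspace illustration only: the bit positions follow the enum bio_rw_flags listing quoted in the bio.h hunk later in this thread, READ, WRITE and RWA_MASK use the values shown in the fs.h hunk, and RW_MASK plus the shape of bio_rw() are recalled from fs.h of that era and should be checked against the actual tree. bio_rw_value() and treated_as_readahead() are hypothetical stand-ins, not kernel functions.

```c
#include <stdbool.h>

/* bi_rw bit positions after commit a82afdf. */
enum bio_rw_flags {
        BIO_RW,                         /* bit 0: data direction */
        BIO_RW_FAILFAST_DEV,            /* bit 1 */
        BIO_RW_FAILFAST_TRANSPORT,      /* bit 2 */
        BIO_RW_FAILFAST_DRIVER,         /* bit 3 */
        BIO_RW_AHEAD                    /* bit 4: moved here from bit 1 */
};

/* Hard-coded fs.h values before the fix (RW_MASK is an assumption). */
#define RW_MASK         1
#define RWA_MASK        2
#define READ            0
#define WRITE           1

/* Stand-in for bio_rw(bio): keeps only the two low bits of bi_rw. */
unsigned long bio_rw_value(unsigned long bi_rw)
{
        return bi_rw & (RW_MASK | RWA_MASK);
}

/* Stand-in for a driver test of the form (bi_rw & RWA_MASK). */
bool treated_as_readahead(unsigned long bi_rw)
{
        return (bi_rw & RWA_MASK) != 0;
}
```

With these definitions a write carrying the fail-fast device bit, WRITE | (1UL << BIO_RW_FAILFAST_DEV), passes the RWA_MASK test and no longer compares equal to WRITE through bio_rw_value(), while a genuine read-ahead request, READ | (1UL << BIO_RW_AHEAD), is not recognised as one.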
|
|
From: Tejun H. <tj...@ke...> - 2010-08-02 14:12:32
|
Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded READ, WRITE, READA and SWRITE as 0, 1,
2 and 3, and expected them to match with BIO_RW_* bits. READ/WRITE
didn't change but BIO_RW_AHEAD was moved to bit 4 instead of bit 1,
breaking READA and SWRITE.
This patch updates READA and SWRITE such that they match the BIO_RW_*
bits again. A follow up patch will update the definitions to directly
use BIO_RW_* bits so that this kind of breakage won't happen again.
Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.
Signed-off-by: Tejun Heo <tj...@ke...>
Reported-and-bisected-by: Vladislav Bolkhovitin <vs...@vl...>
Root-caused-by: Neil Brown <ne...@su...>
Cc: Jens Axboe <ax...@ke...>
Cc: st...@ke...
---
Aieee... thanks for root causing it Neil. That was a stupid bug. I
knew that READ/WRITE were hardcoded but forgot about READA. :-(
Moving BIO_RW_AHEAD back to bit 1 might be a better solution but I'm
afraid that would cause more confusions downstream. This patch
updates READA and SWRITE to match BIO_RW_AHEAD and should also appear
in -stable releases. The next patch will create bio_types.h and
define all constants in terms of BIO_RW_*.
Thanks.
include/linux/fs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -148,8 +148,8 @@ struct inodes_stat_t {
#define RWA_MASK 2
#define READ 0
#define WRITE 1
-#define READA 2 /* read-ahead - don't block if no resources */
-#define SWRITE 3 /* for ll_rw_block() - wait for buffer lock */
+#define READA 16 /* read-ahead - don't block if no resources */
+#define SWRITE 17 /* for ll_rw_block() - wait for buffer lock */
#define READ_SYNC (READ | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define READ_META (READ | (1 << BIO_RW_META))
#define WRITE_SYNC_PLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
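[Editor's note, not part of the original patch.] The new values can be checked against the bit layout at compile time. The sketch below is a userspace illustration using C11 static_assert; in kernel code the equivalent check would normally be written with BUILD_BUG_ON(). The enum order is taken from the bio.h listing quoted later in this thread, the macro values from the hunk above, and reada_visible_through() is a hypothetical helper.

```c
#include <assert.h>

/* bi_rw bit positions after commit a82afdf. */
enum bio_rw_flags {
        BIO_RW,
        BIO_RW_FAILFAST_DEV,
        BIO_RW_FAILFAST_TRANSPORT,
        BIO_RW_FAILFAST_DRIVER,
        BIO_RW_AHEAD
};

/* Values from the context and '+' lines of the hunk above. */
#define RWA_MASK        2
#define READ            0
#define WRITE           1
#define READA           16
#define SWRITE          17

/* Break the build if the hard-coded values drift from the enum again. */
static_assert(READA == (1 << BIO_RW_AHEAD),
              "READA must equal the BIO_RW_AHEAD bit");
static_assert(SWRITE == (READA | WRITE),
              "SWRITE is numerically READA plus the write bit");

/* Returns nonzero when READA is still visible through the given mask. */
int reada_visible_through(unsigned long mask)
{
        return (READA & mask) != 0;
}
```

In the hunk as posted, RWA_MASK is an unchanged context line with the value 2, so reada_visible_through(RWA_MASK) is 0. Whether the mask also needs to change, as suggested earlier in the thread, is left to the actual follow-up patches.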
|
|
From: Tejun H. <tj...@ke...> - 2010-08-02 14:13:13
|
linux/fs.h hard coded READ/WRITE constants which should match BIO_RW_*
flags. This is fragile and caused breakage during BIO_RW_* flag
rearrangement. The hardcoding is to avoid include dependency hell.
Create linux/bio_types.h which contains definitions for bio data
structures and flags and include it from bio.h and fs.h, and make fs.h
define all READ/WRITE related constants in terms of BIO_RW_* flags.
Signed-off-by: Tejun Heo <tj...@ke...>
Cc: Jens Axboe <ax...@ke...>
---
include/linux/bio.h | 153 +-----------------------------------------
include/linux/bio_types.h | 164 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 17 ++--
3 files changed, 176 insertions(+), 158 deletions(-)
Index: work/include/linux/bio.h
===================================================================
--- work.orig/include/linux/bio.h
+++ work/include/linux/bio.h
@@ -9,7 +9,7 @@
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
-
+ *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
@@ -28,6 +28,9 @@
#include <asm/io.h>
+/* struct bio, bio_vec and BIO_* flags are defined in bio_types.h */
+#include <linux/bio_types.h>
+
#define BIO_DEBUG
#ifdef BIO_DEBUG
@@ -41,154 +44,6 @@
#define BIO_MAX_SECTORS (BIO_MAX_SIZE >> 9)
/*
- * was unsigned short, but we might as well be ready for > 64kB I/O pages
- */
-struct bio_vec {
- struct page *bv_page;
- unsigned int bv_len;
- unsigned int bv_offset;
-};
-
-struct bio_set;
-struct bio;
-struct bio_integrity_payload;
-typedef void (bio_end_io_t) (struct bio *, int);
-typedef void (bio_destructor_t) (struct bio *);
-
-/*
- * main unit of I/O for the block layer and lower layers (ie drivers and
- * stacking drivers)
- */
-struct bio {
- sector_t bi_sector; /* device address in 512 byte
- sectors */
- struct bio *bi_next; /* request queue link */
- struct block_device *bi_bdev;
- unsigned long bi_flags; /* status, command, etc */
- unsigned long bi_rw; /* bottom bits READ/WRITE,
- * top bits priority
- */
-
- unsigned short bi_vcnt; /* how many bio_vec's */
- unsigned short bi_idx; /* current index into bvl_vec */
-
- /* Number of segments in this BIO after
- * physical address coalescing is performed.
- */
- unsigned int bi_phys_segments;
-
- unsigned int bi_size; /* residual I/O count */
-
- /*
- * To keep track of the max segment size, we account for the
- * sizes of the first and last mergeable segments in this bio.
- */
- unsigned int bi_seg_front_size;
- unsigned int bi_seg_back_size;
-
- unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
-
- unsigned int bi_comp_cpu; /* completion CPU */
-
- atomic_t bi_cnt; /* pin count */
-
- struct bio_vec *bi_io_vec; /* the actual vec list */
-
- bio_end_io_t *bi_end_io;
-
- void *bi_private;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
- struct bio_integrity_payload *bi_integrity; /* data integrity */
-#endif
-
- bio_destructor_t *bi_destructor; /* destructor */
-
- /*
- * We can inline a number of vecs at the end of the bio, to avoid
- * double allocations for a small number of bio_vecs. This member
- * MUST obviously be kept at the very end of the bio.
- */
- struct bio_vec bi_inline_vecs[0];
-};
-
-/*
- * bio flags
- */
-#define BIO_UPTODATE 0 /* ok after I/O completion */
-#define BIO_RW_BLOCK 1 /* RW_AHEAD set, and read/write would block */
-#define BIO_EOF 2 /* out-out-bounds error */
-#define BIO_SEG_VALID 3 /* bi_phys_segments valid */
-#define BIO_CLONED 4 /* doesn't own data */
-#define BIO_BOUNCED 5 /* bio is a bounce bio */
-#define BIO_USER_MAPPED 6 /* contains user pages */
-#define BIO_EOPNOTSUPP 7 /* not supported */
-#define BIO_CPU_AFFINE 8 /* complete bio on same CPU as submitted */
-#define BIO_NULL_MAPPED 9 /* contains invalid user pages */
-#define BIO_FS_INTEGRITY 10 /* fs owns integrity data, not block layer */
-#define BIO_QUIET 11 /* Make BIO Quiet */
-#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
-
-/*
- * top 4 bits of bio flags indicate the pool this bio came from
- */
-#define BIO_POOL_BITS (4)
-#define BIO_POOL_NONE ((1UL << BIO_POOL_BITS) - 1)
-#define BIO_POOL_OFFSET (BITS_PER_LONG - BIO_POOL_BITS)
-#define BIO_POOL_MASK (1UL << BIO_POOL_OFFSET)
-#define BIO_POOL_IDX(bio) ((bio)->bi_flags >> BIO_POOL_OFFSET)
-
-/*
- * bio bi_rw flags
- *
- * bit 0 -- data direction
- * If not set, bio is a read from device. If set, it's a write to device.
- * bit 1 -- fail fast device errors
- * bit 2 -- fail fast transport errors
- * bit 3 -- fail fast driver errors
- * bit 4 -- rw-ahead when set
- * bit 5 -- barrier
- * Insert a serialization point in the IO queue, forcing previously
- * submitted IO to be completed before this one is issued.
- * bit 6 -- synchronous I/O hint.
- * bit 7 -- Unplug the device immediately after submitting this bio.
- * bit 8 -- metadata request
- * Used for tracing to differentiate metadata and data IO. May also
- * get some preferential treatment in the IO scheduler
- * bit 9 -- discard sectors
- * Informs the lower level device that this range of sectors is no longer
- * used by the file system and may thus be freed by the device. Used
- * for flash based storage.
- * Don't want driver retries for any fast fail whatever the reason.
- * bit 10 -- Tell the IO scheduler not to wait for more requests after this
- one has been submitted, even if it is a SYNC request.
- */
-enum bio_rw_flags {
- BIO_RW,
- BIO_RW_FAILFAST_DEV,
- BIO_RW_FAILFAST_TRANSPORT,
- BIO_RW_FAILFAST_DRIVER,
- /* above flags must match REQ_* */
- BIO_RW_AHEAD,
- BIO_RW_BARRIER,
- BIO_RW_SYNCIO,
- BIO_RW_UNPLUG,
- BIO_RW_META,
- BIO_RW_DISCARD,
- BIO_RW_NOIDLE,
-};
-
-/*
- * First four bits must match between bio->bi_rw and rq->cmd_flags, make
- * that explicit here.
- */
-#define BIO_RW_RQ_MASK 0xf
-
-static inline bool bio_rw_flagged(struct bio *bio, enum bio_rw_flags flag)
-{
- return (bio->bi_rw & (1 << flag)) != 0;
-}
-
-/*
* upper 16 bits of bi_rw define the io priority of this bio
*/
#define BIO_PRIO_SHIFT (8 * sizeof(unsigned long) - IOPRIO_BITS)
Index: work/include/linux/bio_types.h
===================================================================
--- /dev/null
+++ work/include/linux/bio_types.h
@@ -0,0 +1,164 @@
+/*
+ * BIO data types and constants. Include linux/bio.h for usual cases.
+ * Directly include this file only to break include dependency loop.
+ */
+#ifndef __LINUX_BIO_TYPES_H
+#define __LINUX_BIO_TYPES_H
+
+#ifdef CONFIG_BLOCK
+
+#include <linux/types.h>
+
+struct bio_set;
+struct bio;
+struct bio_integrity_payload;
+struct page;
+struct block_device;
+
+/*
+ * was unsigned short, but we might as well be ready for > 64kB I/O pages
+ */
+struct bio_vec {
+ struct page *bv_page;
+ unsigned int bv_len;
+ unsigned int bv_offset;
+};
+
+typedef void (bio_end_io_t) (struct bio *, int);
+typedef void (bio_destructor_t) (struct bio *);
+
+/*
+ * main unit of I/O for the block layer and lower layers (ie drivers and
+ * stacking drivers)
+ */
+struct bio {
+ sector_t bi_sector; /* device address in 512 byte
+ sectors */
+ struct bio *bi_next; /* request queue link */
+ struct block_device *bi_bdev;
+ unsigned long bi_flags; /* status, command, etc */
+ unsigned long bi_rw; /* bottom bits READ/WRITE,
+ * top bits priority
+ */
+
+ unsigned short bi_vcnt; /* how many bio_vec's */
+ unsigned short bi_idx; /* current index into bvl_vec */
+
+ /* Number of segments in this BIO after
+ * physical address coalescing is performed.
+ */
+ unsigned int bi_phys_segments;
+
+ unsigned int bi_size; /* residual I/O count */
+
+ /*
+ * To keep track of the max segment size, we account for the
+ * sizes of the first and last mergeable segments in this bio.
+ */
+ unsigned int bi_seg_front_size;
+ unsigned int bi_seg_back_size;
+
+ unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
+
+ unsigned int bi_comp_cpu; /* completion CPU */
+
+ atomic_t bi_cnt; /* pin count */
+
+ struct bio_vec *bi_io_vec; /* the actual vec list */
+
+ bio_end_io_t *bi_end_io;
+
+ void *bi_private;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+ struct bio_integrity_payload *bi_integrity; /* data integrity */
+#endif
+
+ bio_destructor_t *bi_destructor; /* destructor */
+
+ /*
+ * We can inline a number of vecs at the end of the bio, to avoid
+ * double allocations for a small number of bio_vecs. This member
+ * MUST obviously be kept at the very end of the bio.
+ */
+ struct bio_vec bi_inline_vecs[0];
+};
+
+/*
+ * bio flags
+ */
+#define BIO_UPTODATE 0 /* ok after I/O completion */
+#define BIO_RW_BLOCK 1 /* RW_AHEAD set, and read/write would block */
+#define BIO_EOF 2 /* out-out-bounds error */
+#define BIO_SEG_VALID 3 /* bi_phys_segments valid */
+#define BIO_CLONED 4 /* doesn't own data */
+#define BIO_BOUNCED 5 /* bio is a bounce bio */
+#define BIO_USER_MAPPED 6 /* contains user pages */
+#define BIO_EOPNOTSUPP 7 /* not supported */
+#define BIO_CPU_AFFINE 8 /* complete bio on same CPU as submitted */
+#define BIO_NULL_MAPPED 9 /* contains invalid user pages */
+#define BIO_FS_INTEGRITY 10 /* fs owns integrity data, not block layer */
+#define BIO_QUIET 11 /* Make BIO Quiet */
+#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
+
+/*
+ * top 4 bits of bio flags indicate the pool this bio came from
+ */
+#define BIO_POOL_BITS (4)
+#define BIO_POOL_NONE ((1UL << BIO_POOL_BITS) - 1)
+#define BIO_POOL_OFFSET (BITS_PER_LONG - BIO_POOL_BITS)
+#define BIO_POOL_MASK (1UL << BIO_POOL_OFFSET)
+#define BIO_POOL_IDX(bio) ((bio)->bi_flags >> BIO_POOL_OFFSET)
+
+/*
+ * bio bi_rw flags
+ *
+ * bit 0 -- data direction
+ * If not set, bio is a read from device. If set, it's a write to device.
+ * bit 1 -- fail fast device errors
+ * bit 2 -- fail fast transport errors
+ * bit 3 -- fail fast driver errors
+ * bit 4 -- rw-ahead when set
+ * bit 5 -- barrier
+ * Insert a serialization point in the IO queue, forcing previously
+ * submitted IO to be completed before this one is issued.
+ * bit 6 -- synchronous I/O hint.
+ * bit 7 -- Unplug the device immediately after submitting this bio.
+ * bit 8 -- metadata request
+ * Used for tracing to differentiate metadata and data IO. May also
+ * get some preferential treatment in the IO scheduler
+ * bit 9 -- discard sectors
+ * Informs the lower level device that this range of sectors is no longer
+ * used by the file system and may thus be freed by the device. Used
+ * for flash based storage.
+ * Don't want driver retries for any fast fail whatever the reason.
+ * bit 10 -- Tell the IO scheduler not to wait for more requests after this
+ one has been submitted, even if it is a SYNC request.
+ */
+enum bio_rw_flags {
+ BIO_RW,
+ BIO_RW_FAILFAST_DEV,
+ BIO_RW_FAILFAST_TRANSPORT,
+ BIO_RW_FAILFAST_DRIVER,
+ /* above flags must match REQ_* */
+ BIO_RW_AHEAD,
+ BIO_RW_BARRIER,
+ BIO_RW_SYNCIO,
+ BIO_RW_UNPLUG,
+ BIO_RW_META,
+ BIO_RW_DISCARD,
+ BIO_RW_NOIDLE,
+};
+
+/*
+ * First four bits must match between bio->bi_rw and rq->cmd_flags, make
+ * that explicit here.
+ */
+#define BIO_RW_RQ_MASK 0xf
+
+static inline bool bio_rw_flagged(struct bio *bio, enum bio_rw_flags flag)
+{
+ return (bio->bi_rw & (1 << flag)) != 0;
+}
+
+#endif /* CONFIG_BLOCK */
+#endif /* __LINUX_BIO_TYPES_H */
Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -8,6 +8,7 @@
#include <linux/limits.h>
#include <linux/ioctl.h>
+#include <linux/bio_types.h>
/*
* It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -117,7 +118,7 @@ struct inodes_stat_t {
* immediately wait on this read without caring about
* unplugging.
* READA Used for read-ahead operations. Lower priority, and the
- * block layer could (in theory) choose to ignore this
+ * block layer could (in theory) choose to ignore this
* request if it runs into resource problems.
* WRITE A normal async write. Device will be plugged.
* SWRITE Like WRITE, but a special case for ll_rw_block() that
@@ -144,12 +145,12 @@ struct inodes_stat_t {
* of this IO.
*
*/
-#define RW_MASK 1
-#define RWA_MASK 2
-#define READ 0
-#define WRITE 1
-#define READA 16 /* read-ahead - don't block if no resources */
-#define SWRITE 17 /* for ll_rw_block() - wait for buffer lock */
+#define RW_MASK (1 << BIO_RW)
+#define RWA_MASK (1 << BIO_RW_AHEAD)
+#define READ 0
+#define WRITE RW_MASK
+#define READA RWA_MASK
+#define SWRITE (WRITE | READA)
#define READ_SYNC (READ | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define READ_META (READ | (1 << BIO_RW_META))
#define WRITE_SYNC_PLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
@@ -2198,7 +2199,6 @@ static inline void insert_inode_hash(str
extern void file_move(struct file *f, struct list_head *list);
extern void file_kill(struct file *f);
#ifdef CONFIG_BLOCK
-struct bio;
extern void submit_bio(int, struct bio *);
extern int bdev_read_only(struct block_device *);
#endif
@@ -2265,7 +2265,6 @@ static inline int xip_truncate_page(stru
#endif
#ifdef CONFIG_BLOCK
-struct bio;
typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
loff_t file_offset);
void dio_end_io(struct bio *bio, int error);
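Note (illustrative only, not part of the patch): the bi_flags layout that moves into bio_types.h keeps status bits in the low bits and the pool index in the top 4 bits. A minimal userspace sketch of that layout follows. It assumes a stand-in BITS_PER_LONG, and struct fake_bio plus the three helpers are hypothetical names introduced just for this example.

```c
/* Userspace sketch of the bi_flags layout; not kernel code. */
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's BITS_PER_LONG (assumption for userspace). */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Two of the status bits from the patch, copied as-is. */
#define BIO_UPTODATE	0
#define BIO_CLONED	4

/* Pool macros copied from the patch. */
#define BIO_POOL_BITS		(4)
#define BIO_POOL_NONE		((1UL << BIO_POOL_BITS) - 1)
#define BIO_POOL_OFFSET		(BITS_PER_LONG - BIO_POOL_BITS)
#define BIO_POOL_IDX(bio)	((bio)->bi_flags >> BIO_POOL_OFFSET)
#define bio_flagged(bio, flag)	((bio)->bi_flags & (1 << (flag)))

/* Hypothetical cut-down bio carrying only the flags word. */
struct fake_bio {
	unsigned long bi_flags;
};

/* Hypothetical helper: pack a pool index and one status bit into a word. */
static inline unsigned long make_flags(unsigned long pool_idx, int flag)
{
	return (pool_idx << BIO_POOL_OFFSET) | (1UL << flag);
}

/* Hypothetical helper: recover the pool index from the top 4 bits. */
static inline unsigned long pool_idx_from_flags(unsigned long flags)
{
	struct fake_bio bio = { .bi_flags = flags };

	return BIO_POOL_IDX(&bio);
}

/* Hypothetical helper: test one status bit, normalized to 0 or 1. */
static inline int flag_is_set(unsigned long flags, int flag)
{
	struct fake_bio bio = { .bi_flags = flags };

	return bio_flagged(&bio, flag) != 0;
}
```

Because the pool index sits at the top of the word, shifting right by BIO_POOL_OFFSET drops every status bit, so the two fields do not interfere as long as the status bits stay below the top 4 bits.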
|
|
From: Tejun H. <tj...@ke...> - 2010-08-02 14:18:01
|
linux/fs.h hard-codes the READ/WRITE constants, which must match the
BIO_RW_* flags. This is fragile and caused breakage during the BIO_RW_*
flag rearrangement. The hard-coding exists to avoid include dependency
hell.
Create linux/bio_types.h, which contains the definitions of the bio data
structures and flags, include it from both bio.h and fs.h, and make fs.h
define all READ/WRITE related constants in terms of the BIO_RW_* flags.
Signed-off-by: Tejun Heo <tj...@ke...>
Cc: Jens Axboe <ax...@ke...>
---
include/linux/bio.h | 153 +-----------------------------------------
include/linux/bio_types.h | 164 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 17 ++--
3 files changed, 176 insertions(+), 158 deletions(-)
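Note (illustrative only, not part of the patch): a userspace sketch of what the fs.h hunk achieves. It copies the enum order from bio_types.h and the new fs.h macros, then checks at compile time that the derived values equal the ones fs.h used to hard-code. The helper rw_is_write() is a hypothetical name used only here. With the enum order shown in this patch, READ, WRITE, READA and SWRITE keep their old values, while RWA_MASK changes from the old hard-coded 2 to 16.

```c
/* Userspace sketch mirroring the patch; not kernel code. */
#include <assert.h>

/* Enum order copied from bio_types.h in this patch. */
enum bio_rw_flags {
	BIO_RW,
	BIO_RW_FAILFAST_DEV,
	BIO_RW_FAILFAST_TRANSPORT,
	BIO_RW_FAILFAST_DRIVER,
	BIO_RW_AHEAD,
	BIO_RW_BARRIER,
	BIO_RW_SYNCIO,
	BIO_RW_UNPLUG,
	BIO_RW_META,
	BIO_RW_DISCARD,
	BIO_RW_NOIDLE,
};

/* New fs.h definitions, derived from the enum instead of literals. */
#define RW_MASK		(1 << BIO_RW)
#define RWA_MASK	(1 << BIO_RW_AHEAD)
#define READ		0
#define WRITE		RW_MASK
#define READA		RWA_MASK
#define SWRITE		(WRITE | READA)

/* The build fails if a derived value drifts from the old literal. */
_Static_assert(READ == 0, "READ was hard-coded as 0");
_Static_assert(WRITE == 1, "WRITE was hard-coded as 1");
_Static_assert(READA == 16, "READA was hard-coded as 16");
_Static_assert(SWRITE == 17, "SWRITE was hard-coded as 17");

/* RWA_MASK was hard-coded as 2; derived from BIO_RW_AHEAD it is 16. */
_Static_assert(RWA_MASK == 16, "RWA_MASK now follows BIO_RW_AHEAD");

/* Hypothetical helper: nonzero if the rw word describes a write. */
static inline int rw_is_write(unsigned long rw)
{
	return (rw & RW_MASK) != 0;
}
```

Compiling the file is the whole check: if someone reorders the enum, the static assertions stop the build instead of letting the constants silently diverge, which is the failure mode the description above refers to.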
Index: work/include/linux/bio.h
===================================================================
--- work.orig/include/linux/bio.h
+++ work/include/linux/bio.h
@@ -9,7 +9,7 @@
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
-
+ *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
@@ -28,6 +28,9 @@
#include <asm/io.h>
+/* struct bio, bio_vec and BIO_* flags are defined in bio_types.h */
+#include <linux/bio_types.h>
+
#define BIO_DEBUG
#ifdef BIO_DEBUG
@@ -41,154 +44,6 @@
#define BIO_MAX_SECTORS (BIO_MAX_SIZE >> 9)
/*
- * was unsigned short, but we might as well be ready for > 64kB I/O pages
- */
-struct bio_vec {
- struct page *bv_page;
- unsigned int bv_len;
- unsigned int bv_offset;
-};
-
-struct bio_set;
-struct bio;
-struct bio_integrity_payload;
-typedef void (bio_end_io_t) (struct bio *, int);
-typedef void (bio_destructor_t) (struct bio *);
-
-/*
- * main unit of I/O for the block layer and lower layers (ie drivers and
- * stacking drivers)
- */
-struct bio {
- sector_t bi_sector; /* device address in 512 byte
- sectors */
- struct bio *bi_next; /* request queue link */
- struct block_device *bi_bdev;
- unsigned long bi_flags; /* status, command, etc */
- unsigned long bi_rw; /* bottom bits READ/WRITE,
- * top bits priority
- */
-
- unsigned short bi_vcnt; /* how many bio_vec's */
- unsigned short bi_idx; /* current index into bvl_vec */
-
- /* Number of segments in this BIO after
- * physical address coalescing is performed.
- */
- unsigned int bi_phys_segments;
-
- unsigned int bi_size; /* residual I/O count */
-
- /*
- * To keep track of the max segment size, we account for the
- * sizes of the first and last mergeable segments in this bio.
- */
- unsigned int bi_seg_front_size;
- unsigned int bi_seg_back_size;
-
- unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
-
- unsigned int bi_comp_cpu; /* completion CPU */
-
- atomic_t bi_cnt; /* pin count */
-
- struct bio_vec *bi_io_vec; /* the actual vec list */
-
- bio_end_io_t *bi_end_io;
-
- void *bi_private;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
- struct bio_integrity_payload *bi_integrity; /* data integrity */
-#endif
-
- bio_destructor_t *bi_destructor; /* destructor */
-
- /*
- * We can inline a number of vecs at the end of the bio, to avoid
- * double allocations for a small number of bio_vecs. This member
- * MUST obviously be kept at the very end of the bio.
- */
- struct bio_vec bi_inline_vecs[0];
-};
-
-/*
- * bio flags
- */
-#define BIO_UPTODATE 0 /* ok after I/O completion */
-#define BIO_RW_BLOCK 1 /* RW_AHEAD set, and read/write would block */
-#define BIO_EOF 2 /* out-out-bounds error */
-#define BIO_SEG_VALID 3 /* bi_phys_segments valid */
-#define BIO_CLONED 4 /* doesn't own data */
-#define BIO_BOUNCED 5 /* bio is a bounce bio */
-#define BIO_USER_MAPPED 6 /* contains user pages */
-#define BIO_EOPNOTSUPP 7 /* not supported */
-#define BIO_CPU_AFFINE 8 /* complete bio on same CPU as submitted */
-#define BIO_NULL_MAPPED 9 /* contains invalid user pages */
-#define BIO_FS_INTEGRITY 10 /* fs owns integrity data, not block layer */
-#define BIO_QUIET 11 /* Make BIO Quiet */
-#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
-
-/*
- * top 4 bits of bio flags indicate the pool this bio came from
- */
-#define BIO_POOL_BITS (4)
-#define BIO_POOL_NONE ((1UL << BIO_POOL_BITS) - 1)
-#define BIO_POOL_OFFSET (BITS_PER_LONG - BIO_POOL_BITS)
-#define BIO_POOL_MASK (1UL << BIO_POOL_OFFSET)
-#define BIO_POOL_IDX(bio) ((bio)->bi_flags >> BIO_POOL_OFFSET)
-
-/*
- * bio bi_rw flags
- *
- * bit 0 -- data direction
- * If not set, bio is a read from device. If set, it's a write to device.
- * bit 1 -- fail fast device errors
- * bit 2 -- fail fast transport errors
- * bit 3 -- fail fast driver errors
- * bit 4 -- rw-ahead when set
- * bit 5 -- barrier
- * Insert a serialization point in the IO queue, forcing previously
- * submitted IO to be completed before this one is issued.
- * bit 6 -- synchronous I/O hint.
- * bit 7 -- Unplug the device immediately after submitting this bio.
- * bit 8 -- metadata request
- * Used for tracing to differentiate metadata and data IO. May also
- * get some preferential treatment in the IO scheduler
- * bit 9 -- discard sectors
- * Informs the lower level device that this range of sectors is no longer
- * used by the file system and may thus be freed by the device. Used
- * for flash based storage.
- * Don't want driver retries for any fast fail whatever the reason.
- * bit 10 -- Tell the IO scheduler not to wait for more requests after this
- one has been submitted, even if it is a SYNC request.
- */
-enum bio_rw_flags {
- BIO_RW,
- BIO_RW_FAILFAST_DEV,
- BIO_RW_FAILFAST_TRANSPORT,
- BIO_RW_FAILFAST_DRIVER,
- /* above flags must match REQ_* */
- BIO_RW_AHEAD,
- BIO_RW_BARRIER,
- BIO_RW_SYNCIO,
- BIO_RW_UNPLUG,
- BIO_RW_META,
- BIO_RW_DISCARD,
- BIO_RW_NOIDLE,
-};
-
-/*
- * First four bits must match between bio->bi_rw and rq->cmd_flags, make
- * that explicit here.
- */
-#define BIO_RW_RQ_MASK 0xf
-
-static inline bool bio_rw_flagged(struct bio *bio, enum bio_rw_flags flag)
-{
- return (bio->bi_rw & (1 << flag)) != 0;
-}
-
-/*
* upper 16 bits of bi_rw define the io priority of this bio
*/
#define BIO_PRIO_SHIFT (8 * sizeof(unsigned long) - IOPRIO_BITS)
Index: work/include/linux/bio_types.h
===================================================================
--- /dev/null
+++ work/include/linux/bio_types.h
@@ -0,0 +1,164 @@
+/*
+ * BIO data types and constants. Include linux/bio.h for usual cases.
+ * Directly include this file only to break include dependency loop.
+ */
+#ifndef __LINUX_BIO_TYPES_H
+#define __LINUX_BIO_TYPES_H
+
+#ifdef CONFIG_BLOCK
+
+#include <linux/types.h>
+
+struct bio_set;
+struct bio;
+struct bio_integrity_payload;
+struct page;
+struct block_device;
+
+/*
+ * was unsigned short, but we might as well be ready for > 64kB I/O pages
+ */
+struct bio_vec {
+ struct page *bv_page;
+ unsigned int bv_len;
+ unsigned int bv_offset;
+};
+
+typedef void (bio_end_io_t) (struct bio *, int);
+typedef void (bio_destructor_t) (struct bio *);
+
+/*
+ * main unit of I/O for the block layer and lower layers (ie drivers and
+ * stacking drivers)
+ */
+struct bio {
+ sector_t bi_sector; /* device address in 512 byte
+ sectors */
+ struct bio *bi_next; /* request queue link */
+ struct block_device *bi_bdev;
+ unsigned long bi_flags; /* status, command, etc */
+ unsigned long bi_rw; /* bottom bits READ/WRITE,
+ * top bits priority
+ */
+
+ unsigned short bi_vcnt; /* how many bio_vec's */
+ unsigned short bi_idx; /* current index into bvl_vec */
+
+ /* Number of segments in this BIO after
+ * physical address coalescing is performed.
+ */
+ unsigned int bi_phys_segments;
+
+ unsigned int bi_size; /* residual I/O count */
+
+ /*
+ * To keep track of the max segment size, we account for the
+ * sizes of the first and last mergeable segments in this bio.
+ */
+ unsigned int bi_seg_front_size;
+ unsigned int bi_seg_back_size;
+
+ unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
+
+ unsigned int bi_comp_cpu; /* completion CPU */
+
+ atomic_t bi_cnt; /* pin count */
+
+ struct bio_vec *bi_io_vec; /* the actual vec list */
+
+ bio_end_io_t *bi_end_io;
+
+ void *bi_private;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+ struct bio_integrity_payload *bi_integrity; /* data integrity */
+#endif
+
+ bio_destructor_t *bi_destructor; /* destructor */
+
+ /*
+ * We can inline a number of vecs at the end of the bio, to avoid
+ * double allocations for a small number of bio_vecs. This member
+ * MUST obviously be kept at the very end of the bio.
+ */
+ struct bio_vec bi_inline_vecs[0];
+};
+
+/*
+ * bio flags
+ */
+#define BIO_UPTODATE 0 /* ok after I/O completion */
+#define BIO_RW_BLOCK 1 /* RW_AHEAD set, and read/write would block */
+#define BIO_EOF 2 /* out-out-bounds error */
+#define BIO_SEG_VALID 3 /* bi_phys_segments valid */
+#define BIO_CLONED 4 /* doesn't own data */
+#define BIO_BOUNCED 5 /* bio is a bounce bio */
+#define BIO_USER_MAPPED 6 /* contains user pages */
+#define BIO_EOPNOTSUPP 7 /* not supported */
+#define BIO_CPU_AFFINE 8 /* complete bio on same CPU as submitted */
+#define BIO_NULL_MAPPED 9 /* contains invalid user pages */
+#define BIO_FS_INTEGRITY 10 /* fs owns integrity data, not block layer */
+#define BIO_QUIET 11 /* Make BIO Quiet */
+#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
+
+/*
+ * top 4 bits of bio flags indicate the pool this bio came from
+ */
+#define BIO_POOL_BITS (4)
+#define BIO_POOL_NONE ((1UL << BIO_POOL_BITS) - 1)
+#define BIO_POOL_OFFSET (BITS_PER_LONG - BIO_POOL_BITS)
+#define BIO_POOL_MASK (1UL << BIO_POOL_OFFSET)
+#define BIO_POOL_IDX(bio) ((bio)->bi_flags >> BIO_POOL_OFFSET)
+
+/*
+ * bio bi_rw flags
+ *
+ * bit 0 -- data direction
+ * If not set, bio is a read from device. If set, it's a write to device.
+ * bit 1 -- fail fast device errors
+ * bit 2 -- fail fast transport errors
+ * bit 3 -- fail fast driver errors
+ * bit 4 -- rw-ahead when set
+ * bit 5 -- barrier
+ * Insert a serialization point in the IO queue, forcing previously
+ * submitted IO to be completed before this one is issued.
+ * bit 6 -- synchronous I/O hint.
+ * bit 7 -- Unplug the device immediately after submitting this bio.
+ * bit 8 -- metadata request
+ * Used for tracing to differentiate metadata and data IO. May also
+ * get some preferential treatment in the IO scheduler
+ * bit 9 -- discard sectors
+ * Informs the lower level device that this range of sectors is no longer
+ * used by the file system and may thus be freed by the device. Used
+ * for flash based storage.
+ * Don't want driver retries for any fast fail whatever the reason.
+ * bit 10 -- Tell the IO scheduler not to wait for more requests after this
+ one has been submitted, even if it is a SYNC request.
+ */
+enum bio_rw_flags {
+ BIO_RW,
+ BIO_RW_FAILFAST_DEV,
+ BIO_RW_FAILFAST_TRANSPORT,
+ BIO_RW_FAILFAST_DRIVER,
+ /* above flags must match REQ_* */
+ BIO_RW_AHEAD,
+ BIO_RW_BARRIER,
+ BIO_RW_SYNCIO,
+ BIO_RW_UNPLUG,
+ BIO_RW_META,
+ BIO_RW_DISCARD,
+ BIO_RW_NOIDLE,
+};
+
+/*
+ * First four bits must match between bio->bi_rw and rq->cmd_flags, make
+ * that explicit here.
+ */
+#define BIO_RW_RQ_MASK 0xf
+
+static inline bool bio_rw_flagged(struct bio *bio, enum bio_rw_flags flag)
+{
+ return (bio->bi_rw & (1 << flag)) != 0;
+}
+
+#endif /* CONFIG_BLOCK */
+#endif /* __LINUX_BIO_TYPES_H */
Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -8,6 +8,7 @@
#include <linux/limits.h>
#include <linux/ioctl.h>
+#include <linux/bio_types.h>
/*
* It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -117,7 +118,7 @@ struct inodes_stat_t {
* immediately wait on this read without caring about
* unplugging.
* READA Used for read-ahead operations. Lower priority, and the
- * block layer could (in theory) choose to ignore this
+ * block layer could (in theory) choose to ignore this
* request if it runs into resource problems.
* WRITE A normal async write. Device will be plugged.
* SWRITE Like WRITE, but a special case for ll_rw_block() that
@@ -144,12 +145,12 @@ struct inodes_stat_t {
* of this IO.
*
*/
-#define RW_MASK 1
-#define RWA_MASK 2
-#define READ 0
-#define WRITE 1
-#define READA 16 /* read-ahead - don't block if no resources */
-#define SWRITE 17 /* for ll_rw_block() - wait for buffer lock */
+#define RW_MASK (1 << BIO_RW)
+#define RWA_MASK (1 << BIO_RW_AHEAD)
+#define READ 0
+#define WRITE RW_MASK
+#define READA RWA_MASK
+#define SWRITE (WRITE | READA)
#define READ_SYNC (READ | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define READ_META (READ | (1 << BIO_RW_META))
#define WRITE_SYNC_PLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
@@ -2198,7 +2199,6 @@ static inline void insert_inode_hash(str
extern void file_move(struct file *f, struct list_head *list);
extern void file_kill(struct file *f);
#ifdef CONFIG_BLOCK
-struct bio;
extern void submit_bio(int, struct bio *);
extern int bdev_read_only(struct block_device *);
#endif
@@ -2265,7 +2265,6 @@ static inline int xip_truncate_page(stru
#endif
#ifdef CONFIG_BLOCK
-struct bio;
typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
loff_t file_offset);
void dio_end_io(struct bio *bio, int error);
|
|
From: Tejun H. <tj...@ke...> - 2010-08-02 14:17:50
|
Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded READ, WRITE, READA and SWRITE as 0, 1,
2 and 3, and expected them to match with BIO_RW_* bits. READ/WRITE
didn't change but BIO_RW_AHEAD was moved to bit 4 instead of bit 1,
breaking READA and SWRITE.
This patch updates READA and SWRITE such that they match the BIO_RW_*
bits again. A follow up patch will update the definitions to directly
use BIO_RW_* bits so that this kind of breakage won't happen again.
Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.
Signed-off-by: Tejun Heo <tj...@ke...>
Reported-and-bisected-by: Vladislav Bolkhovitin <vs...@vl...>
Root-caused-by: Neil Brown <ne...@su...>
Cc: Jens Axobe <ax...@ke...>
Cc: st...@ke...
---
Aieee... thanks for root causing it Neil. That was a stupid bug. I
knew that READ/WRITE were hardcoded but forgot about READA. :-(
Moving BIO_RW_AHEAD back to bit 1 might be a better solution but I'm
afraid that would cause more confusions downstream. This patch
updates READA and SWRITE to match BIO_RW_AHEAD and should also appear
in -stable releases. The next patch will create bio_types.h and
define all constants in terms of BIO_RW_*.
Thanks.
(resending w/ Jens' new address)
include/linux/fs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -148,8 +148,8 @@ struct inodes_stat_t {
#define RWA_MASK 2
#define READ 0
#define WRITE 1
-#define READA 2 /* read-ahead - don't block if no resources */
-#define SWRITE 3 /* for ll_rw_block() - wait for buffer lock */
+#define READA 16 /* read-ahead - don't block if no resources */
+#define SWRITE 17 /* for ll_rw_block() - wait for buffer lock */
#define READ_SYNC (READ | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define READ_META (READ | (1 << BIO_RW_META))
#define WRITE_SYNC_PLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
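
The mismatch described in this message can be checked outside the kernel. The following is a minimal user-space sketch, not kernel code: the enum order is copied from the bio_rw_flags enum quoted earlier in this thread, while OLD_READA, NEW_READA, OLD_SWRITE, NEW_SWRITE, rw_flagged() and reada_matches() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions after commit a82afdf, in the order quoted in this thread. */
enum bio_rw_flags {
	BIO_RW,				/* bit 0: data direction */
	BIO_RW_FAILFAST_DEV,		/* bit 1 */
	BIO_RW_FAILFAST_TRANSPORT,	/* bit 2 */
	BIO_RW_FAILFAST_DRIVER,		/* bit 3 */
	BIO_RW_AHEAD,			/* bit 4: was bit 1 before the move */
};

/* Hypothetical names for the fs.h literals before and after the fix. */
#define OLD_READA	2
#define OLD_SWRITE	3
#define NEW_READA	16
#define NEW_SWRITE	17

/* Stand-in for bio_rw_flagged(), taking the raw rw value instead of a bio. */
static bool rw_flagged(unsigned long rw, enum bio_rw_flags flag)
{
	return (rw & (1UL << flag)) != 0;
}

/* True when a hard-coded READA literal selects exactly BIO_RW_AHEAD. */
static bool reada_matches(unsigned long reada)
{
	return reada == (1UL << BIO_RW_AHEAD);
}

/* The patched literals line up with the enum again. */
static void check_fixed_values(void)
{
	assert(reada_matches(NEW_READA));
	assert(rw_flagged(NEW_SWRITE, BIO_RW));
	assert(rw_flagged(NEW_SWRITE, BIO_RW_AHEAD));
}
```

With this layout the stale literal 2 no longer selects BIO_RW_AHEAD; it selects bit 1, which the enum now assigns to BIO_RW_FAILFAST_DEV. Calling check_fixed_values() from a test program confirms that 16 and 17 match the moved bit.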
|
|
From: Jens A. <ax...@ke...> - 2010-08-02 20:21:44
|
On 08/02/2010 04:17 PM, Tejun Heo wrote:
> Aieee... thanks for root causing it Neil. That was a stupid bug. I
> knew that READ/WRITE were hardcoded but forgot about READA. :-(
> Moving BIO_RW_AHEAD back to bit 1 might be a better solution but I'm
> afraid that would cause more confusions downstream. This patch
> updates READA and SWRITE to match BIO_RW_AHEAD and should also appear
> in -stable releases. The next patch will create bio_types.h and
> define all constants in terms of BIO_RW_*.
Tejun, care to resend these against for-2.6.36? We can reference these
for the stable backport (at least the first one should go in).
--
Jens Axboe
|
|
From: Tejun H. <tj...@ke...> - 2010-08-03 09:54:37
|
linux/fs.h hard coded READ/WRITE constants which should match BIO_RW_*
flags. This is fragile and caused breakage during BIO_RW_* flag
rearrangement. The hardcoding is to avoid include dependency hell.
Create linux/bio_types.h which contains definitions for bio data
structures and flags and include it from bio.h and fs.h, and make fs.h
define all READ/WRITE related constants in terms of BIO_RW_* flags.
Signed-off-by: Tejun Heo <tj...@ke...>
Cc: Jens Axobe <ax...@ke...>
---
I renamed the file to blk_types.h instead of bio_types.h as it now
contains the REQ bits too.
include/linux/bio.h | 183 -------------------------------------------
include/linux/blk_types.h | 193 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 15 +--
3 files changed, 204 insertions(+), 187 deletions(-)
Index: work/include/linux/bio.h
===================================================================
--- work.orig/include/linux/bio.h
+++ work/include/linux/bio.h
@@ -9,7 +9,7 @@
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
-
+ *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
@@ -28,6 +28,9 @@
#include <asm/io.h>
+/* struct bio, bio_vec and BIO_* flags are defined in blk_types.h */
+#include <linux/blk_types.h>
+
#define BIO_DEBUG
#ifdef BIO_DEBUG
@@ -41,184 +44,6 @@
#define BIO_MAX_SECTORS (BIO_MAX_SIZE >> 9)
/*
- * was unsigned short, but we might as well be ready for > 64kB I/O pages
- */
-struct bio_vec {
- struct page *bv_page;
- unsigned int bv_len;
- unsigned int bv_offset;
-};
-
-struct bio_set;
-struct bio;
-struct bio_integrity_payload;
-typedef void (bio_end_io_t) (struct bio *, int);
-typedef void (bio_destructor_t) (struct bio *);
-
-/*
- * main unit of I/O for the block layer and lower layers (ie drivers and
- * stacking drivers)
- */
-struct bio {
- sector_t bi_sector; /* device address in 512 byte
- sectors */
- struct bio *bi_next; /* request queue link */
- struct block_device *bi_bdev;
- unsigned long bi_flags; /* status, command, etc */
- unsigned long bi_rw; /* bottom bits READ/WRITE,
- * top bits priority
- */
-
- unsigned short bi_vcnt; /* how many bio_vec's */
- unsigned short bi_idx; /* current index into bvl_vec */
-
- /* Number of segments in this BIO after
- * physical address coalescing is performed.
- */
- unsigned int bi_phys_segments;
-
- unsigned int bi_size; /* residual I/O count */
-
- /*
- * To keep track of the max segment size, we account for the
- * sizes of the first and last mergeable segments in this bio.
- */
- unsigned int bi_seg_front_size;
- unsigned int bi_seg_back_size;
-
- unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
-
- unsigned int bi_comp_cpu; /* completion CPU */
-
- atomic_t bi_cnt; /* pin count */
-
- struct bio_vec *bi_io_vec; /* the actual vec list */
-
- bio_end_io_t *bi_end_io;
-
- void *bi_private;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
- struct bio_integrity_payload *bi_integrity; /* data integrity */
-#endif
-
- bio_destructor_t *bi_destructor; /* destructor */
-
- /*
- * We can inline a number of vecs at the end of the bio, to avoid
- * double allocations for a small number of bio_vecs. This member
- * MUST obviously be kept at the very end of the bio.
- */
- struct bio_vec bi_inline_vecs[0];
-};
-
-/*
- * bio flags
- */
-#define BIO_UPTODATE 0 /* ok after I/O completion */
-#define BIO_RW_BLOCK 1 /* RW_AHEAD set, and read/write would block */
-#define BIO_EOF 2 /* out-out-bounds error */
-#define BIO_SEG_VALID 3 /* bi_phys_segments valid */
-#define BIO_CLONED 4 /* doesn't own data */
-#define BIO_BOUNCED 5 /* bio is a bounce bio */
-#define BIO_USER_MAPPED 6 /* contains user pages */
-#define BIO_EOPNOTSUPP 7 /* not supported */
-#define BIO_CPU_AFFINE 8 /* complete bio on same CPU as submitted */
-#define BIO_NULL_MAPPED 9 /* contains invalid user pages */
-#define BIO_FS_INTEGRITY 10 /* fs owns integrity data, not block layer */
-#define BIO_QUIET 11 /* Make BIO Quiet */
-#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
-
-/*
- * top 4 bits of bio flags indicate the pool this bio came from
- */
-#define BIO_POOL_BITS (4)
-#define BIO_POOL_NONE ((1UL << BIO_POOL_BITS) - 1)
-#define BIO_POOL_OFFSET (BITS_PER_LONG - BIO_POOL_BITS)
-#define BIO_POOL_MASK (1UL << BIO_POOL_OFFSET)
-#define BIO_POOL_IDX(bio) ((bio)->bi_flags >> BIO_POOL_OFFSET)
-
-/*
- * Request flags. For use in the cmd_flags field of struct request, and in
- * bi_rw of struct bio. Note that some flags are only valid in either one.
- */
-enum rq_flag_bits {
- /* common flags */
- __REQ_WRITE, /* not set, read. set, write */
- __REQ_FAILFAST_DEV, /* no driver retries of device errors */
- __REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
- __REQ_FAILFAST_DRIVER, /* no driver retries of driver errors */
-
- __REQ_HARDBARRIER, /* may not be passed by drive either */
- __REQ_SYNC, /* request is sync (sync write or read) */
- __REQ_META, /* metadata io request */
- __REQ_DISCARD, /* request to discard sectors */
- __REQ_NOIDLE, /* don't anticipate more IO after this one */
-
- /* bio only flags */
- __REQ_UNPLUG, /* unplug the immediately after submission */
- __REQ_RAHEAD, /* read ahead, can fail anytime */
-
- /* request only flags */
- __REQ_SORTED, /* elevator knows about this request */
- __REQ_SOFTBARRIER, /* may not be passed by ioscheduler */
- __REQ_FUA, /* forced unit access */
- __REQ_NOMERGE, /* don't touch this for merging */
- __REQ_STARTED, /* drive already may have started this one */
- __REQ_DONTPREP, /* don't call prep for this one */
- __REQ_QUEUED, /* uses queueing */
- __REQ_ELVPRIV, /* elevator private data attached */
- __REQ_FAILED, /* set if the request failed */
- __REQ_QUIET, /* don't worry about errors */
- __REQ_PREEMPT, /* set for "ide_preempt" requests */
- __REQ_ORDERED_COLOR, /* is before or after barrier */
- __REQ_ALLOCED, /* request came from our alloc pool */
- __REQ_COPY_USER, /* contains copies of user pages */
- __REQ_INTEGRITY, /* integrity metadata has been remapped */
- __REQ_FLUSH, /* request for cache flush */
- __REQ_IO_STAT, /* account I/O stat */
- __REQ_MIXED_MERGE, /* merge of different types, fail separately */
- __REQ_NR_BITS, /* stops here */
-};
-
-#define REQ_WRITE (1 << __REQ_WRITE)
-#define REQ_FAILFAST_DEV (1 << __REQ_FAILFAST_DEV)
-#define REQ_FAILFAST_TRANSPORT (1 << __REQ_FAILFAST_TRANSPORT)
-#define REQ_FAILFAST_DRIVER (1 << __REQ_FAILFAST_DRIVER)
-#define REQ_HARDBARRIER (1 << __REQ_HARDBARRIER)
-#define REQ_SYNC (1 << __REQ_SYNC)
-#define REQ_META (1 << __REQ_META)
-#define REQ_DISCARD (1 << __REQ_DISCARD)
-#define REQ_NOIDLE (1 << __REQ_NOIDLE)
-
-#define REQ_FAILFAST_MASK \
- (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
-#define REQ_COMMON_MASK \
- (REQ_WRITE | REQ_FAILFAST_MASK | REQ_HARDBARRIER | REQ_SYNC | \
- REQ_META| REQ_DISCARD | REQ_NOIDLE)
-
-#define REQ_UNPLUG (1 << __REQ_UNPLUG)
-#define REQ_RAHEAD (1 << __REQ_RAHEAD)
-
-#define REQ_SORTED (1 << __REQ_SORTED)
-#define REQ_SOFTBARRIER (1 << __REQ_SOFTBARRIER)
-#define REQ_FUA (1 << __REQ_FUA)
-#define REQ_NOMERGE (1 << __REQ_NOMERGE)
-#define REQ_STARTED (1 << __REQ_STARTED)
-#define REQ_DONTPREP (1 << __REQ_DONTPREP)
-#define REQ_QUEUED (1 << __REQ_QUEUED)
-#define REQ_ELVPRIV (1 << __REQ_ELVPRIV)
-#define REQ_FAILED (1 << __REQ_FAILED)
-#define REQ_QUIET (1 << __REQ_QUIET)
-#define REQ_PREEMPT (1 << __REQ_PREEMPT)
-#define REQ_ORDERED_COLOR (1 << __REQ_ORDERED_COLOR)
-#define REQ_ALLOCED (1 << __REQ_ALLOCED)
-#define REQ_COPY_USER (1 << __REQ_COPY_USER)
-#define REQ_INTEGRITY (1 << __REQ_INTEGRITY)
-#define REQ_FLUSH (1 << __REQ_FLUSH)
-#define REQ_IO_STAT (1 << __REQ_IO_STAT)
-#define REQ_MIXED_MERGE (1 << __REQ_MIXED_MERGE)
-
-/*
* upper 16 bits of bi_rw define the io priority of this bio
*/
#define BIO_PRIO_SHIFT (8 * sizeof(unsigned long) - IOPRIO_BITS)
Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -8,6 +8,7 @@
#include <linux/limits.h>
#include <linux/ioctl.h>
+#include <linux/blk_types.h>
/*
* It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -117,7 +118,7 @@ struct inodes_stat_t {
* immediately wait on this read without caring about
* unplugging.
* READA Used for read-ahead operations. Lower priority, and the
- * block layer could (in theory) choose to ignore this
+ * block layer could (in theory) choose to ignore this
* request if it runs into resource problems.
* WRITE A normal async write. Device will be plugged.
* SWRITE Like WRITE, but a special case for ll_rw_block() that
@@ -144,13 +145,13 @@ struct inodes_stat_t {
* of this IO.
*
*/
-#define RW_MASK 1
-#define RWA_MASK 16
+#define RW_MASK REQ_WRITE
+#define RWA_MASK REQ_RAHEAD
#define READ 0
-#define WRITE 1
-#define READA 16 /* readahead - don't block if no resources */
-#define SWRITE 17 /* for ll_rw_block(), wait for buffer lock */
+#define WRITE RW_MASK
+#define READA RWA_MASK
+#define SWRITE (WRITE | READA)
#define READ_SYNC (READ | REQ_SYNC | REQ_UNPLUG)
#define READ_META (READ | REQ_META)
@@ -2200,7 +2201,6 @@ static inline void insert_inode_hash(str
extern void file_move(struct file *f, struct list_head *list);
extern void file_kill(struct file *f);
#ifdef CONFIG_BLOCK
-struct bio;
extern void submit_bio(int, struct bio *);
extern int bdev_read_only(struct block_device *);
#endif
@@ -2267,7 +2267,6 @@ static inline int xip_truncate_page(stru
#endif
#ifdef CONFIG_BLOCK
-struct bio;
typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
loff_t file_offset);
void dio_end_io(struct bio *bio, int error);
Index: work/include/linux/blk_types.h
===================================================================
--- /dev/null
+++ work/include/linux/blk_types.h
@@ -0,0 +1,193 @@
+/*
+ * Block data types and constants. Directly include this file only to
+ * break include dependency loop.
+ */
+#ifndef __LINUX_BLK_TYPES_H
+#define __LINUX_BLK_TYPES_H
+
+#ifdef CONFIG_BLOCK
+
+#include <linux/types.h>
+
+struct bio_set;
+struct bio;
+struct bio_integrity_payload;
+struct page;
+struct block_device;
+typedef void (bio_end_io_t) (struct bio *, int);
+typedef void (bio_destructor_t) (struct bio *);
+
+/*
+ * was unsigned short, but we might as well be ready for > 64kB I/O pages
+ */
+struct bio_vec {
+ struct page *bv_page;
+ unsigned int bv_len;
+ unsigned int bv_offset;
+};
+
+/*
+ * main unit of I/O for the block layer and lower layers (ie drivers and
+ * stacking drivers)
+ */
+struct bio {
+ sector_t bi_sector; /* device address in 512 byte
+ sectors */
+ struct bio *bi_next; /* request queue link */
+ struct block_device *bi_bdev;
+ unsigned long bi_flags; /* status, command, etc */
+ unsigned long bi_rw; /* bottom bits READ/WRITE,
+ * top bits priority
+ */
+
+ unsigned short bi_vcnt; /* how many bio_vec's */
+ unsigned short bi_idx; /* current index into bvl_vec */
+
+ /* Number of segments in this BIO after
+ * physical address coalescing is performed.
+ */
+ unsigned int bi_phys_segments;
+
+ unsigned int bi_size; /* residual I/O count */
+
+ /*
+ * To keep track of the max segment size, we account for the
+ * sizes of the first and last mergeable segments in this bio.
+ */
+ unsigned int bi_seg_front_size;
+ unsigned int bi_seg_back_size;
+
+ unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
+
+ unsigned int bi_comp_cpu; /* completion CPU */
+
+ atomic_t bi_cnt; /* pin count */
+
+ struct bio_vec *bi_io_vec; /* the actual vec list */
+
+ bio_end_io_t *bi_end_io;
+
+ void *bi_private;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+ struct bio_integrity_payload *bi_integrity; /* data integrity */
+#endif
+
+ bio_destructor_t *bi_destructor; /* destructor */
+
+ /*
+ * We can inline a number of vecs at the end of the bio, to avoid
+ * double allocations for a small number of bio_vecs. This member
+ * MUST obviously be kept at the very end of the bio.
+ */
+ struct bio_vec bi_inline_vecs[0];
+};
+
+/*
+ * bio flags
+ */
+#define BIO_UPTODATE 0 /* ok after I/O completion */
+#define BIO_RW_BLOCK 1 /* RW_AHEAD set, and read/write would block */
+#define BIO_EOF 2 /* out-out-bounds error */
+#define BIO_SEG_VALID 3 /* bi_phys_segments valid */
+#define BIO_CLONED 4 /* doesn't own data */
+#define BIO_BOUNCED 5 /* bio is a bounce bio */
+#define BIO_USER_MAPPED 6 /* contains user pages */
+#define BIO_EOPNOTSUPP 7 /* not supported */
+#define BIO_CPU_AFFINE 8 /* complete bio on same CPU as submitted */
+#define BIO_NULL_MAPPED 9 /* contains invalid user pages */
+#define BIO_FS_INTEGRITY 10 /* fs owns integrity data, not block layer */
+#define BIO_QUIET 11 /* Make BIO Quiet */
+#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
+
+/*
+ * top 4 bits of bio flags indicate the pool this bio came from
+ */
+#define BIO_POOL_BITS (4)
+#define BIO_POOL_NONE ((1UL << BIO_POOL_BITS) - 1)
+#define BIO_POOL_OFFSET (BITS_PER_LONG - BIO_POOL_BITS)
+#define BIO_POOL_MASK (1UL << BIO_POOL_OFFSET)
+#define BIO_POOL_IDX(bio) ((bio)->bi_flags >> BIO_POOL_OFFSET)
+
+/*
+ * Request flags. For use in the cmd_flags field of struct request, and in
+ * bi_rw of struct bio. Note that some flags are only valid in either one.
+ */
+enum rq_flag_bits {
+ /* common flags */
+ __REQ_WRITE, /* not set, read. set, write */
+ __REQ_FAILFAST_DEV, /* no driver retries of device errors */
+ __REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
+ __REQ_FAILFAST_DRIVER, /* no driver retries of driver errors */
+
+ __REQ_HARDBARRIER, /* may not be passed by drive either */
+ __REQ_SYNC, /* request is sync (sync write or read) */
+ __REQ_META, /* metadata io request */
+ __REQ_DISCARD, /* request to discard sectors */
+ __REQ_NOIDLE, /* don't anticipate more IO after this one */
+
+ /* bio only flags */
+ __REQ_UNPLUG, /* unplug the immediately after submission */
+ __REQ_RAHEAD, /* read ahead, can fail anytime */
+
+ /* request only flags */
+ __REQ_SORTED, /* elevator knows about this request */
+ __REQ_SOFTBARRIER, /* may not be passed by ioscheduler */
+ __REQ_FUA, /* forced unit access */
+ __REQ_NOMERGE, /* don't touch this for merging */
+ __REQ_STARTED, /* drive already may have started this one */
+ __REQ_DONTPREP, /* don't call prep for this one */
+ __REQ_QUEUED, /* uses queueing */
+ __REQ_ELVPRIV, /* elevator private data attached */
+ __REQ_FAILED, /* set if the request failed */
+ __REQ_QUIET, /* don't worry about errors */
+ __REQ_PREEMPT, /* set for "ide_preempt" requests */
+ __REQ_ORDERED_COLOR, /* is before or after barrier */
+ __REQ_ALLOCED, /* request came from our alloc pool */
+ __REQ_COPY_USER, /* contains copies of user pages */
+ __REQ_INTEGRITY, /* integrity metadata has been remapped */
+ __REQ_FLUSH, /* request for cache flush */
+ __REQ_IO_STAT, /* account I/O stat */
+ __REQ_MIXED_MERGE, /* merge of different types, fail separately */
+ __REQ_NR_BITS, /* stops here */
+};
+
+#define REQ_WRITE (1 << __REQ_WRITE)
+#define REQ_FAILFAST_DEV (1 << __REQ_FAILFAST_DEV)
+#define REQ_FAILFAST_TRANSPORT (1 << __REQ_FAILFAST_TRANSPORT)
+#define REQ_FAILFAST_DRIVER (1 << __REQ_FAILFAST_DRIVER)
+#define REQ_HARDBARRIER (1 << __REQ_HARDBARRIER)
+#define REQ_SYNC (1 << __REQ_SYNC)
+#define REQ_META (1 << __REQ_META)
+#define REQ_DISCARD (1 << __REQ_DISCARD)
+#define REQ_NOIDLE (1 << __REQ_NOIDLE)
+
+#define REQ_FAILFAST_MASK \
+ (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
+#define REQ_COMMON_MASK \
+ (REQ_WRITE | REQ_FAILFAST_MASK | REQ_HARDBARRIER | REQ_SYNC | \
+ REQ_META| REQ_DISCARD | REQ_NOIDLE)
+
+#define REQ_UNPLUG (1 << __REQ_UNPLUG)
+#define REQ_RAHEAD (1 << __REQ_RAHEAD)
+
+#define REQ_SORTED (1 << __REQ_SORTED)
+#define REQ_SOFTBARRIER (1 << __REQ_SOFTBARRIER)
+#define REQ_FUA (1 << __REQ_FUA)
+#define REQ_NOMERGE (1 << __REQ_NOMERGE)
+#define REQ_STARTED (1 << __REQ_STARTED)
+#define REQ_DONTPREP (1 << __REQ_DONTPREP)
+#define REQ_QUEUED (1 << __REQ_QUEUED)
+#define REQ_ELVPRIV (1 << __REQ_ELVPRIV)
+#define REQ_FAILED (1 << __REQ_FAILED)
+#define REQ_QUIET (1 << __REQ_QUIET)
+#define REQ_PREEMPT (1 << __REQ_PREEMPT)
+#define REQ_ORDERED_COLOR (1 << __REQ_ORDERED_COLOR)
+#define REQ_ALLOCED (1 << __REQ_ALLOCED)
+#define REQ_COPY_USER (1 << __REQ_COPY_USER)
+#define REQ_INTEGRITY (1 << __REQ_INTEGRITY)
+#define REQ_FLUSH (1 << __REQ_FLUSH)
+#define REQ_IO_STAT (1 << __REQ_IO_STAT)
+#define REQ_MIXED_MERGE (1 << __REQ_MIXED_MERGE)
+
+#endif /* CONFIG_BLOCK */
+#endif /* __LINUX_BLK_TYPES_H */
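
The idea behind this patch, deriving the fs.h constants from the flag enum instead of repeating literals, can be sketched in plain C. This is only an illustration under stated assumptions: every SK_* name below is hypothetical, the enum is a shortened stand-in for rq_flag_bits, and static_assert requires a C11 compiler.

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for the flag-bit enum; only order matters. */
enum sk_flag_bits {
	SK_BIT_WRITE,
	SK_BIT_FAILFAST_DEV,
	SK_BIT_FAILFAST_TRANSPORT,
	SK_BIT_FAILFAST_DRIVER,
	SK_BIT_RAHEAD,
};

/* Masks are always computed from the enum, never written as numbers. */
#define SK_REQ_WRITE	(1 << SK_BIT_WRITE)
#define SK_REQ_RAHEAD	(1 << SK_BIT_RAHEAD)

/* fs.h-style constants expressed through the masks. */
#define SK_RW_MASK	SK_REQ_WRITE
#define SK_RWA_MASK	SK_REQ_RAHEAD
#define SK_READ		0
#define SK_WRITE	SK_RW_MASK
#define SK_READA	SK_RWA_MASK
#define SK_SWRITE	(SK_WRITE | SK_READA)

/* Compile-time guard for the one assumption callers still make. */
static_assert(SK_WRITE == 1, "WRITE is expected to stay at bit 0");

/* Read-ahead test that keeps working if SK_BIT_RAHEAD moves in the enum. */
static int is_readahead(unsigned long rw)
{
	return (rw & SK_RWA_MASK) != 0;
}
```

If an entry is later inserted before SK_BIT_RAHEAD, SK_READA and SK_SWRITE follow automatically, which is the property the hard-coded values lacked.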
|
|
From: Tejun H. <tj...@ke...> - 2010-08-03 09:53:28
|
Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits. READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.
This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again. A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.
Neil also spotted missing RWA_MASK conversion.
Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.
Signed-off-by: Tejun Heo <tj...@ke...>
Reported-and-bisected-by: Vladislav Bolkhovitin <vs...@vl...>
Root-caused-by: Neil Brown <ne...@su...>
Cc: Jens Axobe <ax...@ke...>
Cc: st...@ke...
---
Here's the regenerated version also w/ the missing RWA_MASK conversion
Neil spotted.
Thanks.
include/linux/fs.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -145,12 +145,12 @@ struct inodes_stat_t {
*
*/
#define RW_MASK 1
-#define RWA_MASK 2
+#define RWA_MASK 16
#define READ 0
#define WRITE 1
-#define READA 2 /* readahead - don't block if no resources */
-#define SWRITE 3 /* for ll_rw_block() - wait for buffer lock */
+#define READA 16 /* readahead - don't block if no resources */
+#define SWRITE 17 /* for ll_rw_block(), wait for buffer lock */
#define READ_SYNC (READ | REQ_SYNC | REQ_UNPLUG)
#define READ_META (READ | REQ_META)
|
|
From: Jens A. <ax...@ke...> - 2010-08-03 11:15:40
|
On 2010-08-03 11:53, Tejun Heo wrote:
> Commit a82afdf (block: use the same failfast bits for bio and request)
> moved BIO_RW_* bits around such that they match up with REQ_* bits.
> Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
> and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
> bits. READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
> instead of bit 1, breaking RWA_MASK, READA and SWRITE.
>
> This patch updates RWA_MASK, READA and SWRITE such that they match the
> BIO_RW_* bits again. A follow up patch will update the definitions to
> directly use BIO_RW_* bits so that this kind of breakage won't happen
> again.
>
> Neil also spotted missing RWA_MASK conversion.
>
> Stable: The offending commit a82afdf was released with v2.6.32, so
> this patch should be applied to all kernels since then but it must
> _NOT_ be applied to kernels earlier than that.
>
> Signed-off-by: Tejun Heo <tj...@ke...>
> Reported-and-bisected-by: Vladislav Bolkhovitin <vs...@vl...>
> Root-caused-by: Neil Brown <ne...@su...>
> Cc: Jens Axobe <ax...@ke...>
^^^^^
(Too) common typo :-)
Anyway, applied to for-2.6.36, thanks a lot.
--
Jens Axboe
|
|
From: Jens A. <ja...@fu...> - 2010-08-03 11:36:49
|
On 2010-08-03 13:15, Jens Axboe wrote:
> On 2010-08-03 11:53, Tejun Heo wrote:
>> Commit a82afdf (block: use the same failfast bits for bio and request)
>> moved BIO_RW_* bits around such that they match up with REQ_* bits.
>> Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
>> and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
>> bits. READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
>> instead of bit 1, breaking RWA_MASK, READA and SWRITE.
>>
>> This patch updates RWA_MASK, READA and SWRITE such that they match the
>> BIO_RW_* bits again. A follow up patch will update the definitions to
>> directly use BIO_RW_* bits so that this kind of breakage won't happen
>> again.
>>
>> Neil also spotted missing RWA_MASK conversion.
>>
>> Stable: The offending commit a82afdf was released with v2.6.32, so
>> this patch should be applied to all kernels since then but it must
>> _NOT_ be applied to kernels earlier than that.
>>
>> Signed-off-by: Tejun Heo <tj...@ke...>
>> Reported-and-bisected-by: Vladislav Bolkhovitin <vs...@vl...>
>> Root-caused-by: Neil Brown <ne...@su...>
>> Cc: Jens Axobe <ax...@ke...>
> ^^^^^
>
> (Too) common typo :-)
>
> Anyway, applied to for-2.6.36, thanks a lot.
Irk, we have an issue:
In file included from fs/coda/psdev.c:48:
include/linux/coda_psdev.h:91:1: warning: "REQ_WRITE" redefined
In file included from include/linux/fs.h:11,
from include/linux/proc_fs.h:5,
from fs/coda/psdev.c:31:
include/linux/blk_types.h:154:1: warning: this is the location of the previous definition
And from include/linux/coda_psdev.h:
#define REQ_ASYNC 0x1
#define REQ_READ 0x2
#define REQ_WRITE 0x4
#define REQ_ABORT 0x8
which unfortunately seem to not be under __KERNEL__ protection, but
there are things like wait_queue_head_t structs there as well so should
be safe to change.
--
Jens Axboe
|
|
From: Tejun H. <tj...@ke...> - 2010-08-03 15:52:22
|
(cc'ing CODA people)
Hello,
On 08/03/2010 01:21 PM, Jens Axboe wrote:
>>> Signed-off-by: Tejun Heo <tj...@ke...>
>>> Reported-and-bisected-by: Vladislav Bolkhovitin <vs...@vl...>
>>> Root-caused-by: Neil Brown <ne...@su...>
>>> Cc: Jens Axobe <ax...@ke...>
>> ^^^^^
>>
>> (Too) common typo :-)
>>
>> Anyway, applied to for-2.6.36, thanks a lot.
Oops, I'm sorry. Heh, it reminds me of misspelling Linus's last name
as Tolvards in a number of patches. :-)
> Irk, we have an issue:
>
> In file included from fs/coda/psdev.c:48:
> include/linux/coda_psdev.h:91:1: warning: "REQ_WRITE" redefined
> In file included from include/linux/fs.h:11,
> from include/linux/proc_fs.h:5,
> from fs/coda/psdev.c:31:
> include/linux/blk_types.h:154:1: warning: this is the location of the previous definition
>
> And from include/linux/coda_psdev.h:
>
> #define REQ_ASYNC 0x1
> #define REQ_READ 0x2
> #define REQ_WRITE 0x4
> #define REQ_ABORT 0x8
>
> which unfortunately seem to not be under __KERNEL__ protection, but
> there are things like wait_queue_head_t structs there as well so should
> be safe to change.
Yeah, I hate it when symbols in non-core code don't have a proper
prefix. Prefixing CODA_ in front of those macros should do it. Jan
Harkes, would that work for CODA?
Thanks.
--
tejun
|
|
From: Tejun H. <tj...@ke...> - 2010-08-03 16:02:47
|
REQ_* constants are used for block layer requests causing inconsistent
duplicate definitions of REQ_WRITE. Rename REQ_* used by coda to
CODA_REQ_*.
Signed-off-by: Tejun Heo <tj...@ke...>
---
So, something like this. Build tested only.
Thanks.
fs/coda/psdev.c | 14 +++++++-------
fs/coda/upcall.c | 12 ++++++------
include/linux/coda_psdev.h | 8 ++++----
3 files changed, 17 insertions(+), 17 deletions(-)
Index: work/fs/coda/psdev.c
===================================================================
--- work.orig/fs/coda/psdev.c
+++ work/fs/coda/psdev.c
@@ -177,7 +177,7 @@ static ssize_t coda_psdev_write(struct f
nbytes = req->uc_outSize; /* don't have more space! */
}
if (copy_from_user(req->uc_data, buf, nbytes)) {
- req->uc_flags |= REQ_ABORT;
+ req->uc_flags |= CODA_REQ_ABORT;
wake_up(&req->uc_sleep);
retval = -EFAULT;
goto out;
@@ -185,7 +185,7 @@ static ssize_t coda_psdev_write(struct f
/* adjust outsize. is this useful ?? */
req->uc_outSize = nbytes;
- req->uc_flags |= REQ_WRITE;
+ req->uc_flags |= CODA_REQ_WRITE;
count = nbytes;
/* Convert filedescriptor into a file handle */
@@ -254,8 +254,8 @@ static ssize_t coda_psdev_read(struct fi
retval = -EFAULT;
/* If request was not a signal, enqueue and don't free */
- if (!(req->uc_flags & REQ_ASYNC)) {
- req->uc_flags |= REQ_READ;
+ if (!(req->uc_flags & CODA_REQ_ASYNC)) {
+ req->uc_flags |= CODA_REQ_READ;
list_add_tail(&(req->uc_chain), &vcp->vc_processing);
goto out;
}
@@ -315,19 +315,19 @@ static int coda_psdev_release(struct ino
list_del(&req->uc_chain);
/* Async requests need to be freed here */
- if (req->uc_flags & REQ_ASYNC) {
+ if (req->uc_flags & CODA_REQ_ASYNC) {
CODA_FREE(req->uc_data, sizeof(struct coda_in_hdr));
kfree(req);
continue;
}
- req->uc_flags |= REQ_ABORT;
+ req->uc_flags |= CODA_REQ_ABORT;
wake_up(&req->uc_sleep);
}
list_for_each_entry_safe(req, tmp, &vcp->vc_processing, uc_chain) {
list_del(&req->uc_chain);
- req->uc_flags |= REQ_ABORT;
+ req->uc_flags |= CODA_REQ_ABORT;
wake_up(&req->uc_sleep);
}
Index: work/fs/coda/upcall.c
===================================================================
--- work.orig/fs/coda/upcall.c
+++ work/fs/coda/upcall.c
@@ -604,7 +604,7 @@ static void coda_unblock_signals(sigset_
(((r)->uc_opcode != CODA_CLOSE && \
(r)->uc_opcode != CODA_STORE && \
(r)->uc_opcode != CODA_RELEASE) || \
- (r)->uc_flags & REQ_READ))
+ (r)->uc_flags & CODA_REQ_READ))
static inline void coda_waitfor_upcall(struct upc_req *req)
{
@@ -624,7 +624,7 @@ static inline void coda_waitfor_upcall(s
set_current_state(TASK_UNINTERRUPTIBLE);
/* got a reply */
- if (req->uc_flags & (REQ_WRITE | REQ_ABORT))
+ if (req->uc_flags & (CODA_REQ_WRITE | CODA_REQ_ABORT))
break;
if (blocked && time_after(jiffies, timeout) &&
@@ -708,7 +708,7 @@ static int coda_upcall(struct venus_comm
coda_waitfor_upcall(req);
/* Op went through, interrupt or not... */
- if (req->uc_flags & REQ_WRITE) {
+ if (req->uc_flags & CODA_REQ_WRITE) {
out = (union outputArgs *)req->uc_data;
/* here we map positive Venus errors to kernel errors */
error = -out->oh.result;
@@ -717,13 +717,13 @@ static int coda_upcall(struct venus_comm
}
error = -EINTR;
- if ((req->uc_flags & REQ_ABORT) || !signal_pending(current)) {
+ if ((req->uc_flags & CODA_REQ_ABORT) || !signal_pending(current)) {
printk(KERN_WARNING "coda: Unexpected interruption.\n");
goto exit;
}
/* Interrupted before venus read it. */
- if (!(req->uc_flags & REQ_READ))
+ if (!(req->uc_flags & CODA_REQ_READ))
goto exit;
/* Venus saw the upcall, make sure we can send interrupt signal */
@@ -747,7 +747,7 @@ static int coda_upcall(struct venus_comm
sig_inputArgs->ih.opcode = CODA_SIGNAL;
sig_inputArgs->ih.unique = req->uc_unique;
- sig_req->uc_flags = REQ_ASYNC;
+ sig_req->uc_flags = CODA_REQ_ASYNC;
sig_req->uc_opcode = sig_inputArgs->ih.opcode;
sig_req->uc_unique = sig_inputArgs->ih.unique;
sig_req->uc_inSize = sizeof(struct coda_in_hdr);
Index: work/include/linux/coda_psdev.h
===================================================================
--- work.orig/include/linux/coda_psdev.h
+++ work/include/linux/coda_psdev.h
@@ -86,9 +86,9 @@ struct upc_req {
wait_queue_head_t uc_sleep; /* process' wait queue */
};
-#define REQ_ASYNC 0x1
-#define REQ_READ 0x2
-#define REQ_WRITE 0x4
-#define REQ_ABORT 0x8
+#define CODA_REQ_ASYNC 0x1
+#define CODA_REQ_READ 0x2
+#define CODA_REQ_WRITE 0x4
+#define CODA_REQ_ABORT 0x8
#endif
|
|
From: Jan H. <jah...@cs...> - 2010-08-03 16:50:00
|
On Tue, Aug 03, 2010 at 06:02:53PM +0200, Tejun Heo wrote:
> REQ_* constants are used for block layer requests causing inconsistent
> duplicate definitions of REQ_WRITE. Rename REQ_* used by coda to
> CODA_REQ_*.
>
> Signed-off-by: Tejun Heo <tj...@ke...>

Looks good to me.

Acked-by: Jan Harkes <jah...@cs...>
|
|
From: Jens A. <ja...@fu...> - 2010-08-03 17:32:01
|
On 2010-08-03 18:02, Tejun Heo wrote:
> REQ_* constants are used for block layer requests causing inconsistent
> duplicate definitions of REQ_WRITE. Rename REQ_* used by coda to
> CODA_REQ_*.

Should have been clear, I committed the fixup already:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=ceb1fde0d5c7611fdb9004176ac34140d27e745a

Looks byte-for-byte identical :-)

--
Jens Axboe
|
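For readers following the rename, here is a minimal userspace sketch of the flag tests from the patch, written against the prefixed names. The four `CODA_REQ_*` values are taken from the patched `include/linux/coda_psdev.h`; the helper names `coda_req_got_reply()` and `coda_req_is_async()` are hypothetical and exist only for this illustration. Because every name carries the `CODA_` prefix, nothing here depends on, or clashes with, whatever `REQ_WRITE` the block layer defines.

```c
/* Userspace sketch only; not kernel code. */
#include <stdbool.h>

/* Values as in the patched include/linux/coda_psdev.h. */
#define CODA_REQ_ASYNC  0x1
#define CODA_REQ_READ   0x2
#define CODA_REQ_WRITE  0x4
#define CODA_REQ_ABORT  0x8

/*
 * Hypothetical helper: mirrors the "got a reply" test in
 * coda_waitfor_upcall(), which stops waiting once the request
 * has been answered (WRITE) or aborted (ABORT).
 */
static inline bool coda_req_got_reply(unsigned int uc_flags)
{
	return (uc_flags & (CODA_REQ_WRITE | CODA_REQ_ABORT)) != 0;
}

/*
 * Hypothetical helper: mirrors the test in coda_psdev_release(),
 * where async requests are freed instead of being aborted.
 */
static inline bool coda_req_is_async(unsigned int uc_flags)
{
	return (uc_flags & CODA_REQ_ASYNC) != 0;
}
```

A request flagged only `CODA_REQ_READ` has been picked up by Venus but not yet answered, so `coda_req_got_reply()` returns false for it.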
|
From: Jeff M. <jm...@re...> - 2010-08-05 18:46:16
|
Tejun Heo <tj...@ke...> writes:

> Commit a82afdf (block: use the same failfast bits for bio and request)
> moved BIO_RW_* bits around such that they match up with REQ_* bits.
> Unfortunately, fs.h hard coded READ, WRITE, READA and SWRITE as 0, 1,
> 2 and 3, and expected them to match with BIO_RW_* bits. READ/WRITE
> didn't change but BIO_RW_AHEAD was moved to bit 4 instead of bit 1,
> breaking READA and SWRITE.
>
> This patch updates READA and SWRITE such that they match the BIO_RW_*
> bits again. A follow up patch will update the definitions to directly
> use BIO_RW_* bits so that this kind of breakage won't happen again.
>
> Stable: The offending commit a82afdf was released with v2.6.32, so
> this patch should be applied to all kernels since then but it must
> _NOT_ be applied to kernels earlier than that.

Would someone be so kind as to remind me how this problem manifests
itself? I know I read this recently, but my memory and googling skills
are both failing me. :(

Cheers,
Jeff
|
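As a rough illustration of the mismatch described in the quoted commit message, the sketch below compares the hard-coded `READA` value with the read-ahead bit before and after the move. The bit positions (1 and 4) and the hard-coded values (0 to 3) come from the quoted text; all macro and function names are hypothetical, and the real kernel headers are not reproduced here.

```c
/* Userspace sketch only; names are hypothetical, not kernel identifiers. */
#include <stdbool.h>

/* Hard-coded values from fs.h, as listed in the quoted message. */
#define SKETCH_READ    0
#define SKETCH_WRITE   1
#define SKETCH_READA   2
#define SKETCH_SWRITE  3

/* Position of the read-ahead bit before and after commit a82afdf. */
#define SKETCH_AHEAD_BIT_OLD  1
#define SKETCH_AHEAD_BIT_NEW  4

/* True if the rw value carries the read-ahead flag at the given bit. */
static inline bool sketch_is_readahead(unsigned long rw, unsigned int ahead_bit)
{
	return (rw & (1UL << ahead_bit)) != 0;
}

/* Value READA would need so that it sets the given read-ahead bit. */
static inline unsigned long sketch_reada_for(unsigned int ahead_bit)
{
	return SKETCH_READ | (1UL << ahead_bit);
}
```

With the old layout, `sketch_is_readahead(SKETCH_READA, SKETCH_AHEAD_BIT_OLD)` is true. With the new layout the same hard-coded value no longer sets the read-ahead bit, so a check of that bit would not recognise a `READA` request. This is only one plausible reading of the breakage, not a confirmed description of the symptoms asked about above.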