From: Richard W. <ri...@no...> - 2010-09-28 21:47:54
|
Under high load the file system gets corrupted. This patch fixes the issue. Many thanks to Janjaap Bos <ja...@bo...>! LKML-Reference: <AANLkTi=PTp7YW_eYxtF-H2QSxgei3whWH59wU0C9oCkz () mail ! gmail ! com> Signed-off-by: Richard Weinberger <ri...@no...> --- arch/um/drivers/ubd_kern.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 1bcd208..2874b83 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -748,9 +748,12 @@ static int ubd_open_dev(struct ubd *ubd_dev) } ubd_dev->fd = fd; - if(ubd_dev->cow.file != NULL){ - blk_queue_max_hw_sectors(ubd_dev->queue, 8 * sizeof(long)); + /* A setting higher than 1 sector currently (>= v2.6.31) generates + data loss, both for raw and cow ubd. */ + blk_queue_max_hw_sectors(ubd_dev->queue, 1 * sizeof(long)); + blk_queue_max_segments(ubd_dev->queue, 1 * sizeof(long)); + if (ubd_dev->cow.file != NULL) { err = -ENOMEM; ubd_dev->cow.bitmap = vmalloc(ubd_dev->cow.bitmap_len); if(ubd_dev->cow.bitmap == NULL){ -- 1.6.6.1 |
From: Andrew M. <ak...@li...> - 2010-09-28 22:00:46
|
On Tue, 28 Sep 2010 23:47:36 +0200 Richard Weinberger <ri...@no...> wrote: > Under high load the file system gets corrupted. > This patch fixes the issue. > > Many thanks to Janjaap Bos <ja...@bo...>! > > LKML-Reference: <AANLkTi=PTp7YW_eYxtF-H2QSxgei3whWH59wU0C9oCkz () mail ! gmail ! com> > Signed-off-by: Richard Weinberger <ri...@no...> > --- > arch/um/drivers/ubd_kern.c | 7 +++++-- > 1 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c > index 1bcd208..2874b83 100644 > --- a/arch/um/drivers/ubd_kern.c > +++ b/arch/um/drivers/ubd_kern.c > @@ -748,9 +748,12 @@ static int ubd_open_dev(struct ubd *ubd_dev) > } > ubd_dev->fd = fd; > > - if(ubd_dev->cow.file != NULL){ > - blk_queue_max_hw_sectors(ubd_dev->queue, 8 * sizeof(long)); > + /* A setting higher than 1 sector currently (>= v2.6.31) generates > + data loss, both for raw and cow ubd. */ > + blk_queue_max_hw_sectors(ubd_dev->queue, 1 * sizeof(long)); > + blk_queue_max_segments(ubd_dev->queue, 1 * sizeof(long)); > > + if (ubd_dev->cow.file != NULL) { > err = -ENOMEM; > ubd_dev->cow.bitmap = vmalloc(ubd_dev->cow.bitmap_len); > if(ubd_dev->cow.bitmap == NULL){ This is a workaround, I think? Do we know what the actual bug is? >From the comment it appears to be a regression? |
From: Tejun H. <ht...@gm...> - 2010-10-05 08:21:24
|
On 10/04/2010 09:51 PM, Chris Frey wrote: > On Mon, Oct 04, 2010 at 06:37:36PM +0200, Tejun Heo wrote: >> Hello, sorry about chiming in later. I was off last week. > > No problem, I'm eager to test patches to fix this. > >> I think we're on the right track. The problem with Jens' patch was >> that it didn't consider the fact that blk_end_request() now internally >> updates the current position of the request, so if restart happens >> after some of part of the request is complete it will end up adding >> the offsets multiple times. I think slightly modifying it to track >> the current position instead of offset should do it. Chris, can you >> please try the following patch and see whether the problem goes away? > > Unfortunately, this patch does not fix it for me. I applied your patch > to kernel 2.6.35.5, and got the usual error: > > EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 566188 > > Lots of them, actually. Hmmmm, can you please give a shot at the following one? Thank you. diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 1bcd208..0435d21 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -163,6 +163,7 @@ struct ubd { struct scatterlist sg[MAX_SG]; struct request *request; int start_sg, end_sg; + sector_t rq_pos; }; #define DEFAULT_COW { \ @@ -187,6 +188,7 @@ struct ubd { .request = NULL, \ .start_sg = 0, \ .end_sg = 0, \ + .rq_pos = 0, \ } /* Protected by ubd_lock */ @@ -1228,7 +1230,6 @@ static void do_ubd_request(struct request_queue *q) { struct io_thread_req *io_req; struct request *req; - sector_t sector; int n; while(1){ @@ -1239,12 +1240,12 @@ static void do_ubd_request(struct request_queue *q) return; dev->request = req; + dev->rq_pos = blk_rq_pos(req); dev->start_sg = 0; dev->end_sg = blk_rq_map_sg(q, req, dev->sg); } req = dev->request; - sector = blk_rq_pos(req); while(dev->start_sg < dev->end_sg){ struct scatterlist *sg = &dev->sg[dev->start_sg]; @@ -1256,10 +1257,10 @@ static void do_ubd_request(struct request_queue *q) return; } prepare_request(req, io_req, - (unsigned long long)sector << 9, + (unsigned long long)dev->rq_pos << 9, sg->offset, sg->length, sg_page(sg)); - sector += sg->length >> 9; + dev->rq_pos += sg->length >> 9; n = os_write_file(thread_fd, &io_req, sizeof(struct io_thread_req *)); if(n != sizeof(struct io_thread_req *)){ |
From: Chris F. <cd...@fo...> - 2010-10-05 20:32:41
|
On Tue, Oct 05, 2010 at 10:23:19AM +0200, Tejun Heo wrote: > Hmmmm, can you please give a shot at the following one? Thank you. I applied this patch on top of stock 2.6.35.5 as usual (no other patches) and tested on my maverick image as before. I ran a fsck.ext3 on the filesystem image from the host before my test, just to make sure, and there were no errors. Unfortunately, this patch does not fix the issue either. I get errors in the guest like the following: EXT3-fs error (device ubda): htree_dirblock_to_tree: bad entry in directory #1137497: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0 EXT3-fs (ubda): warning: empty_dir: bad directory (dir #1137497) - no `.' or `..' EXT3-fs error (device ubda): htree_dirblock_to_tree: bad entry in directory #1137587: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0 EXT3-fs (ubda): warning: empty_dir: bad directory (dir #1137587) - no `.' or `..' EXT3-fs error (device ubda): htree_dirblock_to_tree: bad entry in directory #1137619: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0 EXT3-fs (ubda): warning: empty_dir: bad directory (dir #1137619) - no `.' or `..' EXT3-fs error (device ubda): htree_dirblock_to_tree: bad entry in directory #1137532: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0 EXT3-fs (ubda): warning: empty_dir: bad directory (dir #1137532) - no `.' or `..' EXT3-fs (ubda): warning: ext3_rmdir: empty directory has nlink!=2 (6) EXT3-fs (ubda): warning: ext3_rmdir: empty directory has nlink!=2 (3) And later on: EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 867196 EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 484932 EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 484932 - Chris |
From: Richard W. <ri...@no...> - 2010-09-28 22:13:21
|
Am Mittwoch 29 September 2010, 00:00:00 schrieb Andrew Morton: > This is a workaround, I think? Do we know what the actual bug is? > From the comment it appears to be a regression? Yes, it is a workaround. For more details please have a look at this post: http://lkml.org/lkml/2010/9/28/245 Cheers, //richard |
From: Chris F. <cd...@fo...> - 2010-09-29 03:32:17
|
On Wed, Sep 29, 2010 at 12:13:10AM +0200, Richard Weinberger wrote: > Am Mittwoch 29 September 2010, 00:00:00 schrieb Andrew Morton: > > This is a workaround, I think? Do we know what the actual bug is? > > From the comment it appears to be a regression? > > Yes, it is a workaround. > For more details please have a look at this post: > http://lkml.org/lkml/2010/9/28/245 I am so far unable to reproduce the corruption with this workaround patch. So it looks like we have at least one fix. Thank you very much to everyone who worked on this. - Chris |
From: Chris F. <cd...@fo...> - 2010-09-28 22:53:41
|
On Wed, Sep 29, 2010 at 12:13:10AM +0200, Richard Weinberger wrote: > Am Mittwoch 29 September 2010, 00:00:00 schrieb Andrew Morton: > > This is a workaround, I think? Do we know what the actual bug is? > > From the comment it appears to be a regression? > > Yes, it is a workaround. > For more details please have a look at this post: > http://lkml.org/lkml/2010/9/28/245 I'm going to run some tests to see how this works on my end, but in reading old threads, I found this response from Bram Matthys that casts some doubt on an older version of this patch. http://marc.info/?l=user-mode-linux-user&m=126883932731656&w=2 I've added him to the CC. - Chris |
From: Jens A. <ja...@fu...> - 2010-10-07 08:17:47
|
On 2010-10-05 22:31, Chris Frey wrote: > On Tue, Oct 05, 2010 at 10:23:19AM +0200, Tejun Heo wrote: >> Hmmmm, can you please give a shot at the following one? Thank you. > > I applied this patch on top of stock 2.6.35.5 as usual (no other patches) > and tested on my maverick image as before. I ran a fsck.ext3 on the > filesystem image from the host before my test, just to make sure, and > there were no errors. > > Unfortunately, this patch does not fix the issue either. I get errors > in the guest like the following: So how about this? Note that I haven't even compiled this. The request handling logic really should be fixed in there, it's horribly inefficient. diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 1bcd208..c272cf6 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -163,6 +163,7 @@ struct ubd { struct scatterlist sg[MAX_SG]; struct request *request; int start_sg, end_sg; + unsigned int rq_offset; }; #define DEFAULT_COW { \ @@ -187,6 +188,7 @@ struct ubd { .request = NULL, \ .start_sg = 0, \ .end_sg = 0, \ + .rq_offset 0, \ } /* Protected by ubd_lock */ @@ -466,6 +468,8 @@ static void ubd_handler(void) int n; while(1){ + struct ubd *dev; + n = os_read_file(thread_fd, &req, sizeof(struct io_thread_req *)); if(n != sizeof(req)){ @@ -476,6 +480,11 @@ static void ubd_handler(void) return; } + /* + * Clear internal offset, block core will update request state + */ + dev = req->req->q->queuedata; + dev->rq_offset = 0; blk_end_request(req->req, 0, req->length); kfree(req); } @@ -1241,10 +1250,11 @@ static void do_ubd_request(struct request_queue *q) dev->request = req; dev->start_sg = 0; dev->end_sg = blk_rq_map_sg(q, req, dev->sg); + dev->rq_offset = 0; } req = dev->request; - sector = blk_rq_pos(req); + sector = blk_rq_pos(req) + dev->rq_offset; while(dev->start_sg < dev->end_sg){ struct scatterlist *sg = &dev->sg[dev->start_sg]; @@ -1272,6 +1282,7 @@ static void do_ubd_request(struct request_queue *q) return; } + dev->rq_offset += sg->length >> 9; dev->start_sg++; } dev->end_sg = 0; -- Jens Axboe |
From: Chris F. <cd...@fo...> - 2010-10-07 20:25:22
|
On Thu, Oct 07, 2010 at 09:58:16AM +0200, Jens Axboe wrote: > So how about this? Note that I haven't even compiled this. The request > handling logic really should be fixed in there, it's horribly > inefficient. Thanks. I fixed the compile error with: > + .rq_offset 0, \ to > + .rq_offset = 0, \ But after testing, I still got the same errors: EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 964816 EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 964862 EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 964786 EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 964815 ... - Chris |
From: richard -r. w. <ric...@gm...> - 2010-10-14 14:20:18
|
On Thu, Oct 14, 2010 at 3:14 PM, Tejun Heo <ht...@gm...> wrote: > Hello, > > Can you please try this one then? It seems to work here but I can't > reproduce the original problem reliably so I'm not really sure. > > Thanks. > > diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c > index 1bcd208..9734994 100644 > --- a/arch/um/drivers/ubd_kern.c > +++ b/arch/um/drivers/ubd_kern.c > @@ -163,6 +163,7 @@ struct ubd { > struct scatterlist sg[MAX_SG]; > struct request *request; > int start_sg, end_sg; > + sector_t rq_pos; > }; > > #define DEFAULT_COW { \ > @@ -187,6 +188,7 @@ struct ubd { > .request = NULL, \ > .start_sg = 0, \ > .end_sg = 0, \ > + .rq_pos = 0, \ > } > > /* Protected by ubd_lock */ > @@ -1228,7 +1230,6 @@ static void do_ubd_request(struct request_queue *q) > { > struct io_thread_req *io_req; > struct request *req; > - sector_t sector; > int n; > > while(1){ > @@ -1239,12 +1240,12 @@ static void do_ubd_request(struct request_queue *q) > return; > > dev->request = req; > + dev->rq_pos = blk_rq_pos(req); > dev->start_sg = 0; > dev->end_sg = blk_rq_map_sg(q, req, dev->sg); > } > > req = dev->request; > - sector = blk_rq_pos(req); > while(dev->start_sg < dev->end_sg){ > struct scatterlist *sg = &dev->sg[dev->start_sg]; > > @@ -1256,10 +1257,9 @@ static void do_ubd_request(struct request_queue *q) > return; > } > prepare_request(req, io_req, > - (unsigned long long)sector << 9, > + (unsigned long long)dev->rq_pos << 9, > sg->offset, sg->length, sg_page(sg)); > > - sector += sg->length >> 9; > n = os_write_file(thread_fd, &io_req, > sizeof(struct io_thread_req *)); > if(n != sizeof(struct io_thread_req *)){ > @@ -1272,6 +1272,7 @@ static void do_ubd_request(struct request_queue *q) > return; > } > > + dev->rq_pos += sg->length >> 9; > dev->start_sg++; > } > dev->end_sg = 0; It does not work for me. But the error is a different one. :-) Without your patch I've never got this kernel trace. [ 59.850000] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 59.850000] 0fe22d68: [<081ca1da>] dump_stack+0x1c/0x20 [ 59.850000] 0fe22d80: [<080a4981>] __alloc_pages_nodemask+0x4a0/0x4cb [ 59.850000] 0fe22e00: [<080bd2d7>] cache_alloc_refill+0x213/0x3fe [ 59.850000] 0fe22e58: [<080bd5db>] kmem_cache_alloc+0x62/0x8f [ 59.850000] 0fe22e74: [<08062a4c>] do_ubd_request+0x91/0x40c [ 59.850000] 0fe22edc: [<08148f01>] __blk_run_queue+0x40/0x6e [ 59.850000] 0fe22ef0: [<08151280>] cfq_kick_queue+0x24/0x39 [ 59.850000] 0fe22f08: [<08083c86>] process_one_work+0x1ea/0x2f0 [ 59.850000] 0fe22f44: [<08084202>] worker_thread+0x174/0x2e4 [ 59.850000] 0fe22f74: [<08087e5b>] kthread+0x78/0x7f [ 59.850000] 0fe22fb4: [<08066117>] run_kernel_thread+0x37/0x3f [ 59.850000] 0fe22fe0: [<08059bd1>] new_thread_handler+0x60/0x87 [ 59.850000] 0fe22ffc: [<00000000>] 0x0 [ 59.850000] [ 59.850000] Mem-Info: [ 59.850000] Normal per-cpu: [ 59.850000] CPU 0: hi: 42, btch: 7 usd: 15 [ 59.850000] active_anon:1146 inactive_anon:1194 isolated_anon:0 [ 59.850000] active_file:4314 inactive_file:19869 isolated_file:0 [ 59.850000] unevictable:0 dirty:1327 writeback:1837 unstable:0 [ 59.850000] free:131 slab_reclaimable:3618 slab_unreclaimable:1141 [ 59.850000] mapped:1105 shmem:10 pagetables:82 bounce:0 [ 59.850000] Normal free:524kB min:1440kB low:1800kB high:2160kB active_anon:4584kB inactive_anon:4776kB active_file:17256kB inactive_file:79476kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130048kB mlocked:0kB dirty:5308kB writeback:7348kB mapped:4420kB shmem:40kB slab_reclaimable:14472kB slab_unreclaimable:4564kB kernel_stack:164kB pagetables:328kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 59.850000] lowmem_reserve[]: 0 0 [ 59.850000] Normal: 1*4kB 3*8kB 25*16kB 3*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524kB [ 59.850000] 24193 total pagecache pages [ 59.850000] 0 pages in swap cache [ 59.850000] Swap cache stats: add 0, delete 0, find 0/0 [ 59.850000] Free swap = 0kB [ 59.850000] Total swap = 0kB [ 59.850000] 32768 pages RAM [ 59.850000] 1132 pages reserved [ 59.850000] 26071 pages shared [ 59.850000] 8225 pages non-shared [ 94.930000] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 94.930000] 0fe22d68: [<081ca1da>] dump_stack+0x1c/0x20 [ 94.930000] 0fe22d80: [<080a4981>] __alloc_pages_nodemask+0x4a0/0x4cb [ 94.930000] 0fe22e00: [<080bd2d7>] cache_alloc_refill+0x213/0x3fe [ 94.930000] 0fe22e58: [<080bd5db>] kmem_cache_alloc+0x62/0x8f [ 94.930000] 0fe22e74: [<08062a4c>] do_ubd_request+0x91/0x40c [ 94.930000] 0fe22edc: [<08148f01>] __blk_run_queue+0x40/0x6e [ 94.930000] 0fe22ef0: [<08151280>] cfq_kick_queue+0x24/0x39 [ 94.930000] 0fe22f08: [<08083c86>] process_one_work+0x1ea/0x2f0 [ 94.930000] 0fe22f44: [<08084202>] worker_thread+0x174/0x2e4 [ 94.930000] 0fe22f74: [<08087e5b>] kthread+0x78/0x7f [ 94.930000] 0fe22fb4: [<08066117>] run_kernel_thread+0x37/0x3f [ 94.930000] 0fe22fe0: [<08059bd1>] new_thread_handler+0x60/0x87 [ 94.930000] 0fe22ffc: [<00000000>] 0x0 [ 94.930000] [ 94.930000] Mem-Info: [ 94.930000] Normal per-cpu: [ 94.930000] CPU 0: hi: 42, btch: 7 usd: 7 [ 94.930000] active_anon:1202 inactive_anon:1274 isolated_anon:0 [ 94.930000] active_file:6904 inactive_file:17913 isolated_file:0 [ 94.930000] unevictable:0 dirty:1341 writeback:2209 unstable:0 [ 94.930000] free:132 slab_reclaimable:2912 slab_unreclaimable:1077 [ 94.930000] mapped:1173 shmem:10 pagetables:86 bounce:0 [ 94.930000] Normal free:528kB min:1440kB low:1800kB high:2160kB active_anon:4808kB inactive_anon:5096kB active_file:27616kB inactive_file:71652kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130048kB mlocked:0kB dirty:5364kB writeback:8836kB mapped:4692kB shmem:40kB slab_reclaimable:11648kB slab_unreclaimable:4308kB kernel_stack:168kB pagetables:344kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 94.930000] lowmem_reserve[]: 0 0 [ 94.930000] Normal: 0*4kB 0*8kB 17*16kB 8*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 528kB [ 94.930000] 24827 total pagecache pages [ 94.930000] 0 pages in swap cache [ 94.930000] Swap cache stats: add 0, delete 0, find 0/0 [ 94.930000] Free swap = 0kB [ 94.930000] Total swap = 0kB [ 94.930000] 32768 pages RAM [ 94.930000] 1132 pages reserved [ 94.930000] 26123 pages shared [ 94.930000] 8586 pages non-shared [ 94.930000] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 94.930000] 0fe22c60: [<081ca1da>] dump_stack+0x1c/0x20 [ 94.930000] 0fe22c78: [<080a4981>] __alloc_pages_nodemask+0x4a0/0x4cb [ 94.930000] 0fe22cf8: [<080bd2d7>] cache_alloc_refill+0x213/0x3fe [ 94.930000] 0fe22d50: [<080bd5db>] kmem_cache_alloc+0x62/0x8f [ 94.930000] 0fe22d6c: [<08062a4c>] do_ubd_request+0x91/0x40c [ 94.930000] 0fe22dd4: [<08062e5f>] ubd_intr+0x98/0xbf [ 94.930000] 0fe22df8: [<0809bed1>] handle_IRQ_event+0x15/0x9b [ 94.930000] 0fe22e14: [<0809bfbd>] __do_IRQ+0x66/0xb8 [ 94.930000] 0fe22e34: [<08059136>] do_IRQ+0x1f/0x34 [ 94.930000] 0fe22e44: [<080592e6>] sigio_handler+0x46/0x5c [ 94.930000] 0fe22e5c: [<08066e6e>] sig_handler_common+0x61/0x70 [ 94.930000] 0fe22ed4: [<08066ec3>] unblock_signals+0x46/0x57 [ 94.930000] 0fe22ee0: [<081cc3c7>] _raw_spin_unlock_irq+0x10/0x13 [ 94.930000] 0fe22eec: [<0815128b>] cfq_kick_queue+0x2f/0x39 [ 94.930000] 0fe22f08: [<08083c86>] process_one_work+0x1ea/0x2f0 [ 94.930000] 0fe22f44: [<08084202>] worker_thread+0x174/0x2e4 [ 94.930000] 0fe22f74: [<08087e5b>] kthread+0x78/0x7f [ 94.930000] 0fe22fb4: [<08066117>] run_kernel_thread+0x37/0x3f [ 94.930000] 0fe22fe0: [<08059bd1>] new_thread_handler+0x60/0x87 [ 94.930000] 0fe22ffc: [<00000000>] 0x0 [ 94.930000] [ 94.930000] Mem-Info: [ 94.930000] Normal per-cpu: [ 94.930000] CPU 0: hi: 42, btch: 7 usd: 18 [ 94.930000] active_anon:1202 inactive_anon:1274 isolated_anon:0 [ 94.930000] active_file:6904 inactive_file:17913 isolated_file:0 [ 94.930000] unevictable:0 dirty:1341 writeback:1811 unstable:0 [ 94.930000] free:132 slab_reclaimable:2912 slab_unreclaimable:1066 [ 94.930000] mapped:1173 shmem:10 pagetables:86 bounce:0 [ 94.930000] Normal free:528kB min:1440kB low:1800kB high:2160kB active_anon:4808kB inactive_anon:5096kB active_file:27616kB inactive_file:71652kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130048kB mlocked:0kB dirty:5364kB writeback:7244kB mapped:4692kB shmem:40kB slab_reclaimable:11648kB slab_unreclaimable:4264kB kernel_stack:168kB pagetables:344kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 94.930000] lowmem_reserve[]: 0 0 [ 94.930000] Normal: 0*4kB 0*8kB 17*16kB 8*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 528kB [ 94.930000] 24827 total pagecache pages [ 94.930000] 0 pages in swap cache [ 94.930000] Swap cache stats: add 0, delete 0, find 0/0 [ 94.930000] Free swap = 0kB [ 94.930000] Total swap = 0kB [ 94.930000] 32768 pages RAM [ 94.930000] 1132 pages reserved [ 94.930000] 26123 pages shared [ 94.930000] 8575 pages non-shared -- Thanks, //richard |
From: Tejun H. <ht...@gm...> - 2010-10-14 18:01:30
|
Hello, On 10/14/2010 04:20 PM, richard -rw- weinberger wrote: > It does not work for me. > But the error is a different one. :-) > Without your patch I've never got this kernel trace. > > [ 59.850000] kworker/0:1: page allocation failure. order:0, mode:0x20 Hmm... you're seeing out of memory condition. If the code screws up filesystem access, I suppose it could make that happen too but can you please check the configuration just in case (especially the memory size)? Also, if you can reliably reproduce the filesystem corruption w/o the patch, can you please tell me how to do it? Thanks. -- tejun |
From: Richard W. <ri...@no...> - 2010-10-14 21:24:50
|
Hi! Am Donnerstag 14 Oktober 2010, 20:03:40 schrieb Tejun Heo: > Hello, > > On 10/14/2010 04:20 PM, richard -rw- weinberger wrote: > > It does not work for me. > > But the error is a different one. :-) > > Without your patch I've never got this kernel trace. > > > > [ 59.850000] kworker/0:1: page allocation failure. order:0, mode:0x20 > > Hmm... you're seeing out of memory condition. If the code screws up > filesystem access, I suppose it could make that happen too but can you > please check the configuration just in case (especially the memory > size)? Also, if you can reliably reproduce the filesystem corruption > w/o the patch, can you please tell me how to do it? > After rerunning my test case a few times with exactly the same configuration I was unable to reproduce neither a filesystem corruption nor a allocation failure. Your patch seems to fix the issue. Sorry for the false negative! BTW: This is my test case: tar xvf linux-2.6.36-rc7.tar.bz2 && rm linux-2.6.36-rc7.tar.bz2 && tar cvf linux-2.6.36-rc7.tar.bz2 linux-2.6.36-rc7/ && rm -rf linux-2.6.36-rc7/ Any big tar archive will do the job... Thanks, //richard |
From: Jens A. <ja...@fu...> - 2010-09-28 23:41:05
|
On 2010-09-29 07:52, Chris Frey wrote: > On Wed, Sep 29, 2010 at 12:13:10AM +0200, Richard Weinberger wrote: >> Am Mittwoch 29 September 2010, 00:00:00 schrieb Andrew Morton: >>> This is a workaround, I think? Do we know what the actual bug is? >>> From the comment it appears to be a regression? >> >> Yes, it is a workaround. >> For more details please have a look at this post: >> http://lkml.org/lkml/2010/9/28/245 > > I'm going to run some tests to see how this works on my end, but in > reading old threads, I found this response from Bram Matthys > that casts some doubt on an older version of this patch. > > http://marc.info/?l=user-mode-linux-user&m=126883932731656&w=2 > > I've added him to the CC. It looks like that if we need to restart the requeue, then we use the initial position and not the current index. Does this help? diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 1bcd208..81ee063 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -162,7 +162,7 @@ struct ubd { spinlock_t lock; struct scatterlist sg[MAX_SG]; struct request *request; - int start_sg, end_sg; + int start_sg, end_sg, rq_off; }; #define DEFAULT_COW { \ @@ -187,6 +187,7 @@ struct ubd { .request = NULL, \ .start_sg = 0, \ .end_sg = 0, \ + .rq_off = 0, \ } /* Protected by ubd_lock */ @@ -1241,10 +1242,11 @@ static void do_ubd_request(struct request_queue *q) dev->request = req; dev->start_sg = 0; dev->end_sg = blk_rq_map_sg(q, req, dev->sg); + dev->rq_off = 0; } req = dev->request; - sector = blk_rq_pos(req); + sector = blk_rq_pos(req) + dev->rq_off; while(dev->start_sg < dev->end_sg){ struct scatterlist *sg = &dev->sg[dev->start_sg]; @@ -1273,6 +1275,7 @@ static void do_ubd_request(struct request_queue *q) } dev->start_sg++; + dev->rq_off += sg->length >> 9; } dev->end_sg = 0; dev->request = NULL; -- Jens Axboe |
From: Janjaap B. <ja...@bo...> - 2010-09-29 00:48:41
|
On Wed, 2010-09-29 at 08:10 +0900, Jens Axboe wrote: > It looks like that if we need to restart the requeue, then > we use the initial position and not the current index. Does > this help? > > diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c > index 1bcd208..81ee063 100644 > --- a/arch/um/drivers/ubd_kern.c > +++ b/arch/um/drivers/ubd_kern.c > @@ -162,7 +162,7 @@ struct ubd { > spinlock_t lock; > struct scatterlist sg[MAX_SG]; > struct request *request; > - int start_sg, end_sg; > + int start_sg, end_sg, rq_off; > }; > > #define DEFAULT_COW { \ > @@ -187,6 +187,7 @@ struct ubd { > .request = NULL, \ > .start_sg = 0, \ > .end_sg = 0, \ > + .rq_off = 0, \ > } > > /* Protected by ubd_lock */ > @@ -1241,10 +1242,11 @@ static void do_ubd_request(struct request_queue *q) > dev->request = req; > dev->start_sg = 0; > dev->end_sg = blk_rq_map_sg(q, req, dev->sg); > + dev->rq_off = 0; > } > > req = dev->request; > - sector = blk_rq_pos(req); > + sector = blk_rq_pos(req) + dev->rq_off; > while(dev->start_sg < dev->end_sg){ > struct scatterlist *sg = &dev->sg[dev->start_sg]; > > @@ -1273,6 +1275,7 @@ static void do_ubd_request(struct request_queue *q) > } > > dev->start_sg++; > + dev->rq_off += sg->length >> 9; > } > dev->end_sg = 0; > dev->request = NULL; > Yes, this works. I quickly tested it on current git version, and it does not require the work-around. So far no corruption seen. Thanks! -Janjaap |
From: Chris F. <cd...@fo...> - 2010-09-29 01:30:32
|
On Wed, Sep 29, 2010 at 08:10:06AM +0900, Jens Axboe wrote: > It looks like that if we need to restart the requeue, then > we use the initial position and not the current index. Does > this help? > > diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c > index 1bcd208..81ee063 100644 > --- a/arch/um/drivers/ubd_kern.c > +++ b/arch/um/drivers/ubd_kern.c > @@ -162,7 +162,7 @@ struct ubd { > spinlock_t lock; > struct scatterlist sg[MAX_SG]; > struct request *request; > - int start_sg, end_sg; > + int start_sg, end_sg, rq_off; > }; > > #define DEFAULT_COW { \ > @@ -187,6 +187,7 @@ struct ubd { > .request = NULL, \ > .start_sg = 0, \ > .end_sg = 0, \ > + .rq_off = 0, \ > } > > /* Protected by ubd_lock */ > @@ -1241,10 +1242,11 @@ static void do_ubd_request(struct request_queue *q) > dev->request = req; > dev->start_sg = 0; > dev->end_sg = blk_rq_map_sg(q, req, dev->sg); > + dev->rq_off = 0; > } > > req = dev->request; > - sector = blk_rq_pos(req); > + sector = blk_rq_pos(req) + dev->rq_off; > while(dev->start_sg < dev->end_sg){ > struct scatterlist *sg = &dev->sg[dev->start_sg]; > > @@ -1273,6 +1275,7 @@ static void do_ubd_request(struct request_queue *q) > } > > dev->start_sg++; > + dev->rq_off += sg->length >> 9; > } > dev->end_sg = 0; > dev->request = NULL; > > -- This patch does not fix the corruption issue for me. I applied the patch to 2.6.35.5, and reproduced the "deleted inode referenced" errors in both a gentoo and ubuntu guest OS. It does take longer to reproduce though, with this patch. - Chris |
From: Tejun H. <ht...@gm...> - 2010-10-14 13:14:44
|
Hello, Can you please try this one then? It seems to work here but I can't reproduce the original problem reliably so I'm not really sure. Thanks. diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 1bcd208..9734994 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -163,6 +163,7 @@ struct ubd { struct scatterlist sg[MAX_SG]; struct request *request; int start_sg, end_sg; + sector_t rq_pos; }; #define DEFAULT_COW { \ @@ -187,6 +188,7 @@ struct ubd { .request = NULL, \ .start_sg = 0, \ .end_sg = 0, \ + .rq_pos = 0, \ } /* Protected by ubd_lock */ @@ -1228,7 +1230,6 @@ static void do_ubd_request(struct request_queue *q) { struct io_thread_req *io_req; struct request *req; - sector_t sector; int n; while(1){ @@ -1239,12 +1240,12 @@ static void do_ubd_request(struct request_queue *q) return; dev->request = req; + dev->rq_pos = blk_rq_pos(req); dev->start_sg = 0; dev->end_sg = blk_rq_map_sg(q, req, dev->sg); } req = dev->request; - sector = blk_rq_pos(req); while(dev->start_sg < dev->end_sg){ struct scatterlist *sg = &dev->sg[dev->start_sg]; @@ -1256,10 +1257,9 @@ static void do_ubd_request(struct request_queue *q) return; } prepare_request(req, io_req, - (unsigned long long)sector << 9, + (unsigned long long)dev->rq_pos << 9, sg->offset, sg->length, sg_page(sg)); - sector += sg->length >> 9; n = os_write_file(thread_fd, &io_req, sizeof(struct io_thread_req *)); if(n != sizeof(struct io_thread_req *)){ @@ -1272,6 +1272,7 @@ static void do_ubd_request(struct request_queue *q) return; } + dev->rq_pos += sg->length >> 9; dev->start_sg++; } dev->end_sg = 0; |
From: Chris F. <cd...@fo...> - 2010-10-15 04:48:59
|
On Thu, Oct 14, 2010 at 03:14:28PM +0200, Tejun Heo wrote: > Hello, > > Can you please try this one then? It seems to work here but I can't > reproduce the original problem reliably so I'm not really sure. > > Thanks. > > diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c > index 1bcd208..9734994 100644 > --- a/arch/um/drivers/ubd_kern.c > +++ b/arch/um/drivers/ubd_kern.c > @@ -163,6 +163,7 @@ struct ubd { > struct scatterlist sg[MAX_SG]; > struct request *request; > int start_sg, end_sg; > + sector_t rq_pos; > }; > > #define DEFAULT_COW { \ > @@ -187,6 +188,7 @@ struct ubd { > .request = NULL, \ > .start_sg = 0, \ > .end_sg = 0, \ > + .rq_pos = 0, \ > } > > /* Protected by ubd_lock */ > @@ -1228,7 +1230,6 @@ static void do_ubd_request(struct request_queue *q) > { > struct io_thread_req *io_req; > struct request *req; > - sector_t sector; > int n; > > while(1){ > @@ -1239,12 +1240,12 @@ static void do_ubd_request(struct request_queue *q) > return; > > dev->request = req; > + dev->rq_pos = blk_rq_pos(req); > dev->start_sg = 0; > dev->end_sg = blk_rq_map_sg(q, req, dev->sg); > } > > req = dev->request; > - sector = blk_rq_pos(req); > while(dev->start_sg < dev->end_sg){ > struct scatterlist *sg = &dev->sg[dev->start_sg]; > > @@ -1256,10 +1257,9 @@ static void do_ubd_request(struct request_queue *q) > return; > } > prepare_request(req, io_req, > - (unsigned long long)sector << 9, > + (unsigned long long)dev->rq_pos << 9, > sg->offset, sg->length, sg_page(sg)); > > - sector += sg->length >> 9; > n = os_write_file(thread_fd, &io_req, > sizeof(struct io_thread_req *)); > if(n != sizeof(struct io_thread_req *)){ > @@ -1272,6 +1272,7 @@ static void do_ubd_request(struct request_queue *q) > return; > } > > + dev->rq_pos += sg->length >> 9; > dev->start_sg++; > } > dev->end_sg = 0; I tested this patch, on 2.6.35.5, as heavily as I could today, and was unable to reproduce the filesystem corruption. Seems to be fixed. :-) Thanks! - Chris |
From: Jens A. <ja...@fu...> - 2010-09-29 05:21:22
|
On 2010-09-29 10:29, Chris Frey wrote: > On Wed, Sep 29, 2010 at 08:10:06AM +0900, Jens Axboe wrote: >> It looks like that if we need to restart the requeue, then >> we use the initial position and not the current index. Does >> this help? >> >> diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c >> index 1bcd208..81ee063 100644 >> --- a/arch/um/drivers/ubd_kern.c >> +++ b/arch/um/drivers/ubd_kern.c >> @@ -162,7 +162,7 @@ struct ubd { >> spinlock_t lock; >> struct scatterlist sg[MAX_SG]; >> struct request *request; >> - int start_sg, end_sg; >> + int start_sg, end_sg, rq_off; >> }; >> >> #define DEFAULT_COW { \ >> @@ -187,6 +187,7 @@ struct ubd { >> .request = NULL, \ >> .start_sg = 0, \ >> .end_sg = 0, \ >> + .rq_off = 0, \ >> } >> >> /* Protected by ubd_lock */ >> @@ -1241,10 +1242,11 @@ static void do_ubd_request(struct request_queue *q) >> dev->request = req; >> dev->start_sg = 0; >> dev->end_sg = blk_rq_map_sg(q, req, dev->sg); >> + dev->rq_off = 0; >> } >> >> req = dev->request; >> - sector = blk_rq_pos(req); >> + sector = blk_rq_pos(req) + dev->rq_off; >> while(dev->start_sg < dev->end_sg){ >> struct scatterlist *sg = &dev->sg[dev->start_sg]; >> >> @@ -1273,6 +1275,7 @@ static void do_ubd_request(struct request_queue *q) >> } >> >> dev->start_sg++; >> + dev->rq_off += sg->length >> 9; >> } >> dev->end_sg = 0; >> dev->request = NULL; >> >> -- > > This patch does not fix the corruption issue for me. I applied the patch > to 2.6.35.5, and reproduced the "deleted inode referenced" errors > in both a gentoo and ubuntu guest OS. It does take longer to reproduce > though, with this patch. This seems to imply that the original commit pin pointed is not the only issue we have in that code atm. I think we need to find the real fix here, just disabling merging is not a fix (it's just a nasty work-around for the real bug). -- Jens Axboe |
From: richard -r. w. <ric...@gm...> - 2010-10-02 17:27:30
|
On Wed, Sep 29, 2010 at 7:21 AM, Jens Axboe <ja...@fu...> wrote: > I think we need to find the real fix here, just disabling merging > is not a fix (it's just a nasty work-around for the real bug). > Jens, Do you have an idea which parts of the code are buggy? -- Cheers, //richard |
From: Chris F. <cd...@fo...> - 2010-09-29 06:36:16
|
On Wed, Sep 29, 2010 at 02:21:07PM +0900, Jens Axboe wrote: > This seems to imply that the original commit pin pointed is not > the only issue we have in that code atm. > > I think we need to find the real fix here, just disabling merging > is not a fix (it's just a nasty work-around for the real bug). You're probably right, and I'm quite willing to help test further patches to help get to the bottom of this. - Chris |
From: Tejun H. <ht...@gm...> - 2010-10-04 16:37:49
|
Hello, sorry about chiming in later. I was off last week. On 09/29/2010 08:34 AM, Chris Frey wrote: > On Wed, Sep 29, 2010 at 02:21:07PM +0900, Jens Axboe wrote: >> This seems to imply that the original commit pin pointed is not >> the only issue we have in that code atm. >> >> I think we need to find the real fix here, just disabling merging >> is not a fix (it's just a nasty work-around for the real bug). > > You're probably right, and I'm quite willing to help test further patches > to help get to the bottom of this. I think we're on the right track. The problem with Jens' patch was that it didn't consider the fact that blk_end_request() now internally updates the current position of the request, so if restart happens after some of part of the request is complete it will end up adding the offsets multiple times. I think slightly modifying it to track the current position instead of offset should do it. Chris, can you please try the following patch and see whether the problem goes away? Thanks. diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 1bcd208..d8a5d8c 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -163,6 +163,7 @@ struct ubd { struct scatterlist sg[MAX_SG]; struct request *request; int start_sg, end_sg; + sector_t rq_pos; }; #define DEFAULT_COW { \ @@ -187,6 +188,7 @@ struct ubd { .request = NULL, \ .start_sg = 0, \ .end_sg = 0, \ + .rq_pos = 0, \ } /* Protected by ubd_lock */ @@ -1228,7 +1230,6 @@ static void do_ubd_request(struct request_queue *q) { struct io_thread_req *io_req; struct request *req; - sector_t sector; int n; while(1){ @@ -1244,7 +1245,7 @@ static void do_ubd_request(struct request_queue *q) } req = dev->request; - sector = blk_rq_pos(req); + dev->rq_pos = blk_rq_pos(req); while(dev->start_sg < dev->end_sg){ struct scatterlist *sg = &dev->sg[dev->start_sg]; @@ -1256,10 +1257,10 @@ static void do_ubd_request(struct request_queue *q) return; } prepare_request(req, io_req, - (unsigned long long)sector << 9, + (unsigned long long)dev->rq_pos << 9, sg->offset, sg->length, sg_page(sg)); - sector += sg->length >> 9; + dev->rq_pos += sg->length >> 9; n = os_write_file(thread_fd, &io_req, sizeof(struct io_thread_req *)); if(n != sizeof(struct io_thread_req *)){ |
From: Chris F. <cd...@fo...> - 2010-10-04 19:53:13
|
On Mon, Oct 04, 2010 at 06:37:36PM +0200, Tejun Heo wrote: > Hello, sorry about chiming in later. I was off last week. No problem, I'm eager to test patches to fix this. > I think we're on the right track. The problem with Jens' patch was > that it didn't consider the fact that blk_end_request() now internally > updates the current position of the request, so if restart happens > after some of part of the request is complete it will end up adding > the offsets multiple times. I think slightly modifying it to track > the current position instead of offset should do it. Chris, can you > please try the following patch and see whether the problem goes away? Unfortunately, this patch does not fix it for me. I applied your patch to kernel 2.6.35.5, and got the usual error: EXT3-fs error (device ubda): ext3_lookup: deleted inode referenced: 566188 Lots of them, actually. - Chris |