[uml-devel] Re: Problem with COW segfault in cowify_req

[A little introduction...]

I'm new here and only recently discovered the UML project after being
frustrated by VMware's inability to install/run on my RedHat 7.3 system.
Very cool stuff - I'm currently running (or trying to run) 3 full-blown
GUI app development setups as UMLs (plus an additional one on the host)
for RedHat 7.3 & 8.0 and Mandrake 9.0.

[And now to butt right in on a 2 week old discussion...]

doug@ea wrote:
>Jeff Dike wrote:
>> doug@ea... said:
>> 
>>>I worked around this by going into ubd_kern.c and taking the point
>>>where  memory is allocated for the dev->cow.buffer and adding another
>>>page to  it as in:
>>>
>>>            dev->cow.bitmap = (void *) vmalloc(dev->cow.bitmap_len+4096); 
>> 
>> 
>> That's an OK workaround.  This is a known bug which has been around for a
>> while.  This is the first time I've heard of it causing crashes.
>> 
>> 				Jeff
>
>I noticed a post from late last year that had exactly the same gdb dump 
>as I got.  This first happened when I was trying to run an ext3 inside 
>the UM.  I would get this hang when building a large, empty, non-sparse 
>file.  Usually, things would crash about 400 meg in.  I suspect that 
>ext3 placed the journal close to the end of the volume.  After moving to 
>ext2, the problem was harder to create but still there.  In this case, 
>it did not happen until the volume was getting pretty full.
>
>I suspect that users that don't have exactly even boundaries for the cow 
>filesystems don't get this.  If this fix is valid, perhaps you should 
>add it to the dist.  I just hate "fudges" without knowing the underlying 
>logic.

This is indeed the case.  My cowfs covers exactly 512MB, so at one bit
per 512-byte sector the bitmap is 128k in size, an exact multiple of the
page size that completely fills its allocation in the kernel.  Walking
off the end results in a segfault.  Other combinations of COW sizes
that produce a page-multiple bitmap will obviously hit the same problem.
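
The size arithmetic can be sketched in C as follows.  This is only an
illustration: the helper names are hypothetical and not from ubd_kern.c,
and it assumes one bitmap bit per 512-byte sector, which is what the
512MB -> 128k figure implies.

```c
#include <assert.h>

/* Assumption: one bitmap bit tracks one 512-byte sector. */
#define SECTOR_SIZE 512ULL

/* Hypothetical helper: bytes of bitmap needed to cover dev_bytes. */
static unsigned long long cow_bitmap_bytes(unsigned long long dev_bytes)
{
	unsigned long long sectors = (dev_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;

	return (sectors + 7) / 8;	/* round up to whole bytes */
}

/*
 * Hypothetical helper: nonzero when the bitmap is an exact multiple of
 * the page size, so a page-granular allocation has no slack after the
 * last word.
 */
static int bitmap_fills_pages(unsigned long long bitmap_bytes,
			      unsigned long long page_size)
{
	assert(page_size > 0);
	return bitmap_bytes % page_size == 0;
}
```

With a 4096-byte page, a 512MB device gives a 131072-byte bitmap that
fills its pages exactly, while an odd-sized device would normally leave
some slack in the last page.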

I found this list (and this posting) only after tracking down the exact
cause of the problem in one of my UMLs.  I don't like the workaround
because it simply masks a buffer overrun and *may* even wind up
corrupting the first word of the COW fs for some undetermined
layout(s).  I say *may* because I didn't dig far enough to determine
how the COW data is aligned in the disk file.  I'd suspect that there
is some padding at the end of the bitmap so that the COW data is at
least sector aligned (and it probably should be bumped to an 8k boundary
in the file).  If this isn't the case, maybe there will be a COW v3?

I think the workaround above is largely harmless, but I'm proposing
the following change as what is IMO a better solution.

--- linux-2.4.19/arch/um/drivers/ubd_kern.c.orig	Tue Mar  4 15:20:47 2003
+++ linux-2.4.19/arch/um/drivers/ubd_kern.c	Sat Mar  8 00:11:42 2003
@@ -835,7 +835,20 @@
 				    dev->cow.bitmap);
 		}
 		if(update_bitmap){
+			/*
+			 * There is apparently a long standing bug when the
+			 * cow_offset of the start of the request falls into the
+			 * last word of the bitmap.  At this point, the
+			 * bitmap_words[1] walks off the end.  The most obvious
+			 * solution is to back up by one word and write the last
+			 * two words of the map instead of garbage or the
+			 * segfault that occurs for some page sized bitmaps.
+			 */
 			req->cow_offset = sector / (sizeof(unsigned long) * 8);
+			if (req->cow_offset ==
+				(dev->cow.bitmap_len / sizeof(unsigned long)) - 1) {
+				req->cow_offset -= 1;
+			}
 			req->bitmap_words[0] = 
 				dev->cow.bitmap[req->cow_offset];
 			req->bitmap_words[1] = 
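
For anyone who wants to check the index logic outside the kernel, here
is a minimal stand-alone sketch of the same clamp.  The function name
and the assertions are my own assumptions for illustration and are not
part of ubd_kern.c.  It assumes the bitmap holds at least two words and
that the sector is covered by the bitmap.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone version of the offset calculation in the
 * patch.  Returns the index of the first of the two bitmap words to
 * read, backing up by one word when the sector falls in the last word
 * so that index + 1 stays inside the bitmap.
 */
static size_t cow_word_offset(unsigned long sector, size_t bitmap_len)
{
	size_t bits_per_word = sizeof(unsigned long) * 8;
	size_t nwords = bitmap_len / sizeof(unsigned long);
	size_t offset = sector / bits_per_word;

	assert(nwords >= 2);		/* two words are always read */
	assert(offset < nwords);	/* sector must be inside the bitmap */

	if (offset == nwords - 1)
		offset -= 1;
	return offset;
}
```

With a four-word bitmap, any sector that maps to word 3 comes back as
offset 2, so words 2 and 3 are used instead of words 3 and 4.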

-- 
Lynn Kerby <mailto:lf...@ke...>