From: Dan B. <dba...@or...> - 2011-09-26 16:43:42
|
I'm having issues when using cow files with CentOS6 system images. It is specific to CentOS6. When I tried with a debian image built with debootstrap, the system booted just fine. I am experiencing this issue both with a custom built UML kernel as well as the kernel I obtained from the debian repository (2.6.32-1um-4+34squeeze1).

The issue I am experiencing is that during the boot process, udev hangs forever and the boot process does not complete. It only occurs when I use a cow file by specifying just the cow file on the command line like "ubd0=cow_fs" rather than "ubd0=cow_fs,root_fs". When I specify both the cow file and the backing file, the problem doesn't happen. I am able to specify ubd0 either way with a debian image and the system boots as expected. I have tried editing the appropriate rc script in the CentOS6 image to take udev out of the boot process, and the system boots, but has problems related to udev not running. I have also tried specifying the "udevtimeout" parameter on the kernel command line to see if I could force it to time out quickly and carry on with booting, but that made no difference. This leads me to believe that udev isn't just spinning its wheels waiting for something, but is actually hung or crashed.

I'm kind of at a loss here. I don't know what interaction between the cow fs and udev would cause udev to hang, or whether the cause is something only tangentially related to using the cow fs. It may even be a bug in udev. I'm assuming that the kernel presents the block device ubd0/a to the system the same way, regardless of whether the device is specified to the kernel as "cow_file,backing_file" or just "cow_file", so I don't know why that would cause an issue. However, the way I specify ubd0 on the command line is literally the only difference between a system that boots successfully and one that hangs at starting udev.

The environment in which I run the UML kernel doesn't seem to make a difference; I have run the kernel on both my Debian Squeeze box and one of our CentOS6 servers (which is where these machines will be running in production).

Dan |
From: richard -r. w. <ric...@gm...> - 2011-09-26 21:04:43
|
Dan,

On Mon, Sep 26, 2011 at 6:27 PM, Dan Bassett <dba...@or...> wrote:
> I'm having issues when using cow files with CentOS6 system images. It
> is specific to CentOS6. When I tried with a debian image built with
> debootstrap, the system booted just fine. I am experiencing this issue
> both with a custom built UML kernel as well as the kernel I obtained
> from the debian repository (2.6.32-1um-4+34squeeze1)

Is the issue related to CentOS 6 or udev?
IOW does the same udev version work on e.g. Debian?

> The issue I am experiencing is that during the boot process, udev hangs
> forever and the boot process does not complete. It only occurs when I
> use a cow file by specifying just the cow file on the command line like
> "ubd0=cow_fs" rather than "ubd0=cow_fs,root_fs". When I specify both
> the cow file and the backing file, the problem doesn't happen. I am
> able to specify ubd0 either way with a debian image and the system boots
> as expected. I have tried editing the appropriate rc script in the
> CentOS6 image to take udev out of the boot process, and the system
> boots, but has problems related to udev not running.

Can you find out _where_ udev hangs?
Is it an endless loop? A blocking system call?

--
Thanks,
//richard |
From: Dan B. <dba...@or...> - 2011-09-26 22:16:48
|
On 09/26/2011 04:04 PM, richard -rw- weinberger wrote:
> Dan,
>
> On Mon, Sep 26, 2011 at 6:27 PM, Dan Bassett<dba...@or...> wrote:
>
>> I'm having issues when using cow files with CentOS6 system images. It
>> is specific to CentOS6. When I tried with a debian image built with
>> debootstrap, the system booted just fine. I am experiencing this issue
>> both with a custom built UML kernel as well as the kernel I obtained
>> from the debian repository (2.6.32-1um-4+34squeeze1)
>>
> Is the issue related to CentOS 6 or udev?
> IOW does the same udev version work on e.g. Debian?
>

Actually, it looks like udev isn't even installed on the Debian box; it must have a static /dev. I think I may have assumed that this image built with debootstrap was more like a regular debian install than it actually is. That doesn't really answer the question of why not specifying the backing storage file on the kernel command line would cause udev to hang, though.

>
>> The issue I am experiencing is that during the boot process, udev hangs
>> forever and the boot process does not complete. It only occurs when I
>> use a cow file by specifying just the cow file on the command line like
>> "ubd0=cow_fs" rather than "ubd0=cow_fs,root_fs". When I specify both
>> the cow file and the backing file, the problem doesn't happen. I am
>> able to specify ubd0 either way with a debian image and the system boots
>> as expected. I have tried editing the appropriate rc script in the
>> CentOS6 image to take udev out of the boot process, and the system
>> boots, but has problems related to udev not running.
>>
> Can you find out _where_ udev hangs?
> Is it an endless loop? A blocking system call?
>
>

I did some digging just now and found that in CentOS6, udev is initialized using a script at /sbin/start_udev. I began putting echo statements here and there to narrow down where things were actually getting stuck, and I found that it always happens at a "udevadm" command, usually at a "udevadm settle". I honestly don't know enough about udev and/or the uml cow format to make an educated (or even uneducated) guess as to why those particular commands hang. If you have suggestions about other methods I could use to glean more debugging information, I would be happy to investigate further.

I could just throw up my hands, say "forget it" and go with a static /dev, since I don't need a udev controlled /dev for this particular application, but it would be nice to debug this for future UML users all over the world :-) |
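As a complement to the echo statements described above, a suspect command can be run under a time limit so that a hang is reported instead of blocking forever. The following is only a minimal sketch in Python: `run_with_timeout` is a hypothetical helper, and the two commands are stand-ins (a Python one-liner that returns at once and one that sleeps), not the actual udevadm or blkid invocations from this thread.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (finished, stdout).

    finished is False when the command did not exit within timeout_s
    seconds and subprocess.run() had to kill it.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return True, result.stdout
    except subprocess.TimeoutExpired:
        return False, ""


# Stand-in commands (assumptions for illustration only).
quick = [sys.executable, "-c", "print('done')"]
stuck = [sys.executable, "-c", "import time; time.sleep(30)"]

for name, cmd in (("quick", quick), ("stuck", stuck)):
    finished, _ = run_with_timeout(cmd, 2)
    print(name, "finished" if finished else "timed out")
```

A process that is blocked inside the kernel may not respond to being killed, in which case a wrapper like this can itself stall; that behaviour would be a hint that the hang is in a system call rather than in a userspace loop.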
From: Dan B. <dba...@or...> - 2011-09-27 17:10:03
|
On 09/26/2011 05:16 PM, Dan Bassett wrote:
> On 09/26/2011 04:04 PM, richard -rw- weinberger wrote:
>
>> Dan,
>>
>> On Mon, Sep 26, 2011 at 6:27 PM, Dan Bassett<dba...@or...> wrote:
>>
>>
>>> I'm having issues when using cow files with CentOS6 system images. It
>>> is specific to CentOS6. When I tried with a debian image built with
>>> debootstrap, the system booted just fine. I am experiencing this issue
>>> both with a custom built UML kernel as well as the kernel I obtained
>>> from the debian repository (2.6.32-1um-4+34squeeze1)
>>>
>>>
>> Is the issue related to CentOS 6 or udev?
>> IOW does the same udev version work on e.g. Debian?
>>
>>
> Actually, it looks like udev isn't even installed on the Debian box; it
> must have a static /dev. I think I may have assumed that this image
> built with debootstrap was more like a regular debian install than it
> actually is. That doesn't really answer the question of why not
> specifying the backing storage file on the kernel command line would
> cause udev to hang, though.
>
>>
>>
>>> The issue I am experiencing is that during the boot process, udev hangs
>>> forever and the boot process does not complete. It only occurs when I
>>> use a cow file by specifying just the cow file on the command line like
>>> "ubd0=cow_fs" rather than "ubd0=cow_fs,root_fs". When I specify both
>>> the cow file and the backing file, the problem doesn't happen. I am
>>> able to specify ubd0 either way with a debian image and the system boots
>>> as expected. I have tried editing the appropriate rc script in the
>>> CentOS6 image to take udev out of the boot process, and the system
>>> boots, but has problems related to udev not running.
>>>
>>>
>> Can you find out _where_ udev hangs?
>> Is it an endless loop? A blocking system call?
>>
>>
>>
> I did some digging just now and found that in CentOS6, udev is
> initialized using a script at /sbin/start_udev. I began putting echo
> statements here and there trying to narrow down where things were
> actually getting stuck and I found that it always happens at a "udevadm"
> command, and usually it's at a "udevadm settle". I honestly don't know
> enough about udev and/or the uml cow format to make an educated (or even
> uneducated) guess as to why those particular commands hang. If you have
> suggestions about other methods I could use to glean more debugging
> information, I would be happy to investigate further.
>
> I could just potentially throw up my hands, say "forget it" and go with
> a static /dev since I don't need a udev controlled /dev for this
> particular application, but it would be nice to debug this for future
> UML users all over the world :-)
>

I did some more digging after finding some more debugging flags for start_udev and I have more information today. After serializing udev's startup process, it looks like the boot process always hangs in the same spot. During the processing of the persistent storage rules that ship with udev, the following rule is encountered:

KERNEL!="sr*", IMPORT{program}="/sbin/blkid -o udev -p $tempnode"

This results in /sbin/blkid being run for ubd0/a as such:

util_run_program: '/sbin/blkid -o udev -p /dev/.tmp-block-98:0' started

This is where it hangs. From the blkid man page:

"The blkid program is the command-line interface to working with libblkid(3) library. It can determine the type of content (e.g. filesystem, swap) a block device holds, and also attributes (tokens, NAME=value pairs) from the content metadata (e.g. LABEL or UUID fields)."

So it sounds like udev is trying to use blkid to read metadata from ubd0/a and is failing. Again, I don't know what goes on with the internals of the cow filesystem, so I don't know how specifying the backing store versus not specifying it would make a difference here. It's my understanding that the cow file contains the information for the backing store in a header of some sort, and the kernel just takes care of opening the backing file and presenting the two to the system as ubd0/a.

Dan |
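To make the "header of some sort" idea concrete, here is a small Python sketch of a file header that records a backing file path and a device size. The layout (a 4-byte `DEMO` magic, a version, a size and a 256-byte path) is invented purely for illustration and is NOT the real UML COW on-disk format; the names `HEADER_FMT`, `build_header` and `parse_header` are likewise hypothetical. The real layout would have to be taken from the UML sources (arch/um/drivers/cow_user.c).

```python
import struct

# Hypothetical layout, for illustration only (not the real UML COW header):
#   4-byte magic, u32 version, u64 device size in bytes,
#   256-byte NUL-padded backing file path; big-endian, no padding.
HEADER_FMT = ">4sIQ256s"
HEADER_LEN = struct.calcsize(HEADER_FMT)
MAGIC = b"DEMO"


def build_header(backing_file, size, version=1):
    """Pack a demo header that names its backing file and device size."""
    return struct.pack(HEADER_FMT, MAGIC, version, size, backing_file.encode())


def parse_header(raw):
    """Unpack a demo header and return its fields as a dict."""
    magic, version, size, path = struct.unpack(HEADER_FMT, raw[:HEADER_LEN])
    if magic != MAGIC:
        raise ValueError("not a demo COW header")
    return {
        "version": version,
        "size": size,
        "backing_file": path.rstrip(b"\0").decode(),
    }


header = parse_header(build_header("root_fs", 1024 * 1024 * 1024))
print(header["backing_file"], header["size"])
```

With a header like this, a driver given only the cow file has everything it needs to locate the backing file, so both command-line forms ought to produce the same device, including the same reported size.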
From: richard -r. w. <ric...@gm...> - 2011-09-28 11:38:38
Attachments:
trigger.diff
|
On Tue, Sep 27, 2011 at 7:09 PM, Dan Bassett <dba...@or...> wrote:
> I did some more digging after finding some more debugging flags for
> start_udev and I have more information today. After serializing udev's
> startup process, it looks like the boot process always hangs in the same
> spot. During the processing of the persistent storage rules that ship with
> udev, the following rule is encountered:
>
> KERNEL!="sr*", IMPORT{program}="/sbin/blkid -o udev -p $tempnode"
>
> This results in /sbin/blkid being run for ubd0/a as such:
>
> util_run_program: '/sbin/blkid -o udev -p /dev/.tmp-block-98:0' started
>

Can you please test the attached patch?
It should trigger the BUG_ON().

There is definitely something fishy...

--
Thanks,
//richard |
From: richard -r. w. <ric...@gm...> - 2011-09-28 19:50:37
Attachments:
fix_cow_size.diff
|
On Tue, Sep 27, 2011 at 7:09 PM, Dan Bassett <dba...@or...> wrote:
> I did some more digging after finding some more debugging flags for
> start_udev and I have more information today. After serializing udev's
> startup process, it looks like the boot process always hangs in the same
> spot. During the processing of the persistent storage rules that ship with
> udev, the following rule is encountered:
>
> KERNEL!="sr*", IMPORT{program}="/sbin/blkid -o udev -p $tempnode"
>
> This results in /sbin/blkid being run for ubd0/a as such:
>
> util_run_program: '/sbin/blkid -o udev -p /dev/.tmp-block-98:0' started

The attached patch should fix the issue.
Please confirm. :-)

--
Thanks,
//richard |
From: richard -r. w. <ric...@gm...> - 2011-09-28 19:53:50
Attachments:
fix_cow_size.diff
|
On Tue, Sep 27, 2011 at 7:09 PM, Dan Bassett <dba...@or...> wrote:
> I did some more digging after finding some more debugging flags for
> start_udev and I have more information today. After serializing udev's
> startup process, it looks like the boot process always hangs in the same
> spot. During the processing of the persistent storage rules that ship with
> udev, the following rule is encountered:
>
> KERNEL!="sr*", IMPORT{program}="/sbin/blkid -o udev -p $tempnode"
>
> This results in /sbin/blkid being run for ubd0/a as such:
>
> util_run_program: '/sbin/blkid -o udev -p /dev/.tmp-block-98:0' started

The attached patch should fix the issue.
Please confirm. :-)

--
Thanks,
//richard |
From: Dan B. <dba...@or...> - 2011-09-28 20:05:02
|
On 09/28/2011 02:53 PM, richard -rw- weinberger wrote:
> On Tue, Sep 27, 2011 at 7:09 PM, Dan Bassett<dba...@or...> wrote:
>
>> I did some more digging after finding some more debugging flags for
>> start_udev and I have more information today. After serializing udev's
>> startup process, it looks like the boot process always hangs in the same
>> spot. During the processing of the persistent storage rules that ship with
>> udev, the following rule is encountered:
>>
>> KERNEL!="sr*", IMPORT{program}="/sbin/blkid -o udev -p $tempnode"
>>
>> This results in /sbin/blkid being run for ubd0/a as such:
>>
>> util_run_program: '/sbin/blkid -o udev -p /dev/.tmp-block-98:0' started
>>
> The attached patch should fix the issue.
> Please confirm. :-)
>

Richard-

Yes, I can confirm that the patch works. Thanks for fixing this!

Dan |
From: richard -r. w. <ric...@gm...> - 2011-09-28 20:08:13
|
On Wed, Sep 28, 2011 at 10:04 PM, Dan Bassett <dba...@or...> wrote:
>
>
> On 09/28/2011 02:53 PM, richard -rw- weinberger wrote:
>>
>> On Tue, Sep 27, 2011 at 7:09 PM, Dan Bassett<dba...@or...> wrote:
>>
>>>
>>> I did some more digging after finding some more debugging flags for
>>> start_udev and I have more information today. After serializing udev's
>>> startup process, it looks like the boot process always hangs in the same
>>> spot. During the processing of the persistent storage rules that ship with
>>> udev, the following rule is encountered:
>>>
>>> KERNEL!="sr*", IMPORT{program}="/sbin/blkid -o udev -p $tempnode"
>>>
>>> This results in /sbin/blkid being run for ubd0/a as such:
>>>
>>> util_run_program: '/sbin/blkid -o udev -p /dev/.tmp-block-98:0' started
>>>
>>
>> The attached patch should fix the issue.
>> Please confirm. :-)
>>
>
> Richard-
> Yes, I can confirm that the patch works. Thanks for fixing this!

Perfect. :)

--
Thanks,
//richard |