From: Gordan B. <go...@bo...> - 2009-02-13 14:23:39
On Fri, 13 Feb 2009 15:08:04 +0100, Marc Grimme <gr...@at...> wrote: >> > But how about your hw-detection does it work now stable? Cause my >> > cluster >> > that had the problems is running just fine. >> >> I've not tested the new versions on the cluster that was having problems, >> but I have tested it on another cluster, and although it does seem to >> work >> fine in general, the new probing approach managed to crash the machine >> once. I'll test it some more since I've only seen that happen once (after >> a reboot it came up OK). > > ;-) Yeah, it's a bit deja-vu after the reason for the improved probing was that booting would randomly fail. ;-) >> A few observations - the first probing pass doesn't clean up and unload >> modules properly. The dependencies among the drivers don't get resolved, >> so, for example, scsi disk drivers don't get unloaded properly because >> they have dependencies, which causes errors to get reported back. >> >> The other immediate observation is that the probing is very slow compared >> to the old image. It takes 2-3 times longer to get to the final init. The >> fact that I have a lot of SATA ports (10) in that system probably doesn't >> help. >> >> So, what would be really nice is a way to only do probing if >> configuration information isn't available (e.g. >> modprobe.conf^H^H^H^H^H^H^H^H^H^H^H^H^H^H >> driver= parameter in cluster.conf ;-)) - and by this I am referring to >> disk controller drivers in addition to the NIC drivers. Multi-pass probing is >> all well and good for a default approach, but it would be really nice to >> be able to skip it for performance reasons with the configuration explicitly >> specified. > > Hm. I see your point. > > What about this if and only if the @driver is specified for every node in > the cluster.conf we are using these drivers to detect the MAC Addresses and > skip everything else? Sounds good. :) > The about no loading scsidrivers and the like is you cannot selectively > trigger udev. Or I didn't find a way. Fair enough. But if the probing isn't triggered iif the NIC @driver is specified, then that should avoid the probing anyway. :) Gordan |
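For readers skimming the thread: the @driver attribute being agreed on here is the per-NIC driver=".." attribute of the <eth/> element in cluster.conf, quoted in full further down the thread. A minimal, hypothetical fragment is sketched below; the node name, MAC addresses and element nesting are illustrative assumptions, not taken from a real configuration.

    <!-- Hypothetical sketch only: nesting and attribute values are illustrative -->
    <clusternode name="node1" nodeid="1">
      <com_info>
        <eth name="eth0" mac="00:11:22:33:44:55" driver="e1000"/>
        <eth name="eth1" mac="00:11:22:33:44:56" driver="e100"/>
      </com_info>
    </clusternode>

With driver= present for every node, the initrd can load exactly those modules, read the MAC addresses, and skip the multi-pass probing, which is the performance win discussed above.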
From: Marc G. <gr...@at...> - 2009-02-13 14:08:11
On Friday 13 February 2009 12:33:06 Gordan Bobic wrote: > On Thu, 12 Feb 2009 10:13:38 +0100, Marc Grimme <gr...@at...> wrote: > > But how about your hw-detection does it work now stable? Cause my cluster > > that had the problems is running just fine. > > I've not tested the new versions on the cluster that was having problems, > but I have tested it on another cluster, and although it does seem to work > fine in general, the new probing approach managed to crash the machine > once. I'll test it some more since I've only seen that happen once (after a > reboot it came up OK). ;-) > > A few observations - the first probing pass doesn't clean up and unload > modules properly. The dependencies among the drivers don't get resolved, > so, for example, scsi disk drivers don't get unloaded properly because they > have dependencies, which causes errors to get reported back. > > The other immediate observation is that the probing is very slow compared > to the old image. It takes 2-3 times longer to get to the final init. The > fact that I have a lot of SATA ports (10) in that system probably doesn't > help. > > So, what would be really nice is a way to only do probing if configuration > information isn't available (e.g. modprobe.conf^H^H^H^H^H^H^H^H^H^H^H^H^H^H > driver= parameter in cluster.conf ;-)) - and by this I am referring to disk > controller drivers in addition to the NIC drivers. Multi-pass probing is > all well and good for a default approach, but it would be really nice to be > able to skip it for performance reasons with the configuration explicitly > specified. Hm. I see your point. What about this if and only if the @driver is specified for every node in the cluster.conf we are using these drivers to detect the MAC Addresses and skip everything else? The about no loading scsidrivers and the like is you cannot selectively trigger udev. Or I didn't find a way. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Gordan B. <go...@bo...> - 2009-02-13 13:31:56
On Thu, 12 Feb 2009 10:13:38 +0100, Marc Grimme <gr...@at...> wrote: > But how about your hw-detection does it work now stable? Cause my cluster > that had the problems is running just fine. I've not tested the new versions on the cluster that was having problems, but I have tested it on another cluster, and although it does seem to work fine in general, the new probing approach managed to crash the machine once. I'll test it some more since I've only seen that happen once (after a reboot it came up OK). A few observations - the first probing pass doesn't clean up and unload modules properly. The dependencies among the drivers don't get resolved, so, for example, scsi disk drivers don't get unloaded properly because they have dependencies, which causes errors to get reported back. The other immediate observation is that the probing is very slow compared to the old image. It takes 2-3 times longer to get to the final init. The fact that I have a lot of SATA ports (10) in that system probably doesn't help. So, what would be really nice is a way to only do probing if configuration information isn't available (e.g. modprobe.conf^H^H^H^H^H^H^H^H^H^H^H^H^H^H driver= parameter in cluster.conf ;-)) - and by this I am referring to disk controller drivers in addition to the NIC drivers. Multi-pass probing is all well and good for a default approach, but it would be really nice to be able to skip it for performance reasons with the configuration explicitly specified. Gordan |
From: Gordan B. <go...@bo...> - 2009-02-12 10:30:03
I'll try to get around to trying it out over the weekend. That particular cluster is a production one so I couldn't just bounce it. I have, however, tried the new updates (including the extras packages for glusterfs and md) on the cluster I'm building at the moment, and it all seems to work just fine. :) Gordan On Thu, 12 Feb 2009 10:13:38 +0100, Marc Grimme <gr...@at...> wrote: > On Thursday 12 February 2009 03:51:50 Gordan Bobic wrote: >> Marc Grimme wrote: >> > - glusterfs support >> >> One minor issue here, in the glusterfs rpm list there should be fuse-lib >> package included. Sorry if I sent a broken patch. :( > No problem. I'll update it. > But how about your hw-detection does it work now stable? Cause my cluster > that > had the problems is running just fine. > > Marc. |
From: Marc G. <gr...@at...> - 2009-02-12 09:13:53
On Thursday 12 February 2009 03:51:50 Gordan Bobic wrote: > Marc Grimme wrote: > > - glusterfs support > > One minor issue here, in the glusterfs rpm list there should be fuse-lib > package included. Sorry if I sent a broken patch. :( No problem. I'll update it. But how about your hw-detection does it work now stable? Cause my cluster that had the problems is running just fine. Marc. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Gordan B. <go...@bo...> - 2009-02-12 02:51:58
Marc Grimme wrote: > - glusterfs support One minor issue here, in the glusterfs rpm list there should be fuse-lib package included. Sorry if I sent a broken patch. :( Gordan |
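The fix amounts to a one-line addition to the glusterfs rpm list that mkinitrd reads. The file name, location and one-package-per-line format shown here are assumptions inferred from the mdadm.list conventions mentioned later in the thread, not the actual shipped file.

    # /etc/comoonics/bootimage/rpms.initrd.d/glusterfs.list (assumed path and format)
    glusterfs
    fuse-lib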
From: Marc G. <gr...@at...> - 2009-02-10 19:53:39
Hello *, finally we got the first stage of the new version of the osr cluster suite in the preview channel. This will be the first release candidate for the next productive channel. NOTE: Those packages are NOT YET production ready. And they might include bugs as they did not undergo the Q&A. So use them at your own risc. 2nd NOTE: Those packages were only tested with OSR<RHEL5> clusters based on GFS or NFS. OSR<SLES10/ocfs2>,OSR<Fedora/gfs2|nfs> and OSR<RHEL5/ocfs2> will be tested next. This means you shouldn't use those packages with ocfs2 as rootfilesystem. As this is only a preview update I will sum up the most important changes not automated: - Rewritten hardwaredetection: https://bugzilla.atix.de/show_bug.cgi?id=325 - Usability review: https://bugzilla.atix.de/show_bug.cgi?id=323 - Support for breakpoints in bootprocess - Break shell will always continue the bootprocess when exited - Bootparameters may be overwritten in Breakshell - Rootfs Check with bootparameter rootfsck (use with care and up to now only with GFS) - NFS OSR will not start scsi detection - Preview support for OSR on fedora (NFS) - Preview support for OSR<NFS4> without authentication - Preview support for OSR<RHEL5 with localfs> - Many small bugfixes Patches from Gordan Bobic (all preview status): - glusterfs support - Software RAID with md support - diet patch (new option to mkinitrd <-l> to make initrd smaller) Please let me know what you think Thanks Marc P.S. May the root always be shared! ;-) -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Marc G. <gr...@at...> - 2009-02-08 14:19:45
> > > > I think that should do. Especially if we normally not import it but only > > if the rpm comoonics-bootimage-extras-md is installed. > > Ok forget about the rest for the first. > > OK. I'll send a new patch later. :) No need, it "nearly" applied (small line changes). I'll let you know what the actual status is. > -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Gordan B. <go...@bo...> - 2009-02-08 14:06:52
Marc Grimme wrote: > On Sunday 08 February 2009 14:29:42 Gordan Bobic wrote: >> Marc Grimme wrote: >>> On Thursday 05 February 2009 23:11:24 Gordan Bobic wrote: >>>> It would appear that OSR doesn't start up MD software RAID arrays. > ... >> Well, I just got shared root working on GlusterFS, which is effectively >> a fuse stackable replication file system. It requires a standard >> underlying file system with xattr support (e.g. ext3). This in turn has >> to be backed by a block device, and in my case, that block device is an >> MD software RAID. > > Ok I thought about something like this. Cause I think it's not usable for > clusterfilesystems like GFS. Indeed, it isn't. Nobody seems to know when MD will get any cluster awareness. :( >> I'm not talking about detecting the presence of mdadm devices. I'm >> talking about detecting the presence of /sbin/mdadm in the init-root. In >> other words, if you want MD support, just add rpm and file mdadm.list >> files and replace the line I added "mdadm --assemble --scan" with >> something more like: >> >> if [ -x /sbin/mdadm ]; then >> mdadm --assemble --scan >> fi >> >> That's all I was talking about, really. This will check for MD markers >> on all the available disks and assemble any it can. > > I think that should do. Especially if we normally not import it but only if > the rpm comoonics-bootimage-extras-md is installed. > Ok forget about the rest for the first. OK. I'll send a new patch later. :) >> I'm not sure this is worthwhile, though. It would be much more >> straightforward to just add the if block mentioned above and have >> something like and extras-mdadm package that just adds the >> /etc/comoonics/bootimage/files.initrd.d/mdadm.list >> /etc/comoonics/bootimage/rpms.initrd.d/mdadm.list >> files (dependency on mdadm rpm). That means the total bloat of 3 lines in >> /opt/atix/comoonics-bootimage/boot-scripts/etc/hardware-lib.sh >> if it isn't actually required for the specific build. >> > ... >>> No, not yet. I didn't forget it but had no time to reallly think about >>> it. I'm still not sure how to use it best. >> OK. :) >> IIRC, the patch I provided makes the parameter optional and non-default >> anyway. So without the -l parameter it'll do the exact same thing as it >> does now (i.e. bundle all the drivers for the current kernel). > > Hm. Yes I should think about it again. > Can I still use the old patch you've provided? It applied cleanly last time I tried it, but I'll double check. >>> With the python packages. The .pyo/pyc is a point. I'll think about >>> adding something like <package> *.py to the listfiles listing python >>> rpms. That means only files with .py in the end will be included. And no >>> I don't think it works negative. ;-) >> What about not including them in the rpm in the first place? And/or >> perhaps only generating them in the RPM post-install? > > I don't want to risk take even more time then it does now. As it is taking > some time to build an initrd. There may be ways to speed it up, I'll have a think about it. The most time seems to be spent with context switching between process invocations. If we can find a way to combine the extraction of file list from all RPMs in question in one go (specify multiple packages as RPM parameters - not sure if this'll work), then I suspect the time would be massively cut. This seems to be what takes up most time. The only drawback is that filters wouldn't be directly applicable per rpm, but globally. I'm not sure if this is actually a problem, and if it is, how big. 
The 2nd longest part is the compression of the initrd. Using a version of gzip built specifically optimized for the target platform using ICC yields a speed-up of around 20+%. > BTW. we can spare 20/150 MB if we remove the *.pyc/*.pyo. That's not too bad. Similar amount if not more by pruning all the unused kernel modules. :) RH kernels come compiled with everything and the kitchen sink. :( Gordan |
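On the idea of extracting the file lists of all RPMs in one go: rpm does accept several package names in a single query, so the per-package fork/exec overhead can in principle be collapsed into one call. A rough sketch with placeholder package names and paths follows; this is not the actual mkinitrd code.

    # One rpm invocation for several packages instead of one per package.
    rpm -q --list comoonics-bootimage comoonics-cdsl-py mdadm > /tmp/initrd-filelist

    # Filters would then apply globally to the combined list, e.g. dropping
    # the compiled python files discussed above:
    grep -v '\.py[co]$' /tmp/initrd-filelist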
From: Marc G. <gr...@at...> - 2009-02-08 13:45:16
On Sunday 08 February 2009 14:29:42 Gordan Bobic wrote: > Marc Grimme wrote: > > On Thursday 05 February 2009 23:11:24 Gordan Bobic wrote: > >> It would appear that OSR doesn't start up MD software RAID arrays. ... > Well, I just got shared root working on GlusterFS, which is effectively > a fuse stackable replication file system. It requires a standard > underlying file system with xattr support (e.g. ext3). This in turn has > to be backed by a block device, and in my case, that block device is an > MD software RAID. Ok I thought about something like this. Cause I think it's not usable for clusterfilesystems like GFS. > ... > I'm not talking about detecting the presence of mdadm devices. I'm > talking about detecting the presence of /sbin/mdadm in the init-root. In > other words, if you want MD support, just add rpm and file mdadm.list > files and replace the line I added "mdadm --assemble --scan" with > something more like: > > if [ -x /sbin/mdadm ]; then > mdadm --assemble --scan > fi > > That's all I was talking about, really. This will check for MD markers > on all the available disks and assemble any it can. I think that should do. Especially if we normally not import it but only if the rpm comoonics-bootimage-extras-md is installed. Ok forget about the rest for the first. ... > > I'm not sure this is worthwhile, though. It would be much more > straightforward to just add the if block mentioned above and have > something like and extras-mdadm package that just adds the > /etc/comoonics/bootimage/files.initrd.d/mdadm.list > /etc/comoonics/bootimage/rpms.initrd.d/mdadm.list > files (dependency on mdadm rpm). That means the total bloat of 3 lines in > /opt/atix/comoonics-bootimage/boot-scripts/etc/hardware-lib.sh > if it isn't actually required for the specific build. > ... > > No, not yet. I didn't forget it but had no time to reallly think about > > it. I'm still not sure how to use it best. > > OK. :) > IIRC, the patch I provided makes the parameter optional and non-default > anyway. So without the -l parameter it'll do the exact same thing as it > does now (i.e. bundle all the drivers for the current kernel). Hm. Yes I should think about it again. Can I still use the old patch you've provided? > > > With the python packages. The .pyo/pyc is a point. I'll think about > > adding something like <package> *.py to the listfiles listing python > > rpms. That means only files with .py in the end will be included. And no > > I don't think it works negative. ;-) > > What about not including them in the rpm in the first place? And/or > perhaps only generating them in the RPM post-install? I don't want to risk take even more time then it does now. As it is taking some time to build an initrd. BTW. we can spare 20/150 MB if we remove the *.pyc/*.pyo. That's not too bad. > > Gordan Marc. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Gordan B. <go...@bo...> - 2009-02-08 13:29:51
Marc Grimme wrote: > On Thursday 05 February 2009 23:11:24 Gordan Bobic wrote: >> It would appear that OSR doesn't start up MD software RAID arrays. > yes. >> Well, now it does. Patches attached. :) > ;-) > How and were are you using md? Well, I just got shared root working on GlusterFS, which is effectively a fuse stackable replication file system. It requires a standard underlying file system with xattr support (e.g. ext3). This in turn has to be backed by a block device, and in my case, that block device is an MD software RAID. >> It fires up any MD arrays that can be assembled right after the DM >> arrays. I'm not sure if this should be made optional (only do it if >> mdadm is present, which will only be the case of the mdadm.list files >> are added to the initrd. Any thoughts on this? > > Good question. I'm not sure. I would love to have it only if you want it. But > how can an md-device be detected? I'm not talking about detecting the presence of mdadm devices. I'm talking about detecting the presence of /sbin/mdadm in the init-root. In other words, if you want MD support, just add rpm and file mdadm.list files and replace the line I added "mdadm --assemble --scan" with something more like: if [ -x /sbin/mdadm ]; then mdadm --assemble --scan fi That's all I was talking about, really. This will check for MD markers on all the available disks and assemble any it can. If you want the actual inclusion into the initrd requirement to be auto-detected, then I guess it could be done by checking the contents of /proc/mdstat. If it looks like this: # cat /proc/mdstat Personalities : unused devices: <none> then there aren't any (activated, anyway). If it looks more like this: # cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] [raid1] md0 : active raid1 hda1[0] hdg1[1] 72192 blocks [2/2] [UU] md1 : active raid5 hde2[3] hdk2[4] hdi2[5] hdg2[1] hdc2[2] hda2[0] 601473280 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU] unused devices: <none> then I there are activated devices. I'm not sure this is worthwhile, though. It would be much more straightforward to just add the if block mentioned above and have something like and extras-mdadm package that just adds the /etc/comoonics/bootimage/files.initrd.d/mdadm.list /etc/comoonics/bootimage/rpms.initrd.d/mdadm.list files (dependency on mdadm rpm). That means the total bloat of 3 lines in /opt/atix/comoonics-bootimage/boot-scripts/etc/hardware-lib.sh if it isn't actually required for the specific build. >> On an unrelated note, mkinitrd seems to put all the .pyc and .pyo files >> into the initrd. I only found this out because I wiped them while >> troubleshooting the cdslDefaults_element issue, since it complains those >> files are not there. Are these really necessary? I would have thought >> those non-essential, and anything making the initrd smaller is probably >> a good thing... Speaking of which, did my mkinitrd -l parameter optional >> "diet" kernel module patch ever make it into the trunk? > > No, not yet. I didn't forget it but had no time to reallly think about it. I'm > still not sure how to use it best. OK. :) IIRC, the patch I provided makes the parameter optional and non-default anyway. So without the -l parameter it'll do the exact same thing as it does now (i.e. bundle all the drivers for the current kernel). > With the python packages. The .pyo/pyc is a point. I'll think about adding > something like <package> *.py to the listfiles listing python rpms. That > means only files with .py in the end will be included. 
And no I don't think > it works negative. ;-) What about not including them in the rpm in the first place? And/or perhaps only generating them in the RPM post-install? Gordan |
From: Marc G. <gr...@at...> - 2009-02-08 13:02:15
Hi Gordan, On Thursday 05 February 2009 23:11:24 Gordan Bobic wrote: > It would appear that OSR doesn't start up MD software RAID arrays. yes. > > Well, now it does. Patches attached. :) ;-) How and were are you using md? > > It fires up any MD arrays that can be assembled right after the DM > arrays. I'm not sure if this should be made optional (only do it if > mdadm is present, which will only be the case of the mdadm.list files > are added to the initrd. Any thoughts on this? Good question. I'm not sure. I would love to have it only if you want it. But how can an md-device be detected? > > On an unrelated note, mkinitrd seems to put all the .pyc and .pyo files > into the initrd. I only found this out because I wiped them while > troubleshooting the cdslDefaults_element issue, since it complains those > files are not there. Are these really necessary? I would have thought > those non-essential, and anything making the initrd smaller is probably > a good thing... Speaking of which, did my mkinitrd -l parameter optional > "diet" kernel module patch ever make it into the trunk? No, not yet. I didn't forget it but had no time to reallly think about it. I'm still not sure how to use it best. With the python packages. The .pyo/pyc is a point. I'll think about adding something like <package> *.py to the listfiles listing python rpms. That means only files with .py in the end will be included. And no I don't think it works negative. ;-) Regards Marc. > > Thanks. > > Gordan -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Gordan B. <go...@bo...> - 2009-02-05 22:26:51
It would appear that OSR doesn't start up MD software RAID arrays. Well, now it does. Patches attached. :) It fires up any MD arrays that can be assembled right after the DM arrays. I'm not sure if this should be made optional (only do it if mdadm is present, which will only be the case of the mdadm.list files are added to the initrd. Any thoughts on this? On an unrelated note, mkinitrd seems to put all the .pyc and .pyo files into the initrd. I only found this out because I wiped them while troubleshooting the cdslDefaults_element issue, since it complains those files are not there. Are these really necessary? I would have thought those non-essential, and anything making the initrd smaller is probably a good thing... Speaking of which, did my mkinitrd -l parameter optional "diet" kernel module patch ever make it into the trunk? Thanks. Gordan |
From: Marc G. <gr...@at...> - 2009-02-05 06:38:31
On Thursday 05 February 2009 00:10:08 Gordan Bobic wrote: > I seem to remember running into this problem before, but I can't > remember what the fix was. Can anyone jog my memory? > > # com-mkcdslinfrastructure -r /mnt/newroot -i > Traceback (most recent call last): > File "/usr/bin/com-mkcdslinfrastructure", line 17, in ? > from comoonics.cdsl.ComCdslRepository import * > File "/usr/lib/python2.4/site-packages/comoonics/cdsl/__init__.py", > line 28, in ? > defaults_path = os.path.join(cdsls_path,cdslDefaults_element) > NameError: name 'cdslDefaults_element' is not defined Just replace the cdslDefaults_element in this line with cdsl_element and there you go. Sorry but this is fixed in the next versions. Marc. > > Thanks. > > Gordan > > --------------------------------------------------------------------------- >--- Create and Deploy Rich Internet Apps outside the browser with > Adobe(R)AIR(TM) software. With Adobe AIR, Ajax developers can use existing > skills and code to build responsive, highly engaging applications that > combine the power of local resources and data with the reach of the web. > Download the Adobe AIR SDK and Ajax docs to start building applications > today-http://p.sf.net/sfu/adobe-com > _______________________________________________ > Open-sharedroot-devel mailing list > Ope...@li... > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-devel -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
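The workaround described above amounts to changing one name on the line shown in the traceback; roughly:

    # /usr/lib/python2.4/site-packages/comoonics/cdsl/__init__.py, around line 28
    # Original (raises NameError because cdslDefaults_element is never defined):
    #   defaults_path = os.path.join(cdsls_path, cdslDefaults_element)
    # Workaround suggested above (fixed properly in later releases):
    defaults_path = os.path.join(cdsls_path, cdsl_element)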
From: Gordan B. <go...@bo...> - 2009-02-04 23:10:17
I seem to remember running into this problem before, but I can't remember what the fix was. Can anyone jog my memory? # com-mkcdslinfrastructure -r /mnt/newroot -i Traceback (most recent call last): File "/usr/bin/com-mkcdslinfrastructure", line 17, in ? from comoonics.cdsl.ComCdslRepository import * File "/usr/lib/python2.4/site-packages/comoonics/cdsl/__init__.py", line 28, in ? defaults_path = os.path.join(cdsls_path,cdslDefaults_element) NameError: name 'cdslDefaults_element' is not defined Thanks. Gordan |
From: Dan M. <dan...@or...> - 2009-02-03 14:48:28
Hello -- I am a Xen developer and I would like to experiment with open-sharedroot on rhel5+ocfs2 on Xen but I am having some network problems and have not been able to download the 3.6G DVD iso. Is there any way I can download just the necessary .rpm files? (i386 only is OK for now) Also, I have ocfs2 v1.4.1-1, not 1.3.9-0.1. Will the specific versions of the rpms listed in the how-to work also with 1.4.1 or will I need newer versions of the coomonics rpms? And is the rpm listed in the document (and below) as "el6" a typo? Thanks, Dan List of rpms from the how-to: # comoonics-pythonosfix-py-0.1-1 # comoonics-bootimage-listfiles-1.3-6.el5 # SysVinit-comoonics-2.86-14.atix.1 # comoonics-cluster-py-0.1-12 # comoonics-cdsl-py-0.2-11 # comoonics-bootimage-1.3-32 # comoonics-release-0.1-1 # comoonics-cs-py-0.1-54 # comoonics-bootimage-initscripts-1.3-5.el6 # comoonics-bootimage-extras-ocfs2-0.1-1 # comoonics-bootimage-extras-xen-0.1-3 (Only needed for xen Guest) |
From: Gordan B. <go...@bo...> - 2009-01-29 17:42:13
On Thu, 29 Jan 2009 18:03:17 +0100, Marc Grimme <gr...@at...> wrote: >> > Note: The first triggering of udev is just to cause the modules to be >> > loaded once to save all MAC addresses. Then they are unloaded again. >> > >> > But in order to detect the nics I can also change the process like as >> > follows: >> > 1. load drivers as specified in modprobe.conf >> > 2. Save the drivers >> > 3. Trigger udev >> > 4. Save macs >> > 5. Unload all newly loaded drivers >> > But I think this wouldn't be too universal. As It does not really work >> > stable for currupt modprobe.confs. >> >> Is "it must work with broken/corrupt modprobe.conf" really a reasonable >> requirement? > > I've seen it not only once. Case 1: mixed hardware. Case 2: Cloned clusters > forgot to change the modprobe.conf. Of course - I have done it more than once myself. :^) Anyway, as I already said, I'm now fully convinced that your original idea (@driver) is the best solution. Sorry I doubted it. :) > I don't know if this is reasonable. I would say it's a positive side effect > of better supporting mixed hardware and that's the real reason. > > There are customers who are using our initrd to being able to boot guest on > real physical hardware and vice versa if need be. Then that @driver concept > is quite a nice thing to have. Agreed. >> > The other way would do the same and >> > additionally would also work with corrupt modprobe.confs and @driver in >> > the cluster.conf. >> >> What happens when @driver and modprobe.conf contradict each other? Which >> takes precedence? > > @driver. If you leave it out it won't be used. So everything works as > before but you can add such a thing. OK. What about making @driver mandatory? It would mean there is less scope for a mistake? Perhaps for now backward compatibility is important for those who blindly "yum update" and expect everything to still work, but I think mkinitrd should at least throw a warning if @driver isn't present, saying that non-use of it is deprecated? >> > I still think this is a general way which is quite stable and most >> > universal. >> >> I agree that it should be stable, but I'm still in two minds about >> supporting the case of corrupt modprobe.conf. I can see that different >> cluster nodes could have different hardware and thus have different >> modprobe.conf-s, which means that there are only two options: >> >> 1) Specify drivers in cluster.conf and ignore modprobe.conf completely. >> Note that this also applies to storage drivers - if the nodes are >> different >> they could also have different disk controllers (very relevant for >> shared-SCSI-bus, DRBD and GlusterFS solutions), which would also cause >> similar problems. >> >> 2) Load each node's modprobe.conf (hopefully /cdsl.local is mounted off >> the shared file system and not a local partition on each disk - not a >> requirement (not enforced at least) at the moment!) into the initrd and >> analyze at runtime based on which node we are running on. The >> scsi_controller drivers would have to be loaded after the NICs are set up >> since until we get MACs we don't know which node we're running on. >> >> I can see advantages to both approaches. 1) is more elegant from the >> implementation point of view since we only have to configure storage and >> NICs for all nodes in one file. 2) is more elegant because there are no >> redundant configuration entries between cluster.conf and modprobe.conf. 
>> Having said that, we need at least some of the NIC setup in cluster.conf, >> so that already makes the configuration redundancy necessary anyway. >> >> OK, I think I'm convinced - maybe it would be better to ignore >> modprobe.conf alltogether. >> >> Does that mean something similar is required for disk controller drivers >> in cluster.conf? :) > > I knew you would come over with something like this and the answer is yes > but not know. Sorry. :-( I didn't mean to be difficult, just trying to cover an extra base that seemed like a logical extension. > I want to implement and see the @driver scenario then we can easily add the > same thing for storage. But there you normally don't have that order > problem. > But still I think it's a good idea to later have it there too. Great, thanks for clearing it up. Please post when the updated package is in preview and I'll test it on the cluster that I found to be affected by the issue. :) Gordan |
From: Marc G. <gr...@at...> - 2009-01-29 17:03:33
On Thursday 29 January 2009 17:30:49 Gordan Bobic wrote: > On Thu, 29 Jan 2009 16:39:30 +0100, Marc Grimme <gr...@at...> wrote: > > [...] > > > Note: The first triggering of udev is just to cause the modules to be > > loaded once to save all MAC addresses. Then they are unloaded again. > > > > But in order to detect the nics I can also change the process like as > > follows: > > 1. load drivers as specified in modprobe.conf > > 2. Save the drivers > > 3. Trigger udev > > 4. Save macs > > 5. Unload all newly loaded drivers > > But I think this wouldn't be too universal. As It does not really work > > stable for currupt modprobe.confs. > > Is "it must work with broken/corrupt modprobe.conf" really a reasonable > requirement? I've seen it not only once. Case 1: mixed hardware. Case 2: Cloned clusters forgot to change the modprobe.conf. I don't know if this is reasonable. I would say it's a positive side effect of better supporting mixed hardware and that's the real reason. There are customers who are using our initrd to being able to boot guest on real physical hardware and vice versa if need be. Then that @driver concept is quite a nice thing to have. > > > The other way would do the same and > > additionally would also work with corrupt modprobe.confs and @driver in > > the > > > cluster.conf. > > What happens when @driver and modprobe.conf contradict each other? Which > takes precedence? @driver. If you leave it out it won't be used. So everything works as before but you can add such a thing. > > > I still think this is a general way which is quite stable and most > > universal. > > I agree that it should be stable, but I'm still in two minds about > supporting the case of corrupt modprobe.conf. I can see that different > cluster nodes could have different hardware and thus have different > modprobe.conf-s, which means that there are only two options: > > 1) Specify drivers in cluster.conf and ignore modprobe.conf completely. > Note that this also applies to storage drivers - if the nodes are different > they could also have different disk controllers (very relevant for > shared-SCSI-bus, DRBD and GlusterFS solutions), which would also cause > similar problems. > > 2) Load each node's modprobe.conf (hopefully /cdsl.local is mounted off the > shared file system and not a local partition on each disk - not a > requirement (not enforced at least) at the moment!) into the initrd and > analyze at runtime based on which node we are running on. The > scsi_controller drivers would have to be loaded after the NICs are set up > since until we get MACs we don't know which node we're running on. > > I can see advantages to both approaches. 1) is more elegant from the > implementation point of view since we only have to configure storage and > NICs for all nodes in one file. 2) is more elegant because there are no > redundant configuration entries between cluster.conf and modprobe.conf. > Having said that, we need at least some of the NIC setup in cluster.conf, > so that already makes the configuration redundancy necessary anyway. > > OK, I think I'm convinced - maybe it would be better to ignore > modprobe.conf alltogether. > > Does that mean something similar is required for disk controller drivers in > cluster.conf? :) I knew you would come over with something like this and the answer is yes but not know. I want to implement and see the @driver scenario then we can easily add the same thing for storage. But there you normally don't have that order problem. 
But still I think it's a good idea to later have it there too. Marc. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Gordan B. <go...@bo...> - 2009-01-29 16:30:58
On Thu, 29 Jan 2009 16:39:30 +0100, Marc Grimme <gr...@at...> wrote: [...] > Note: The first triggering of udev is just to cause the modules to be > loaded once to save all MAC addresses. Then they are unloaded again. > > But in order to detect the nics I can also change the process like as > follows: > 1. load drivers as specified in modprobe.conf > 2. Save the drivers > 3. Trigger udev > 4. Save macs > 5. Unload all newly loaded drivers > But I think this wouldn't be too universal. As It does not really work > stable for currupt modprobe.confs. Is "it must work with broken/corrupt modprobe.conf" really a reasonable requirement? > The other way would do the same and > additionally would also work with corrupt modprobe.confs and @driver in the > cluster.conf. What happens when @driver and modprobe.conf contradict each other? Which takes precedence? > I still think this is a general way which is quite stable and most > universal. I agree that it should be stable, but I'm still in two minds about supporting the case of corrupt modprobe.conf. I can see that different cluster nodes could have different hardware and thus have different modprobe.conf-s, which means that there are only two options: 1) Specify drivers in cluster.conf and ignore modprobe.conf completely. Note that this also applies to storage drivers - if the nodes are different they could also have different disk controllers (very relevant for shared-SCSI-bus, DRBD and GlusterFS solutions), which would also cause similar problems. 2) Load each node's modprobe.conf (hopefully /cdsl.local is mounted off the shared file system and not a local partition on each disk - not a requirement (not enforced at least) at the moment!) into the initrd and analyze at runtime based on which node we are running on. The scsi_controller drivers would have to be loaded after the NICs are set up since until we get MACs we don't know which node we're running on. I can see advantages to both approaches. 1) is more elegant from the implementation point of view since we only have to configure storage and NICs for all nodes in one file. 2) is more elegant because there are no redundant configuration entries between cluster.conf and modprobe.conf. Having said that, we need at least some of the NIC setup in cluster.conf, so that already makes the configuration redundancy necessary anyway. OK, I think I'm convinced - maybe it would be better to ignore modprobe.conf alltogether. Does that mean something similar is required for disk controller drivers in cluster.conf? :) Gordan |
From: Marc G. <gr...@at...> - 2009-01-29 15:39:44
Hi Gordan, On Thursday 29 January 2009 12:21:52 Gordan Bobic wrote: > Hi, > > Replying here because I thought it was too discussiony for a bugzilla > comment. I see ;-) > > # A new attribute driver per nic per clusternode was introduced. > # <eth name=".." mac=".." driver=".."/> > > I can see that this is useful for heterogenous clusters, but if "driver" > isn't specified, the NIC driver "probing" shouldn't really occur at all. It > should be done according to the content of modprobe.conf. I think this > should be deemed authoritative unless specifically overriden by the driver > parameter in the NIC spec. Yes an no. See below. > > Also, what happens if we have an alternating NIC driver setup, e.g. > > eth0 e1000 > eth1 e100 > eth2 e1000 > > Will this work correctly, or will loading the e1000 driver wrongly make the > two e1000 NICs eth0 and eth1? Yes I think it will. See below. > > If udev configuration is dynamically generated from cluster.conf by MAC > address using a line like: > KERNEL=="eth*", SYSFS{address}=="00:11:22:33:44:55", NAME="eth0" > that should probably suffice. > Unfortunately, AFAIK this is not redundant with modprobe.conf stuff (need > the driver loaded before we can read the MAC). Still, I feel there is a > strong argument for making modprobe.conf the default. > > Or, as a potentially easier-to-implement alternative, maybe it would be > better to make the driver parameter mandatory (assuming it isn't at the > moment) and abort mkinitrd if it isn't provided. What we do is start udevd then implicitly let it autoload the mods. Save the MACs (note the first udevtrigger is only used to detect the MAC). Then the modules are unloaded again. And now they are (newly) loaded in the following order. 1. If a driver is defined in cluster.conf (eth@driver) it has precedence 2. Load the eth* which are defined in modprobe.conf 3. Trigger udev This should still be stable. Note: The first triggering of udev is just to cause the modules to be loaded once to save all MAC addresses. Then they are unloaded again. But in order to detect the nics I can also change the process like as follows: 1. load drivers as specified in modprobe.conf 2. Save the drivers 3. Trigger udev 4. Save macs 5. Unload all newly loaded drivers But I think this wouldn't be too universal. As It does not really work stable for currupt modprobe.confs. The other way would do the same and additionally would also work with corrupt modprobe.confs and @driver in the cluster.conf. I still think this is a general way which is quite stable and most universal. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
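A rough sketch of the load order Marc describes, in shell pseudologic rather than the actual linuxrc/hardware-lib code; the variable holding the cluster.conf drivers is an assumed placeholder.

    # 1. Drivers named by eth@driver in cluster.conf take precedence.
    for drv in $CLUSTERCONF_NIC_DRIVERS; do
        modprobe "$drv"
    done
    # 2. Then the eth* aliases from /etc/modprobe.conf.
    for drv in $(awk '$1 == "alias" && $2 ~ /^eth/ {print $3}' /etc/modprobe.conf); do
        modprobe "$drv"
    done
    # 3. Finally trigger udev so anything still unhandled gets probed.
    udevtrigger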
From: Gordan B. <go...@bo...> - 2009-01-29 11:22:00
Hi, Replying here because I thought it was too discussiony for a bugzilla comment. # A new attribute driver per nic per clusternode was introduced. # <eth name=".." mac=".." driver=".."/> I can see that this is useful for heterogenous clusters, but if "driver" isn't specified, the NIC driver "probing" shouldn't really occur at all. It should be done according to the content of modprobe.conf. I think this should be deemed authoritative unless specifically overriden by the driver parameter in the NIC spec. Also, what happens if we have an alternating NIC driver setup, e.g. eth0 e1000 eth1 e100 eth2 e1000 Will this work correctly, or will loading the e1000 driver wrongly make the two e1000 NICs eth0 and eth1? If udev configuration is dynamically generated from cluster.conf by MAC address using a line like: KERNEL=="eth*", SYSFS{address}=="00:11:22:33:44:55", NAME="eth0" that should probably suffice. Unfortunately, AFAIK this is not redundant with modprobe.conf stuff (need the driver loaded before we can read the MAC). Still, I feel there is a strong argument for making modprobe.conf the default. Or, as a potentially easier-to-implement alternative, maybe it would be better to make the driver parameter mandatory (assuming it isn't at the moment) and abort mkinitrd if it isn't provided. Gordan On Thu, 29 Jan 2009 11:21:07 +0100, Marc Grimme <gr...@at...> wrote: > Hi, > I opened a bug for this problem > https://bugzilla.atix.de/show_bug.cgi?id=325 > > I will describe/discuss my findings there. > > I think I have a solution already. > > Gordan, you might want to add you to this Bug. > > Regards Marc. > > BTW: I would not use the current comoonics-bootimage from preview for > clusters > with multiple nics (means ones with different drivers)! > Wait a day or two and I'll come up with a new version. > On Wednesday 21 January 2009 16:52:44 Gordan Bobic wrote: >> On Wed, 21 Jan 2009 13:19:45 +0100, Marc Grimme <gr...@at...> wrote: >> >> It would appear that >> >> /opt/atix/comoonics-bootimage/boot-scripts/etc/rhel5/hardware-lib.sh >> >> has >> >> gone through a few changes in the past few months, which, >> >> unfortunately, >> >> break it for me. >> >> >> >> The problem is in the ordering of the detected NICs. On one of my >> >> systems I have a dual e1000 built into the mobo, and an e100 as an >> >> add-in card. /etc/modprobe.conf lists eth0 and eth1 as the e1000s, and >> >> eth2 as e100. This works fine with hardware-lib.sh v1.5, but with v1.7 >> >> the ordering seems to be both unstable (about 1/10 of the time it'll >> >> actually get the NIC ordering as expected and specified in >> >> cluster.conf >> >> and the rest of the time it'll do something different) and >> >> inconsistent >> >> with what is in cluster.conf and modprobe.conf. >> > >> > That's strange. I have the same problems on one cluster like you >> > describe >> > it. >> > One time everything works and the other time it doesn't. But all other >> > clusters work. >> > >> > The reason why I changed the hw detection for rhel5 is because it >> > didn't >> > work >> > for VMs (especially kvm) and I didn't find any problems on all the >> > other >> > clusters (except for the one me and the one from you). >> > >> > I think I have to look deeper into that matter. >> >> I made a rudimentary attempt at rectifying it by explicitly sorting the >> module list, but that didn't fix it. The problem is that the eth* binding >> ends up being done in the order the drivers are loaded (i.e. 
if I load >> the >> e100 driver before the e1000 driver, e100 ends up being eth0). This seems >> to override and ignore any settings listed in modprobe.conf, and more >> disturbingly, it seems to ignore the by-MAC bindings in cluster.conf >> which >> should really have the highest precedence (but either way they should >> agree >> with modprobe.conf if everything is set up right). >> >> > So what you say is if you just change hardware-lib.sh from 1.7 to 1.5 >> > everything works fine? >> >> Yes. Note, however, that it could just be that the failure in 1.5 is >> always >> consistent with my hardware so it always comes up the right way around. >> 1.7, however, definitely doesn't come up right, and more importantly, it >> doesn't come up consistently. >> >> > Cause I thought it was due to the order (that's what I've changed) of >> >> udevd >> >> > and kudzu/modprobe eth* being called. Older versions first called kudzu >> > then probed for the nics and then started udevd. >> > >> > Now I'm first starting udevd then - if appropriate - kudzu and then >> > probe >> > for >> > the NICs. I always thought that it was because of the order. But if the >> >> new >> >> > order works with hardware-lib.sh (v1.5) but not for 1.7 it isn't >> > because >> >> of >> >> > the order. As the order is defined by linuxrc.generic.sh. >> > >> > Can you acknowledge that it's only the version of hardware-lib.sh? >> >> Yes, it's the only file I copied across from the older package. Note, >> however, the caveat above - it could just be that it makes things work on >> this one system where I observed it. In other words, just because 1.5 >> makes >> it work doesn't mean that the bug is in hardware-lib.sh. It could just be >> covering up a problem elsewhere. It could be some kind of a weird kudzu >> problem, too - I've found it to be unreliable and break things in the >> past, >> albeit not recently (having said that, it's the first thing I switch off >> on >> a new system, so maybe I just didn't notice before). >> >> >> The last version that works for me is v1.5, and the latest released >> >> version (I'm talking about CVS version numbers here) appears to be >> >> v1.7 >> >> for this file (in the comoonics-bootimage-1.3-40.noarch.rpm release). >> >> >> >> Needless to say, trying to boot off an iSCSI shared root with the NIC >> >> not starting because eth designation doesn't match the MAC doesn't get >> >> very far. :-/ >> > >> > Very needless. It's the same for non iscsi clusters ;-) . So this needs >> >> to >> >> > be fixed. >> >> Indeed. DRBD is even worse, as it has extra scope for split-brain, >> particularly if IP addresses are fail-over resources and they happen to >> live on an interface that does end up coming up correctly. >> >> > Thanks and sorry about that ugly bug. >> >> The fact that you observed it, too, is rather a relief, actually. It took >> me a fair while and a number of initrd rebuilds and a bit of digging to >> make sure that I was seeing what I _thought_ I was seeing, and not a >> weird >> side-effect of something I'd done to the configuration. Please, do post >> when you have a fix. :-) >> >> Gordan >> >> --------------------------------------------------------------------------- >>--- This SF.net email is sponsored by: >> SourcForge Community >> SourceForge wants to tell your story. >> http://p.sf.net/sfu/sf-spreadtheword >> _______________________________________________ >> Open-sharedroot-devel mailing list >> Ope...@li... >> https://lists.sourceforge.net/lists/listinfo/open-sharedroot-devel |
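The rule format quoted above is the RHEL5-era udev syntax; generating such by-MAC bindings from cluster.conf is mostly a matter of emitting one line per <eth/> entry. A minimal sketch follows, where the input file of "name mac" pairs and the output rules file are assumptions rather than the actual comoonics implementation.

    # Input: one "name mac" pair per line, taken from the <eth name=".." mac=".."/>
    # entries in cluster.conf. File locations are illustrative only.
    while read name mac; do
        printf 'KERNEL=="eth*", SYSFS{address}=="%s", NAME="%s"\n' "$mac" "$name"
    done < /tmp/cluster-nics > /etc/udev/rules.d/60-persistent-osr-net.rules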
From: Marc G. <gr...@at...> - 2009-01-29 10:21:24
Hi, I opened a bug for this problem https://bugzilla.atix.de/show_bug.cgi?id=325 I will describe/discuss my findings there. I think I have a solution already. Gordan, you might want to add you to this Bug. Regards Marc. BTW: I would not use the current comoonics-bootimage from preview for clusters with multiple nics (means ones with different drivers)! Wait a day or two and I'll come up with a new version. On Wednesday 21 January 2009 16:52:44 Gordan Bobic wrote: > On Wed, 21 Jan 2009 13:19:45 +0100, Marc Grimme <gr...@at...> wrote: > >> It would appear that > >> /opt/atix/comoonics-bootimage/boot-scripts/etc/rhel5/hardware-lib.sh has > >> gone through a few changes in the past few months, which, unfortunately, > >> break it for me. > >> > >> The problem is in the ordering of the detected NICs. On one of my > >> systems I have a dual e1000 built into the mobo, and an e100 as an > >> add-in card. /etc/modprobe.conf lists eth0 and eth1 as the e1000s, and > >> eth2 as e100. This works fine with hardware-lib.sh v1.5, but with v1.7 > >> the ordering seems to be both unstable (about 1/10 of the time it'll > >> actually get the NIC ordering as expected and specified in cluster.conf > >> and the rest of the time it'll do something different) and inconsistent > >> with what is in cluster.conf and modprobe.conf. > > > > That's strange. I have the same problems on one cluster like you describe > > it. > > One time everything works and the other time it doesn't. But all other > > clusters work. > > > > The reason why I changed the hw detection for rhel5 is because it didn't > > work > > for VMs (especially kvm) and I didn't find any problems on all the other > > clusters (except for the one me and the one from you). > > > > I think I have to look deeper into that matter. > > I made a rudimentary attempt at rectifying it by explicitly sorting the > module list, but that didn't fix it. The problem is that the eth* binding > ends up being done in the order the drivers are loaded (i.e. if I load the > e100 driver before the e1000 driver, e100 ends up being eth0). This seems > to override and ignore any settings listed in modprobe.conf, and more > disturbingly, it seems to ignore the by-MAC bindings in cluster.conf which > should really have the highest precedence (but either way they should agree > with modprobe.conf if everything is set up right). > > > So what you say is if you just change hardware-lib.sh from 1.7 to 1.5 > > everything works fine? > > Yes. Note, however, that it could just be that the failure in 1.5 is always > consistent with my hardware so it always comes up the right way around. > 1.7, however, definitely doesn't come up right, and more importantly, it > doesn't come up consistently. > > > Cause I thought it was due to the order (that's what I've changed) of > > udevd > > > and kudzu/modprobe eth* being called. Older versions first called kudzu > > then probed for the nics and then started udevd. > > > > Now I'm first starting udevd then - if appropriate - kudzu and then probe > > for > > the NICs. I always thought that it was because of the order. But if the > > new > > > order works with hardware-lib.sh (v1.5) but not for 1.7 it isn't because > > of > > > the order. As the order is defined by linuxrc.generic.sh. > > > > Can you acknowledge that it's only the version of hardware-lib.sh? > > Yes, it's the only file I copied across from the older package. Note, > however, the caveat above - it could just be that it makes things work on > this one system where I observed it. 
In other words, just because 1.5 makes > it work doesn't mean that the bug is in hardware-lib.sh. It could just be > covering up a problem elsewhere. It could be some kind of a weird kudzu > problem, too - I've found it to be unreliable and break things in the past, > albeit not recently (having said that, it's the first thing I switch off on > a new system, so maybe I just didn't notice before). > > >> The last version that works for me is v1.5, and the latest released > >> version (I'm talking about CVS version numbers here) appears to be v1.7 > >> for this file (in the comoonics-bootimage-1.3-40.noarch.rpm release). > >> > >> Needless to say, trying to boot off an iSCSI shared root with the NIC > >> not starting because eth designation doesn't match the MAC doesn't get > >> very far. :-/ > > > > Very needless. It's the same for non iscsi clusters ;-) . So this needs > > to > > > be fixed. > > Indeed. DRBD is even worse, as it has extra scope for split-brain, > particularly if IP addresses are fail-over resources and they happen to > live on an interface that does end up coming up correctly. > > > Thanks and sorry about that ugly bug. > > The fact that you observed it, too, is rather a relief, actually. It took > me a fair while and a number of initrd rebuilds and a bit of digging to > make sure that I was seeing what I _thought_ I was seeing, and not a weird > side-effect of something I'd done to the configuration. Please, do post > when you have a fix. :-) > > Gordan > > --------------------------------------------------------------------------- >--- This SF.net email is sponsored by: > SourcForge Community > SourceForge wants to tell your story. > http://p.sf.net/sfu/sf-spreadtheword > _______________________________________________ > Open-sharedroot-devel mailing list > Ope...@li... > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-devel -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Gordan B. <go...@bo...> - 2009-01-27 17:14:23
Marc, Thanks for the workaround. I didn't get bitten by this, but it was close - I was planning to update some of my clusters in the near future. I think I'll now wait until this is fixed. Gordan On Tue, 27 Jan 2009 17:34:06 +0100, Marc Grimme <gr...@at...> wrote: > Gordan, > we are aware of this bug already. If it is a bug we won't change anything > and if not we'll detect u3 and then remove the default mountopts > noatime,nodiratime. > > In the meantime you can overwrite the defaultmountopts. But yes, this is > serious for us. > > Regards Marc. > On Tuesday 27 January 2009 16:24:31 Gordan Bobic wrote: >> On Tue, 27 Jan 2009 15:48:42 +0100, denis <den...@gm...> wrote: >> > Bob Peterson wrote: >> >> | It turns out that GFS no longer accepts noatime or noquota? Removing >> >> | these mountoptions and I could again mount my GFS volume. Which one >> >> | is >> >> | now deprecated and why? >> >> >> >> This sounds like a bug. Can you open a bugzilla record for it? >> >> AFAIK, it was not our intent to remove those mount options. >> >> That's potentially quite a serious bug for those of us using GFS for the >> root fs, as it renders the cluster unbootable. Without being aware of it >> in >> advance it could lead to a whole world of pain. Thanks for reporting >> this, >> Denis. >> >> Gordan >> >> -- >> Linux-cluster mailing list >> Lin...@re... >> https://www.redhat.com/mailman/listinfo/linux-cluster |
From: Marc G. <gr...@at...> - 2009-01-27 16:34:22
Gordan, we are aware of this bug already. If it is a bug we won't change anything and if not we'll detect u3 and then remove the default mountopts noatime,nodiratime. In the meantime you can overwrite the defaultmountopts. But yes, this is serious for us. Regards Marc. On Tuesday 27 January 2009 16:24:31 Gordan Bobic wrote: > On Tue, 27 Jan 2009 15:48:42 +0100, denis <den...@gm...> wrote: > > Bob Peterson wrote: > >> | It turns out that GFS no longer accepts noatime or noquota? Removing > >> | these mountoptions and I could again mount my GFS volume. Which one > >> | is > >> | now deprecated and why? > >> > >> This sounds like a bug. Can you open a bugzilla record for it? > >> AFAIK, it was not our intent to remove those mount options. > > That's potentially quite a serious bug for those of us using GFS for the > root fs, as it renders the cluster unbootable. Without being aware of it in > advance it could lead to a whole world of pain. Thanks for reporting this, > Denis. > > Gordan > > -- > Linux-cluster mailing list > Lin...@re... > https://www.redhat.com/mailman/listinfo/linux-cluster -- Gruss / Regards, Marc Grimme Phone: +49-89 452 3538-14 http://www.atix.de/ http://www.open-sharedroot.org/ ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org ------------------------------------------------------------ *** Besuchen Sie uns auf dem ATIX IT Solution Day: Linux Cluster-Technolgien, am 05.02.2009 in Neuss b. Koeln/Duesseldorf! www.atix.de/event-archiv/atix-it-solution-day-linux-neuss *** ------------------------------------------------------------ Registergericht: Amtsgericht Muenchen, Registernummer: HRB 168930, USt.-Id.: DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.) | Vorsitzender des Aufsichtsrats: Dr. Martin Buss |
From: Gordan B. <go...@bo...> - 2009-01-21 15:53:01
On Wed, 21 Jan 2009 13:19:45 +0100, Marc Grimme <gr...@at...> wrote: >> It would appear that >> /opt/atix/comoonics-bootimage/boot-scripts/etc/rhel5/hardware-lib.sh has >> gone through a few changes in the past few months, which, unfortunately, >> break it for me. >> >> The problem is in the ordering of the detected NICs. On one of my >> systems I have a dual e1000 built into the mobo, and an e100 as an >> add-in card. /etc/modprobe.conf lists eth0 and eth1 as the e1000s, and >> eth2 as e100. This works fine with hardware-lib.sh v1.5, but with v1.7 >> the ordering seems to be both unstable (about 1/10 of the time it'll >> actually get the NIC ordering as expected and specified in cluster.conf >> and the rest of the time it'll do something different) and inconsistent >> with what is in cluster.conf and modprobe.conf. > That's strange. I have the same problems on one cluster like you describe > it. > One time everything works and the other time it doesn't. But all other > clusters work. > > The reason why I changed the hw detection for rhel5 is because it didn't > work > for VMs (especially kvm) and I didn't find any problems on all the other > clusters (except for the one me and the one from you). > > I think I have to look deeper into that matter. I made a rudimentary attempt at rectifying it by explicitly sorting the module list, but that didn't fix it. The problem is that the eth* binding ends up being done in the order the drivers are loaded (i.e. if I load the e100 driver before the e1000 driver, e100 ends up being eth0). This seems to override and ignore any settings listed in modprobe.conf, and more disturbingly, it seems to ignore the by-MAC bindings in cluster.conf which should really have the highest precedence (but either way they should agree with modprobe.conf if everything is set up right). > So what you say is if you just change hardware-lib.sh from 1.7 to 1.5 > everything works fine? Yes. Note, however, that it could just be that the failure in 1.5 is always consistent with my hardware so it always comes up the right way around. 1.7, however, definitely doesn't come up right, and more importantly, it doesn't come up consistently. > Cause I thought it was due to the order (that's what I've changed) of udevd > and kudzu/modprobe eth* being called. Older versions first called kudzu > then probed for the nics and then started udevd. > > Now I'm first starting udevd then - if appropriate - kudzu and then probe > for > the NICs. I always thought that it was because of the order. But if the new > > order works with hardware-lib.sh (v1.5) but not for 1.7 it isn't because of > > the order. As the order is defined by linuxrc.generic.sh. > > Can you acknowledge that it's only the version of hardware-lib.sh? Yes, it's the only file I copied across from the older package. Note, however, the caveat above - it could just be that it makes things work on this one system where I observed it. In other words, just because 1.5 makes it work doesn't mean that the bug is in hardware-lib.sh. It could just be covering up a problem elsewhere. It could be some kind of a weird kudzu problem, too - I've found it to be unreliable and break things in the past, albeit not recently (having said that, it's the first thing I switch off on a new system, so maybe I just didn't notice before). >> The last version that works for me is v1.5, and the latest released >> version (I'm talking about CVS version numbers here) appears to be v1.7 >> for this file (in the comoonics-bootimage-1.3-40.noarch.rpm release). 
>> >> Needless to say, trying to boot off an iSCSI shared root with the NIC >> not starting because eth designation doesn't match the MAC doesn't get >> very far. :-/ > > Very needless. It's the same for non iscsi clusters ;-) . So this needs to > be fixed. Indeed. DRBD is even worse, as it has extra scope for split-brain, particularly if IP addresses are fail-over resources and they happen to live on an interface that does end up coming up correctly. > Thanks and sorry about that ugly bug. The fact that you observed it, too, is rather a relief, actually. It took me a fair while and a number of initrd rebuilds and a bit of digging to make sure that I was seeing what I _thought_ I was seeing, and not a weird side-effect of something I'd done to the configuration. Please, do post when you have a fix. :-) Gordan |