From: Jon A. H. <jo...@gm...> - 2011-09-03 01:22:24
|
Hi Eray,

Sorry for the late answer, I have just come back from the U.S.A.

2011/8/31 Eray Ozkural <exa...@gm...>:
> Hi there,
>
> I got a real cluster (not virtual) working on debian squeeze but all of my
> efforts to get ubuntu 11.04 to work failed.

I see. I also failed to make it work, but I couldn't spend much time on it. This week I'll be away at a conference, but afterwards I will try to fix it.

> I couldn't even debug the
> braindead upstart scripts on ubuntu so I gave up.

Yes, I know how those scripts are... :-S

> It's not professional that
> way. There were still some weird issues that I had to track down on debian
> but I solved them all (I could because it's an orderly distro), I'll try and
> release them later, I just wanted to let you know that you shouldn't lose
> time with the latest ubuntu, I couldn't really run it properly despite a lot
> of hacking in vain, so be warned. Debian saves the day again :)

Yes, IMHO Debian works much better.

> Cheers,
>
> PS: On ubuntu I even had to change dracut a bit to make the boot work but
> upstart eventually hung on ubuntu (on one version it worked but after I
> gained confidence and made an upgrade it started freezing etc. bottom line:
> it's not stable and well tested), but on debian no such drastic changes were
> necessary, just fiddling with the scripts and the root image a bit. As a
> result, I've obtained the best cluster setup I've ever made so I'm quite
> happy with it. Thanks everyone and keep it cool!

Thanks!! I'll continue the development next week, and of course there is still much room for improvement :-). For example, customizations should be easier to do in the future than they are now, which will make it easier to fix issues or to develop customized modules. That is the big change that should be in KestrelHPC 2.2.

I've ported all the scripts/templates to the new system, but I have been too busy in the last 2 months to finish the functions which handle this new type of templates/scripts (I'm very sorry for this...).

Thanks to you, Eray, for your comments, and for using our small project!

Regards,

JonAn. |
From: Eray O. <exa...@gm...> - 2011-08-31 02:04:53
|
Hi there, I got a real cluster (not virtual) working on debian squeeze but all of my efforts to get ubuntu 11.04 to work failed. I couldn't even debug the braindead upstart scripts on ubuntu so I gave up. It's not professional that way. There were still some weird issues that I had to track down on debian but I solved them all (I could because it's an orderly distro), I'll try and release them later, I just wanted to let you know that you shouldn't lose time with the latest ubuntu, I couldn't really run it properly despite a lot of hacking in vain, so be warned. Debian saves the day again :) Cheers, PS: On ubuntu I even had to change dracut a bit to make the boot work but upstart eventually hung on ubuntu (on one version it worked but after I gained confidence and made an upgrade it started freezing etc. bottom line: it's not stable and well tested), but on debian no such drastic changes were necessary, just fiddling with the scripts and the root image a bit. As a result, I've obtained the best cluster setup I've ever made so I'm quite happy with it. Thanks everyone and keep it cool! -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy |
From: Eray O. <exa...@gm...> - 2011-08-20 12:59:07
|
I've made sure that I've fixed all the NFS-related bugs in dracut/kestrel; now I can actually mount both NFS v3 and v4. I just had to fix dracut a little! Another important bug was that dracut invoked the wrong mount command in the nfsroot module. After I finish my installation, I'll make the changes I've made to kestrel and dracut available. BTW, since we have an init script in kestrel, I think we should add "service idmapd start" to it, because there is an Ubuntu bug that prevents it from starting on the frontend. Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Eray O. <exa...@gm...> - 2011-08-19 17:17:47
|
Dear JonAn, On Fri, Aug 19, 2011 at 7:22 PM, Jon Ander Hernandez <jo...@gm...>wrote: > 2011/8/19 Eray Ozkural <exa...@gm...>: > > All right, my pure Ubuntu 11.04 installation adventure goes on, I've now > > managed to get a login on the slave node. > > Nice! :-) > > > I think it's time I start figuring > > out how to register nodes, because I can't login like this :) Since I've > > changed too many things I can't reconfigure, and I don't think > registering > > will work like this, I'm going to read that bit of the code and figure > out > > how it gets done, I'm going to have to give the register option manually > I > > suppose. There was a boot option "register=<name>" IIRC. I'm beginning to > > like KestrelHPC, this is actually a better approach than warewulf. > > Well the register system is pretty simple. The > register/connect/disconnect is handled by a python rpc with a plugable > system, and this way can be easily extended. So when nodes boot they > run /etc/init.d/kestrel_connect which is a python script which makes a > rpc call to the frontend. To distinguis between connect or register > events it simply reads /proc/cmdline a checks for the option > "register=<name>". > > So if you have physical access to the nodes you can manually run > /etc/init.d/kestrel_connect or modify it easily (because is really > simple). > I do have physical access. I saw that's how it happens, it just modifies the pxe boot options, and then as you say on the client it throws an rpc, but right now those rpc's freeze on my system. Have you seen such a thing? Stopping/starting the daemon on the front end didn't help. I suspect that concurrent activation of new nodes was the culprit, the register logic probably couldn't handle that, and I don't see any POST messages on /var/log/kestrel_rpc.log (IIRC, or whatever its log was) anymore. 
This could mean either the nodes stopped submitting requests, or that the server process doesn't work (although I think the latter because I once saw a timeout message on a node). I think I should try to boot with init=/bin/sh into the nodes and try to issue the kestrel_connect command manually. I suspect that the frontend daemon may be broken though. Anyway, this is a bug, the rpc system is too fragile. I need ipython, too :) > BTW, I have seen that KestrelHPC 2.0 is pretty broken on Ubuntu 11.04. > I'm really surprised to see that so much things broke down in this > release... :-S > Uh, just needs some testing and fixing, though of course it's notoriously difficult to test such software. Though I would personally give priority to Ubuntu, because it's the most popular system. It should "just work" on Ubuntu. We've used those terrible distros before (fedora, centos, mandriva etc.) and I swore never again to use them! Debian FTW :) The problem with most cluster toolkits I've tried was, they were error prone and not portable enough. It's important for such toolkits to have a lot of failsafe defaults and just work on a bunch of standard distros. Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Jon A. H. <jo...@gm...> - 2011-08-19 16:23:02
|
2011/8/19 Eray Ozkural <exa...@gm...>:
> All right, my pure Ubuntu 11.04 installation adventure goes on, I've now
> managed to get a login on the slave node.

Nice! :-)

> I think it's time I start figuring
> out how to register nodes, because I can't login like this :) Since I've
> changed too many things I can't reconfigure, and I don't think registering
> will work like this, I'm going to read that bit of the code and figure out
> how it gets done, I'm going to have to give the register option manually I
> suppose. There was a boot option "register=<name>" IIRC. I'm beginning to
> like KestrelHPC, this is actually a better approach than warewulf.

Well, the register system is pretty simple. Register/connect/disconnect is handled by a Python RPC with a pluggable system, so it can be easily extended. When nodes boot they run /etc/init.d/kestrel_connect, which is a Python script that makes an RPC call to the frontend. To distinguish between connect and register events it simply reads /proc/cmdline and checks for the option "register=<name>".

So if you have physical access to the nodes you can manually run /etc/init.d/kestrel_connect, or modify it easily (because it is really simple).

BTW, I have seen that KestrelHPC 2.0 is pretty broken on Ubuntu 11.04. I'm really surprised to see that so many things broke in this release... :-S

Regards,

JonAn. |
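The connect-versus-register decision described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual kestrel_connect code: the function names `parse_register_option`, `read_kernel_cmdline` and `choose_event` are hypothetical, and the RPC call to the frontend is deliberately left out because its interface is not shown in this thread.

```python
def parse_register_option(cmdline):
    """Return the name from a 'register=<name>' kernel option, or None.

    The kernel command line is a whitespace-separated list of tokens,
    so a plain split is enough for this check.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "register" and sep and value:
            return value
    return None


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


def choose_event(cmdline):
    """Decide which event a booting node would report (hypothetical helper).

    Returns ("register", name) when the option is present,
    otherwise ("connect", None).
    """
    name = parse_register_option(cmdline)
    if name:
        return ("register", name)
    return ("connect", None)
```

On a node, `choose_event(read_kernel_cmdline())` would give the event to report; keeping the parsing separate from the file read makes it easy to test with a plain string when the RPC side is misbehaving.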
From: Jon A. H. <jo...@gm...> - 2011-08-19 16:16:11
|
2011/8/19 Eray Ozkural <exa...@gm...>:
> Cool, it would be easier to send my patches, too. :)

Yes. Well, I've changed how we define the templates and the scripts in the new KestrelHPC. The code is currently unfinished and untested, but it will be much easier to understand and to extend than what we have now.

I mean, the idea of KestrelHPC is to configure the required services, install additional packages, etc. using a template/script system, but the problem is that since a single script does many things, it sometimes gets quite hard to see what it is doing. So the new system uses templates where each script/template edits only one file, and we add flags to the file name which help in understanding what that template/script does. |
From: Jon A. H. <jo...@gm...> - 2011-08-19 16:10:38
|
2011/8/19 Eray Ozkural <exa...@gm...>:
> hi there,
>
> i could register two nodes, and after that the connections to frontend node
> started failing. now i can't even start registered nodes. they get stuck
> after init, and i don't even hear the beep sound. i think the connection
> fails. how can i fix/debug that? i can't get a shell so i can't see what's
> going wrong on the nodes.

Well... if the connection fails, I suppose the nodes will eventually freeze, because any non-cached file/program will become inaccessible.

We have a cron job which runs every minute and checks whether we can start an ssh connection to the nodes. If that fails, the node is "disconnected", which means that it is simply removed from the list of connected nodes (which in fact is simply the /etc/hosts file).

One thing we can do is develop a script or a cron job which checks whether the connection has failed and restarts the node, but if the connection fails the node will probably freeze anyway. |
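As a rough sketch of the "disconnect" step described above (not KestrelHPC's actual cron job), the following removes unreachable nodes from hosts-file text. The names `prune_hosts` and `ssh_reachable` are hypothetical, the set of managed node names is assumed to be known to the caller, and the ssh probe is just one plausible way to test a connection.

```python
import subprocess


def ssh_reachable(host, timeout=5):
    """Return True if a non-interactive ssh login to host succeeds.

    Assumption: key-based login is set up; BatchMode avoids password prompts.
    """
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes",
             "-o", "ConnectTimeout=%d" % timeout, host, "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def prune_hosts(hosts_text, node_names, is_reachable):
    """Drop hosts-file lines that name a managed node which is unreachable.

    Lines for other hosts, comments and blank lines are kept untouched.
    """
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        hostnames = fields[1:]  # the first field is the IP address
        dead = [h for h in hostnames
                if h in node_names and not is_reachable(h)]
        if dead:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Passing the probe in as a function keeps the file-editing logic testable without a network; a real job would call `prune_hosts(text, nodes, ssh_reachable)` and write the result back atomically.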
From: Eray O. <exa...@gm...> - 2011-08-19 15:17:29
|
hi there, i could register two nodes, and after that the connections to frontend node started failing. now i can't even start registered nodes. they get stuck after init, and i don't even hear the beep sound. i think the connection fails. how can i fix/debug that? i can't get a shell so i can't see what's going wrong on the nodes. cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Eray O. <exa...@gm...> - 2011-08-19 12:14:06
|
All right, my pure Ubuntu 11.04 installation adventure goes on, I've now managed to get a login on the slave node. I think it's time I start figuring out how to register nodes, because I can't login like this :) Since I've changed too many things I can't reconfigure, and I don't think registering will work like this, I'm going to read that bit of the code and figure out how it gets done, I'm going to have to give the register option manually I suppose. There was a boot option "register=<name>" IIRC. I'm beginning to like KestrelHPC, this is actually a better approach than warewulf. Hopefully, in the end it will save me a lot of time on the new cluster! Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Eray O. <exa...@gm...> - 2011-08-19 11:48:54
|
Cool, it would be easier to send my patches, too. :) On Fri, Aug 19, 2011 at 7:15 AM, Jon Ander Hernandez <jo...@gm...>wrote: > I changed a lot the code from KestrelHPC 2.0 to 2.2, and I was trying > to merge the all the changes back to the trunk, but it has became a > nightmare so I have just decided to change to git. > > My current repo is on: https://github.com/jonanh/KestrelHPC > > Regards, > > JonAn. > -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Jon A. H. <jo...@gm...> - 2011-08-19 04:46:56
|
I changed the code a lot from KestrelHPC 2.0 to 2.2, and I was trying to merge all the changes back into the trunk, but it has become a nightmare, so I have just decided to switch to git. My current repo is at: https://github.com/jonanh/KestrelHPC Regards, JonAn. |
From: Eray O. <exa...@gm...> - 2011-08-18 22:23:03
|
On Thu, Aug 18, 2011 at 10:45 PM, Jon Ander Hernandez <jo...@gm...>wrote: > 2011/8/18 Eray Ozkural <exa...@gm...>: > > On Thu, Aug 18, 2011 at 9:31 PM, Jon Ander Hernandez <jo...@gm...> > > > i can switch root, and plymouth starts, but nouveau fails, it says it > can't > > find the required driver. i'm using an nvidia card on the other computer. > i > > wonder how i would debug this without headless :) it feels too much like > > windoze. this still needs a lot of work. i'd at least need a serial > console > > and network debugging capability, the first of which can be had in dracut > > but there doesn't seem to be much in the way of the latter. > > You can blacklist nouveau by adding "blacklist nouveau" to > /var/lib/kestrel/images/<image>/etc/modprobe.d/nouveau.conf > > Hmm, that doesn't play too well with plymouth, too, it gets stuck in a text mode screen and you can't quit it either. Too bad folks didn't think of a good way to control these tools when GUI isn't working. Doesn't it have any keyboard shortcuts? > And also you can disable plymouth and show all the output messages > removing kernel options "quiet splash". I normally edit the pxelinux > configuration to don't have to change parameteres each time I boot, > but also note that this file is autogenerated, so if you update > kestrel or run kestrel-reconfigure it will be overwritten : > /var/lib/kestrel/tftpboot/<image name> > Shouldn't "text nosplash" options be sufficient? They don't seem to work for me however, is there a way to give a plymouth option, there was plymouth:debug so maybe there is something else, too? I'm just trying to get a nice clean text login. Used to be easier with debian/FAI. I still haven't checked if ssh works all right, but network does work so it shouldn't be a problem. > > Now I've tried Debian Squeeze but NFS4 still gets freezed... I'm > looking if I can get someway to output more debug info (maybe the > kernel module... or something). 
> Oh, you should try my dracut patch I suppose, I'll make it available in short order. Best, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Jon A. H. <jo...@gm...> - 2011-08-18 19:45:30
|
2011/8/18 Eray Ozkural <exa...@gm...>:
> On Thu, Aug 18, 2011 at 9:31 PM, Jon Ander Hernandez <jo...@gm...>
> i can switch root, and plymouth starts, but nouveau fails, it says it can't
> find the required driver. i'm using an nvidia card on the other computer. i
> wonder how i would debug this without headless :) it feels too much like
> windoze. this still needs a lot of work. i'd at least need a serial console
> and network debugging capability, the first of which can be had in dracut
> but there doesn't seem to be much in the way of the latter.

You can blacklist nouveau by adding "blacklist nouveau" to /var/lib/kestrel/images/<image>/etc/modprobe.d/nouveau.conf

You can also disable plymouth and show all the output messages by removing the kernel options "quiet splash". I normally edit the pxelinux configuration so that I don't have to change the parameters each time I boot, but note that this file is autogenerated, so if you update kestrel or run kestrel-reconfigure it will be overwritten: /var/lib/kestrel/tftpboot/<image name>

Now I've tried Debian Squeeze, but NFS4 still freezes... I'm looking for some way to output more debug info (maybe the kernel module... or something). |
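Removing "quiet splash" amounts to filtering two tokens out of the kernel option string. The sketch below only illustrates that filter: `drop_kernel_options` is a hypothetical name, the sample line in the usage note is an assumed example rather than the contents of a real KestrelHPC pxelinux file, and, as noted above, the generated file is overwritten by kestrel-reconfigure.

```python
def drop_kernel_options(option_line, unwanted=("quiet", "splash")):
    """Return option_line without the listed bare kernel options.

    Only exact tokens are removed, so options such as 'root=...' that merely
    contain one of the words are left alone. Runs of whitespace are collapsed
    to single spaces.
    """
    tokens = option_line.split()
    return " ".join(token for token in tokens if token not in unwanted)
```

For example, `drop_kernel_options("APPEND root=/dev/nfs quiet splash ip=dhcp")` yields a line with the same options minus `quiet` and `splash`, so boot messages are shown instead of the plymouth splash screen.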
From: Eray O. <exa...@gm...> - 2011-08-18 19:12:23
|
On Thu, Aug 18, 2011 at 9:31 PM, Jon Ander Hernandez <jo...@gm...>wrote: > 2011/8/18 Eray Ozkural <exa...@gm...>: > > > I'm not above modifying Dracut I'm already hacking at it :P > > > > I seem to get stuck after I mount nfsv4 root (this time correctly). Any > > ideas? :) > > My system also get freezed when upstart starts but It maybe related to > virtualbox... I don't know... I remember that once I was able to get > to almost a working system using a real client node. > In the EEUU I only have my laptop and my girlfriend's laptop, but I > don't have a twisted rj45 cable so I can't test the boot on a real PC. > But maybe I will be able to buy a twisted cable when she returns from > class (she is starting a master on the EEUU). > > > Now I'm installing a fresh Ubuntu 11.04 and afterwards I will also > test Debian Squeeze. > i can switch root, and plymouth starts, but nouveau fails, it says it can't find the required driver. i'm using an nvidia card on the other computer. i wonder how i would debug this without headless :) it feels too much like windoze. this still needs a lot of work. i'd at least need a serial console and network debugging capability, the first of which can be had in dracut but there doesn't seem to be much in the way of the latter. i'm trying to debug the root fs with init=/bin/sh, i couldn't spot anything yet :/ everything looks dandy. i'll try installing binary nvidia drivers and see if that works. cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Jon A. H. <jo...@gm...> - 2011-08-18 18:31:28
|
2011/8/18 Eray Ozkural <exa...@gm...>:
> I'm not above modifying Dracut I'm already hacking at it :P
>
> I seem to get stuck after I mount nfsv4 root (this time correctly). Any
> ideas? :)

My system also freezes when upstart starts, but it may be related to VirtualBox... I don't know... I remember that once I was able to get to an almost working system using a real client node.

Here in the U.S. I only have my laptop and my girlfriend's laptop, and I don't have a crossover RJ45 cable, so I can't test the boot on a real PC. But maybe I will be able to buy one when she returns from class (she is starting a master's degree in the U.S.).

Now I'm installing a fresh Ubuntu 11.04, and afterwards I will also test Debian Squeeze. |
From: Eray O. <exa...@gm...> - 2011-08-18 18:08:44
|
On Thu, Aug 18, 2011 at 9:07 PM, Jon Ander Hernandez <jo...@gm...>wrote: > 2011/8/18 Eray Ozkural <exa...@gm...>: > > Its developer says that status isn't an error > > > http://ubuntuguide.net/howto-fix-ureadahead-problem-after-upgrading-to-ubuntu-10-04 > > > > So, it's likely hanging somewhere else.... Now I suppose I should try to > > debug init, and take a look at that nfs-cleanup script it might be > causing a > > problem (obviously). Is that what you recommended Jon? > > Well nfs-clean script simply stops nfs services (rpc.idmapd, statd, > etc...) since they should be started by init. It is planned that in > some future some processes like udev, rpc.idmapd, etc.. will not be > stopped and they will be pointed out somehow to the init service. But > by now I think it would be cleaner to simply start rpc.idmapd using an > script than modifying Dracut itself. > I'm not above modifying Dracut I'm already hacking at it :P I seem to get stuck after I mount nfsv4 root (this time correctly). Any ideas? :) Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Jon A. H. <jo...@gm...> - 2011-08-18 18:07:12
|
2011/8/18 Eray Ozkural <exa...@gm...>:
> Its developer says that status isn't an error
> http://ubuntuguide.net/howto-fix-ureadahead-problem-after-upgrading-to-ubuntu-10-04
>
> So, it's likely hanging somewhere else.... Now I suppose I should try to
> debug init, and take a look at that nfs-cleanup script it might be causing a
> problem (obviously). Is that what you recommended Jon?

Well, the nfs-clean script simply stops the NFS services (rpc.idmapd, statd, etc.) since they should be started by init. The plan is that in the future some processes like udev, rpc.idmapd, etc. will not be stopped and will instead be handed over somehow to the init service. But for now I think it would be cleaner to simply start rpc.idmapd from a script than to modify Dracut itself. |
From: Eray O. <exa...@gm...> - 2011-08-18 17:57:59
|
Its developer says that status isn't an error http://ubuntuguide.net/howto-fix-ureadahead-problem-after-upgrading-to-ubuntu-10-04 So, it's likely hanging somewhere else.... Now I suppose I should try to debug init, and take a look at that nfs-cleanup script it might be causing a problem (obviously). Is that what you recommended Jon? On Thu, Aug 18, 2011 at 8:51 PM, Eray Ozkural <exa...@gm...> wrote: > I can now mount NFSv4, but I get the following error after switching root: > > init: ureadahead-other process terminated with status 4 > > What on earth is that? :) > > > -- > Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara > http://groups.yahoo.com/group/ai-philosophy > http://myspace.com/arizanesil http://myspace.com/malfunct > > -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Eray O. <exa...@gm...> - 2011-08-18 17:51:36
|
I can now mount NFSv4, but I get the following error after switching root: init: ureadahead-other process terminated with status 4 What on earth is that? :) -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Eray O. <exa...@gm...> - 2011-08-18 17:39:44
|
On Thu, Aug 18, 2011 at 8:22 PM, Jon Ander Hernandez <jo...@gm...>wrote: > Well this way we can ensure rpc.idmapd is running before init is run. > And the good thing is that is a simple fix. > > Now I will try with a clean Ubuntu 11.04 and see if I can reproduce > your "nobody" issue. > That's not going to be hard because I used an almost clean Ubuntu 11.04. BTW, I almost solved the idmapd launch issue by copying all nss libraries into dracut image. The copy loop they have is probably only working on Fedora (in the network module). They didn't care about making it distro agnostic so much. So then I got stuck with the groups file being empty, I'll just copy over passwd and group to see what happens. But then if I get stuck help me :) Anyway, when I solve these, I ought to be able to patch everything up so that it works flawlessly with Ubuntu 11.04 for everyone :) Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Jon A. H. <jo...@gm...> - 2011-08-18 17:22:21
|
2011/8/18 Eray Ozkural <exa...@gm...>:
> On Thu, Aug 18, 2011 at 7:34 PM, Jon Ander Hernandez <jo...@gm...>
> wrote:
>>
>> Hello,
>>
>> 2011/8/18 Eray Ozkural <exa...@gm...>:
>> > Hi there,
>> > The core of all the NFSv4 mount problems is that rpc.idmapd does not work.
>> > You can check that by giving "rdbreak=pre-pivot" and drop into dracut debug
>> > shell.
>> > When you try to start rpc.idmapd manually it gives the error:
>> > rpc.idmapd: cannot find user "nobody"
>> > and dies. This debug message isn't very informative in itself.
>> > I've installed strace and other debug tools in a new dracut-013 build to see
>> > what happens. It turns out that rpc.idmapd dies when it tries to query the
>> > user because it goes to nsswitch and nss lib can't be loaded. I can't paste
>> > the strace because there isn't proper logging in dracut (we need network
>> > logging).
>>
>> Interesting... Something that also worries me is that I am not able to
>> manually start rpc.idmapd on Ubuntu. It only starts correctly if
>> started by upstart. Are you able to start it manually on the frontend?
>>
>
> I usually started it with upstart (start idmapd). But now that I think of
> it, yes, you can also start rpc.idmapd manually, try giving it the -vvv
> options. The -f option doesn't work by the way, it just goes into
> background.
> These daemons all have terrible messages and logging that's something to fix
> if I ever write an OS from scratch :)

Of course I tried that before:

root@jonan:~# pidof rpc.idmapd
root@jonan:~# so is not running^C
root@jonan:~# rpc.idmapd -vvv || echo "fails to start"
rpc.idmapd: libnfsidmap: using domain: localdomain
rpc.idmapd: libnfsidmap: loaded plugin /usr/lib/libnfsidmap/nsswitch.so for method nsswitch
fails to start

But in the end I discovered the issue: when upstart stops idmapd it umounts rpc_pipefs (/var/lib/nfs/rpc_pipefs/), but I wasn't aware of it because idmapd's upstart service doesn't umount it.

>>
>> But if I replace the init program with /bin/bash, I am able to start
>> rpc.idmapd which is really weird...
>>
>
> Why would that be?

Well... I'm really stupid... hehe. Since you weren't able to start rpc.idmapd on Dracut, I thought that it wasn't running on my system either. I didn't check with pidof whether it was already running. But in fact it works on my system. The problem is that /lib/dracut/hooks/pre-pivot/99nfs-clean.sh kills rpc.idmapd before switch_rooting into /sysroot.

>> I think that we can avoid any issue of idmapd starting it before running
>> init.
>> We create a /sbin/init.kestrel script, and add
>> option=/sbin/init.kestrel to the cmdline.
>>
>> /sbin/init.kestrel:
>> #!/bin/bash
>> rpc.idmapd || echo "idmapd failed to start :-S"
>> exec /sbin/init --verbose
>
> Hmm, can you explain why that would work?

Well, this way we can ensure rpc.idmapd is running before init is run. And the good thing is that it is a simple fix.

Now I will try with a clean Ubuntu 11.04 and see if I can reproduce your "nobody" issue. |
From: Jon A. H. <jo...@gm...> - 2011-08-18 16:58:36
|
2011/8/18 Eray Ozkural <exa...@gm...>: > Ah, ok, so if we ran dracut on Fedora it would be fine because that's what > Fedora 15 uses anyway. How did you see that, did you break with > rdbreak=pre-pivot? Yes. :-) > Weird :/ > Cheers, > > On Thu, Aug 18, 2011 at 7:37 PM, Jon Ander Hernandez <jo...@gm...> > wrote: >> >> By the way, on Fedora 15 rpc.idmapd works fine (I mean on the >> initramfs stage), so the libnss/nobody/* issues are something specific >> to Ubuntu. > > > > -- > Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara > http://groups.yahoo.com/group/ai-philosophy > http://myspace.com/arizanesil http://myspace.com/malfunct > > |
From: Eray O. <exa...@gm...> - 2011-08-18 16:49:31
|
Ah, ok, so if we ran dracut on Fedora it would be fine because that's what Fedora 15 uses anyway. How did you see that, did you break with rdbreak=pre-pivot? Weird :/ Cheers, On Thu, Aug 18, 2011 at 7:37 PM, Jon Ander Hernandez <jo...@gm...>wrote: > By the way, on Fedora 15 rpc.idmapd works fine (I mean on the > initramfs stage), so the libnss/nobody/* issues are something specific > to Ubuntu. > -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Eray O. <exa...@gm...> - 2011-08-18 16:47:39
|
On Thu, Aug 18, 2011 at 7:34 PM, Jon Ander Hernandez <jo...@gm...> wrote:

> Hello,
>
> 2011/8/18 Eray Ozkural <exa...@gm...>:
> > Hi there,
> > The core of all the NFSv4 mount problems is that rpc.idmapd does not work.
> > You can check that by giving "rdbreak=pre-pivot" and drop into dracut debug
> > shell.
> > When you try to start rpc.idmapd manually it gives the error:
> > rpc.idmapd: cannot find user "nobody"
> > and dies. This debug message isn't very informative in itself.
> > I've installed strace and other debug tools in a new dracut-013 build to see
> > what happens. It turns out that rpc.idmapd dies when it tries to query the
> > user because it goes to nsswitch and nss lib can't be loaded. I can't paste
> > the strace because there isn't proper logging in dracut (we need network
> > logging).
>
> Interesting... Something that also worries me is that I am not able to
> manually start rpc.idmapd on Ubuntu. It only starts correctly if
> started by upstart. Are you able to start it manually on the frontend?

I usually started it with upstart (start idmapd). But now that I think of it, yes, you can also start rpc.idmapd manually; try giving it the -vvv options. The -f option doesn't work by the way, it just goes into background. These daemons all have terrible messages and logging; that's something to fix if I ever write an OS from scratch :)

> But if a replace the init program with /bin/bash, I am able to start
> rpc.idmapd which is really weird...

Why would that be?

> > I've found a very relevant bug report on dracut wrt this issue. It's amusing
> > that the bug used to happen in an old version of dracut and now it pops up
> > again. I guess nobody really uses the net boot, what a shame :)
> > https://bugzilla.redhat.com/show_bug.cgi?id=537969
>
> That is an specific bug of Fedora. Probably Fedora restarts rpc.idmapd
> just after init starts and before starting anything else, and avoiding
> the lack of rpc.idmapd. Also note that the readonly root and the non
> writable /var/run (for registering the pid) is fixed on Fedora with
> the new /run standard which is mounted with a tmpfs.

AFAICT it is not specific to Fedora, it's a dracut bug. That bug happens in dracut rdinit, not Fedora init. Those fixes don't solve the issue. They thought they fixed it but I think they couldn't. I don't think running dracut on Fedora to create a ramdisk would change anything either, if that's what you mean (?). It's the exact error that I'm seeing, and the cause is the same too: nss libs aren't present. I'll now install nss libs in dracut to try to fix it.

> > There it says that adding the debug module to dracut solved his problem, but
> > it hadn't fixed mine, the newer debug module lacks the nss lib I think. I'm
> > going to try adding them to network module and see.
>
> I think that we can avoid any issue of idmapd starting it before running
> init.
> We create a /sbin/init.kestrel script, and add
> option=/sbin/init.kestrel to the cmdline.
>
> /sbin/init.kestrel:
> #!/bin/bash
> rpc.idmapd || echo "idmapd failed to start :-S"
> exec /sbin/init --verbose

Hmm, can you explain why that would work?

> > BTW, I also tried using initramfs-tools in ubuntu, but that wasn't helpful
> > either, that image only supports nfsv3 (I think) and nfsv3 mount dies with
> > the "incorrect mount option" error on unfortunately (I haven't figured out
> > what that error is caused by though).....
> > For debugging purposes, I strongly recommend that you build a dracut with
> > the debug and busybox modules, it's quite helpful that way. I also added a
> > module of my own with other debug tools. It would be neat to provide this
> > option in kestrelhpc by default.
>
> Yeah, that sound interesting. By the way the Dracut 0.10 used on
> KestrelHPC was packaged by me, I was planning to update it to 0.13 and
> upload it to Debian and Ubuntu, but it requires still somework like
> testing/fixing the issues arised from the tests.

All right, keep that in mind, I can add that later :) After I get this to work first....

Cheers,

-- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct |
From: Jon A. H. <jo...@gm...> - 2011-08-18 16:37:48
|
By the way, on Fedora 15 rpc.idmapd works fine (I mean on the initramfs stage), so the libnss/nobody/* issues are something specific to Ubuntu. |