#196 ssi-addnode-dynamic causes new nodes to kernel panic

v1.9.3
open
nobody
5
2010-07-24
2010-07-24
djdoomsday
No

Running Debian 5 with a 2.6.14 kernel.
Whenever we use ssi-addnode-dynamic, every node that boots kernel panics when init calls the DHCP client (udhcpc). The cluster works perfectly when nodes are added manually with ssi-addnode. This has been tested in VMware (e1000 emulation) and on several dissimilar physical machines.

Further debug info is forthcoming. Any feedback on what you'd require?

Discussion

  • djdoomsday
    2010-07-27

    We've tried a multitude of things, such as another DHCP client, a newly compiled version of udhcpc, the version of udhcpc within the latest busybox, etc., to no avail. Has anyone had any success with ssi-addnode-dynamic?

    On an unrelated note - Is this the right place to post? It has been difficult to find where this community hangs out.

     
  • John Hughes
    2010-07-27

    "Whenever we attempt to use the ssi-addnode-dynamic all nodes booting will kernel panic"

    What panic?

    "On an unrelated note - Is this the right place to post?"

    This is the right place to report bugs. If you have questions about using OpenSSI, check the openssi-users mailing list. For development, check the developers mailing list.

    For general info check the website http://www.openssi.org

     
  • djdoomsday
    2010-07-27

    Unfortunately I'm unable to get the exact error, as I don't know how to dump the panic to any sort of log. I have stitched together as many frames as I could from a captured video session of it here: http://i32.tinypic.com/2rw5h5y.jpg

     
  • John Hughes
    2010-07-27

    It's crashing, maybe at a call to kmem_cache_alloc.

    The call trace is
    sys_select -> do_select -> ssidev_poll_cli_alloc -> BANG!

    Looks dodgy.

     
  • John Hughes
    2010-07-27

    (by looks dodgy I mean it looks like we're trying to do some cluster stuff before the cluster is initialised).
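
The failure mode described above (allocating from a structure before the subsystem that creates it has been initialised) can be illustrated with a small user-space model. This is only a sketch: it is Python rather than kernel C, it is not OpenSSI code, and every class and function name in it is hypothetical apart from the `poll_cli_alloc` label borrowed from the call trace.

```python
"""User-space model of an init-order bug. Not OpenSSI kernel code."""


class Cache:
    """Hypothetical stand-in for a kernel object cache."""

    def __init__(self, name):
        self.name = name
        self.allocated = 0

    def alloc(self):
        self.allocated += 1
        return {"cache": self.name, "id": self.allocated}


class PollSubsystem:
    """Hypothetical model of a subsystem whose cache is created at init time."""

    def __init__(self):
        self.cache = None  # does not exist until init() has run

    def init(self):
        if self.cache is None:
            self.cache = Cache("poll_cli")

    def poll_cli_alloc(self):
        # Allocates without checking that init() has run, so an
        # uninitialised subsystem fails here with AttributeError.
        return self.cache.alloc()


def boot(init_early):
    """Simulate a boot in which a client polls before or after init."""
    subsys = PollSubsystem()
    if init_early:
        subsys.init()  # the ordering the patch title suggests: init early
    try:
        subsys.poll_cli_alloc()  # first poll issued by the DHCP client
        return "ok"
    except AttributeError:
        return "panic"
```

In this model `boot(False)` takes the failing path and `boot(True)` succeeds, which is the ordering change that a fix along the lines of "initialise the structures early" would aim for.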

     
  • Hmm, if that is the case then the init script is broken. We did a lot of tracing, and it happens right when udhcpc is called. You should be able to replicate the situation easily by making sure the node you're adding has no static IP entry and that udhcpc is installed and added to /etc/mkinitrd/exe. Those are the conditions required for the init script to call udhcpc.
    You can also see from the log that udhcpc does start successfully, but it must then fail somewhere internally and cause init to die. I'm wondering whether there is some dependency that we're missing for udhcpc?
    I also noticed that busybox is installed, but not used. Should we be using a busybox environment instead?
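
One of the reproduction conditions above, udhcpc being listed in /etc/mkinitrd/exe, could be checked with a short helper like the one below. This is a sketch under an assumption the thread does not confirm: that the file holds one executable path per line and that `#` starts a comment. The function name `exe_list_includes` is hypothetical.

```python
def exe_list_includes(exe_list_text, program="udhcpc"):
    """Return True if `program` appears as an entry in the list text.

    Assumes one path per line with optional '#' comments; this layout
    is an assumption, not something confirmed in the thread.
    """
    for raw in exe_list_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Compare only the final path component, e.g. /sbin/udhcpc -> udhcpc
        if line.rsplit("/", 1)[-1] == program:
            return True
    return False
```

To use it on a real node, read the file and pass its contents in, for example `exe_list_includes(open("/etc/mkinitrd/exe").read())`.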

     
  • Roger Tsang
    2010-07-28

    initialize ssidev_poll_* structures early

     
  • Roger Tsang
    2010-07-28

    Try attached patch for linux-2.6.11-ssi.

     
  • djdoomsday
    2010-07-28

    Hmm, I'm currently running 2.6.14. Are you suggesting I will need to build SSI against 2.6.11 with this patch?

     
  • John Hughes
    2010-07-28

    You'll need to port Roger's patch from 2.6.11 to 2.6.14, or wait for me to do it. I'll be able to get around to it this weekend. Sorry it will take so long, but the disk on my laptop blew up and I have to rebuild my build environment (lenny chroot).

     