Thread: Re: [SSI-devel] R1.9.3 Infiniband ICS patches
Brought to you by:
brucewalker,
rogertsang
From: Smith, S. <sta...@in...> - 2006-06-23 17:32:20
|
Aneesh Kumar wrote:
> On 6/21/06, Smith, Stan <sta...@in...> wrote:
>> Hi John,
>> Could you please review, or delegate a review, of these minor
>> patches to support ICS over Infiniband. These patches build an ICS
>> device infrastructure in support of 3 ICS device flavors:
>> Ethernet (default).
>> Infiniband IPoIB similar to ICS Ethernet in that they both use IPv4
>> transport protocols. Infiniband reliable connections with RDMA read
>> support.
>>
>> This patch does not contain IB stack code, only the infrastructure to
>> support ICS device selection; see Cluster Kconfig selection.
>>
>
> I have looked at the changes. I am attaching the comments below.
>
> +ifeq ($(CONFIG_ICS_ETHERNET),y)
> obj-$(CONFIG_ICS) := ics_tcp/
> +endif
>
> This can be written as
>
> obj-$(CONFIG_ICS_ETHERNET) := ics_tcp/
>
>
> True with rest of Makefile changes

Will do.

>
> typedef union {
> +#if defined(CONFIG_ICS_IP)
> tcp_cli_llhandle_t tcpcli_llhandle;
> +#elif defined(CONFIG_ICS_IB_VERBS)
> + ib_cli_llhandle_t ibcli_llhandle;
> +#else
> +#error "No define for cli_ll_handle"
> +#endif
> +
> } cli_llhandle_t;
>

I suspect the union is historical, due to previous ICS devices which are no longer supported?
My goal was to make minimal src tree impact.
Removing the union sounds good, although there will be work cleaning up #define's in tcp_sock.h. Please advise when you push these changes, as I will need to do similar cleanup in ics/ics_ib_private.h since it was based on tcp_sock.h.

>
> I guess the purpose of making it a union is to avoid these type of
> typedefs. But then since we conditionally build the infiniband and IP
> transport that would be possible.
>
> Something i noticed in the file system with respect to inode is the
> generic inode is a part of each file system inode and each file system
> have macros to get the file system specific macro from generic one
> Take a look at PROC_I. I guess for the cvs code we can push the
> #define as you kept here. I will clean it up in my git repository
> later.
>
>
> -aneesh

Thanks for the feedback.

Stan. |
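Aneesh's PROC_I pointer refers to the kernel pattern where a filesystem-specific inode (struct proc_inode) embeds the generic struct inode, and PROC_I() uses container_of() to get from the embedded generic part back to the outer structure. A minimal userspace sketch of that layering applied to the ICS client handle follows. The struct, field and macro names (ics_ll_common, example_tcp_llhandle, example_ib_llhandle, TCP_LL, IB_LL) are hypothetical; they are not taken from the ICS tree or from Aneesh's cleanup commit.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical transport-independent part of a client low-level handle. */
struct ics_ll_common {
    int node;                      /* illustrative field only */
};

/* Hypothetical transport-specific handles, each embedding the common part,
 * the way struct proc_inode embeds the generic struct inode. */
struct example_tcp_llhandle {
    int sockfd;                    /* illustrative TCP state */
    struct ics_ll_common common;
};

struct example_ib_llhandle {
    unsigned int qp_num;           /* illustrative IB verbs state */
    struct ics_ll_common common;
};

/* PROC_I()-style accessors: common pointer -> enclosing transport struct. */
#define TCP_LL(c) container_of((c), struct example_tcp_llhandle, common)
#define IB_LL(c)  container_of((c), struct example_ib_llhandle, common)

/* Generic ICS code only ever holds a struct ics_ll_common pointer; only
 * the transport built into the kernel converts it back. */
int example_tcp_sockfd(struct ics_ll_common *c)
{
    return TCP_LL(c)->sockfd;
}
```

With this layout the transport-independent code never names the transport types, so the union and its #if/#elif/#error block become unnecessary; whichever of CONFIG_ICS_IP or CONFIG_ICS_IB_VERBS is selected simply provides its own accessor.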
From: Smith, S. <sta...@in...> - 2006-09-27 15:47:14
|
Roger Tsang wrote:
> Hi all,
>
> Just wondering on the status of ICS + IB integration and if it is
> ready to be released along with SSI-1.9.3 for FC3.
>
> Roger
>

Summary - close but not entirely there.

Speaking from a Fedora Core 3 standpoint:

The mechanical side of ICS-over-IB (Infiniband) is operational and documented; IPoIB- or IB-verbs-based ICS is stable. I would not claim ics-ib to be 'optimal' as of yet. BTW, usermode IB verbs access has recently been enabled.

From an openSSI integration standpoint, ICS-over-IB (IPoIB / Verbs) is not yet properly integrated.

1) ssi-addnode and clustertab/dhcpd.conf/boottab generation require modifications to handle the 20-byte IB hardware address correctly.

2) PXE/Etherboot boot over IB is now available, although not standard for IB hardware. Via rom-o-matic.net one can re-flash IB HCA hardware to contain firmware which will PXE or Etherboot over IB; this requires the latest DHCP server plus some magic in the dhcpd.conf setup. Until the openSSI tools support the IB hardware address length, IB nodes require an Ethernet connection to PXE boot from, which in turn requires a handcrafted initrd /etc/boottab, different from what ssi-addnode generates. A bit tedious, although a tractable problem.

3) FC3 ifconfig truncates the IB address to 16 bytes; the later FC5 version does display the full IB hardware address. Using a newer ifconfig would simplify the modifications to mkinitrd, in that the current mkinitrd hardware address parsing could be used instead of 'ip addr'.

Stan. |
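For context on items 1 and 3: per RFC 4391, the IPoIB link-layer address is 20 octets (one reserved/flags octet, a 24-bit queue pair number, then the 16-octet port GID), versus 6 octets for Ethernet. A minimal C sketch of length-aware handling follows; the helper names are hypothetical and are not openSSI, mkinitrd or net-tools functions. If the 16-byte truncation keeps the leading bytes (an assumption), what is lost is the tail of the GID, i.e. the low-order bytes of the port GUID, which are usually what distinguish one HCA from another.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ETH_HWADDR_LEN   6
#define IPOIB_HWADDR_LEN 20   /* 1 flags byte + 3-byte QPN + 16-byte GID */
#define IB_GID_LEN       16

/* Format len address bytes as "xx:xx:...:xx" into buf. Needs 3*len bytes
 * (two hex digits and a ':' per byte, the last ':' becoming the NUL).
 * Returns the string length, or -1 if buf is too small. */
int format_hwaddr(const unsigned char *addr, size_t len, char *buf, size_t buflen)
{
    size_t i;

    if (len == 0 || buflen < 3 * len)
        return -1;
    for (i = 0; i < len; i++)
        snprintf(buf + 3 * i, 4, "%02x%s", addr[i], i + 1 < len ? ":" : "");
    return (int)(3 * len - 1);
}

/* Queue pair number: the 24 bits following the flags byte. */
unsigned long ipoib_qpn(const unsigned char addr[IPOIB_HWADDR_LEN])
{
    return ((unsigned long)addr[1] << 16) | ((unsigned long)addr[2] << 8) | addr[3];
}

/* Port GID: the last 16 bytes of the 20-byte address. Keeping only the
 * first 16 bytes of the address would drop the final 4 bytes of this. */
void ipoib_gid(const unsigned char addr[IPOIB_HWADDR_LEN], unsigned char gid[IB_GID_LEN])
{
    memcpy(gid, addr + 4, IB_GID_LEN);
}
```

The point of carrying an explicit length is that a buffer sized for Ethernet (3 * 6 bytes) is rejected for a 20-byte address instead of silently truncating it.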
From: Smith, S. <sta...@in...> - 2006-09-28 17:00:46
|
Roger Tsang wrote:
> I've just upgraded a OpenSSI-1.9.2 FC3 system to FC5 release of
> net-tools-1.60-62.1 from source. It seems to work fine in FC3.

Very good news; I will do the same upgrade and simplify the mkinitrd mods.

> With item #3 addressed the rest will fall in place.

Not exactly. The ICS/IB support should remain as patches until the openSSI kernel base moves forward to a kernel which supports IB. At that time, one of the 3 IB patches can disappear, while the other two can be integrated into the openSSI code base.

The ICS/IB support is embodied in 3 separate patches which apply cleanly to the R1.9.2 (ssi_3) release. I'll let Aneesh chime in on how his cluster framework fits into an OSSI R1.9.3 release and how the IB patches could work.

Assuming an R1.9.2 (ssi_3) ICS base, the 3 patches embody the following:
1) ICS device config for Ethernet or Infiniband.
2) openIB.org Infiniband stack (sv.8058) backport to OSSI kernel 2.6.10-ssi_3develsmp.
3) ICS/IB IB verbs support (ics/ics_ib/*).

All 3 patches are required for ICS/IB (IPoIB or Verbs). Patches 1 & 3 could become part of the R1.9.3 src base, although they are not useful until patch #2 is applied.

Why 'not exactly', a.k.a. the mechanics of booting when using InfiniBand as the ICS network.

In today's movie, Infiniband HCA PCI/HTX cards do not contain PXE/Etherboot ROM code to enable booting over InfiniBand. The ROM code is available, although I suspect it may void the device warranty since it requires re-flashing the device ROM, not to mention the leap of faith involved in re-flashing a device. Flashing new ROM code is doable and has been proven to work, but forcing users down this path is questionable at this time. Other downsides are the yet-to-be-started openSSI dhcpd.conf generation modifications required to support IB booting, and the fact that a major clustertab reconfig is required to get back to an Ethernet ICS. One could just save the Ethernet ICS clustertab & /etc/hosts files prior to configuring for IB.

I prefer evolution as opposed to revolution.

Determining the IB hardware address requires a booted system with the IPoIB drivers loaded ('ifconfig ib0' or 'ip link show dev ib0').

The path of least resistance to ICS/IB is:
1) Install & configure openSSI normally (Ethernet-based ICS).
2) Install the openSSI kernel src RPM and rebuild the kernel with IB device support; ICS remains Ethernet.
3) Reboot the cluster and load the IPoIB drivers on each node.
4) Determine each node's IB hardware address via 'ip link show dev ib0'.
5) Hand craft an /etc/boottab.ib for each node.
6) Reconfig and build a kernel (ICS/IB).
7) Generate a new initrd (mkinitrd --cfs --ics=ib; the default is ics=eth).
8) Install and reboot as an ICS/IB cluster.

Point being, you need a running IB-capable system to determine IB hardware addresses. Based on the assumption that almost all node platforms have an Ethernet connection, why not leverage known, proven openSSI Ethernet booting techniques to boot an ICS/IB cluster?

The upside is that you get an openSSI cluster which installs as an Ethernet-based ICS and can be extended (new kernel + boottab.ib) to be an Infiniband ICS, without losing the ability to switch back to an Ethernet ICS (reboot the original openSSI kernel). Depending on which kernel you boot, you get Ethernet or IB as the ICS; 'clustertab' remains Ethernet based.

Downside: '/etc/boottab.ib' is hand crafted, and the ssi-*node tools apply only to the Ethernet ICS environment, as clustertab is ICS/Ethernet.

I agree the IB boot story and the ICS/IB system integration story are ugly, although the alternative is equally distasteful and more permanent. Once IB HCAs contain PXE/Etherboot code, going down the more permanent path makes more sense, not to mention allowing time to fully validate the ssi-*node tools & dhcpd.conf generation.

A slightly different take on IB integration... Although the business cluster sweet spot w.r.t. node count is <= 32 nodes, assuming homogeneous network hardware throughout a cluster's lifetime could be a limiting growth/adoption factor. We all should be thinking about how to support heterogeneous ICS devices since, in the near future, dynamic reconfiguration of clusters will be very important for HA and capacity planning in business and/or scientific clusters. How many of you remember 'Non-stop computing'?

Stan. |
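Step 4 of that procedure (reading each node's IB hardware address) can be scripted instead of copied by eye. The sketch below assumes that iproute2 prints the IPoIB address after a "link/infiniband " token, as it does for ib0; the function name is hypothetical, and it deliberately stops at extracting the 20 bytes, since the /etc/boottab.ib line format is not described in this thread.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define IPOIB_HWADDR_LEN 20

/* Locate "link/infiniband " in the text printed by `ip link show dev ib0`
 * and decode the colon-separated address that follows it.
 * Returns 0 on success, -1 if the token is missing or the address is not
 * exactly 20 well-formed bytes (e.g. a truncated 16-byte address). */
int ipoib_addr_from_ip_link(const char *text, unsigned char out[IPOIB_HWADDR_LEN])
{
    static const char tag[] = "link/infiniband ";
    const char *p = strstr(text, tag);
    int i;

    if (p == NULL)
        return -1;
    p += strlen(tag);

    for (i = 0; i < IPOIB_HWADDR_LEN; i++) {
        unsigned int byte;

        /* Require exactly two hex digits per byte. */
        if (!isxdigit((unsigned char)p[0]) || !isxdigit((unsigned char)p[1]))
            return -1;
        if (sscanf(p, "%2x", &byte) != 1)
            return -1;
        out[i] = (unsigned char)byte;
        p += 2;

        /* Bytes are ':'-separated; the last one is followed by a space
         * (before "brd"), a newline or the end of the string. */
        if (i < IPOIB_HWADDR_LEN - 1) {
            if (*p != ':')
                return -1;
            p++;
        }
    }
    return (*p == ' ' || *p == '\n' || *p == '\0') ? 0 : -1;
}
```

Rejecting anything shorter than 20 bytes means a truncated address (the FC3 ifconfig problem) fails loudly rather than ending up in a hand-crafted boottab.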
From: chris b. <cb...@si...> - 2006-10-13 20:23:38
|
On Thu, 2006-09-28 at 10:00 -0700, Smith, Stan wrote:
> Once IB HCAs contain PXE/Etherboot code, then going down the more
> permanent path make more sense; not to mention time to fully validate
> ssi-*node tools & dhcpd.conf generation.

Not to sound like a sales guy, but our HCAs have done this (PXE boot) for over two years now. You can snag the 20-byte string from the leases file on the initial pxelinux.bin boot, and rewrite it into the uid in the dhcpd.conf host stanza on the fly, so by the time the module loads, you're all set. It's a kludge, but what else can you do?

The big issue here is initially IP-ing a large cluster - SSI or not - using PXE, especially if you want a specific address per node, and most people do. You must fire nodes up in order, one at a time, OR you need a uid->IP map ahead of time. It's ugly. This definitely needs a lot of thought put to it.

--
Regards,
-C |
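The "uid->IP map ahead of time" approach can be sketched as a generator of ISC dhcpd host declarations; dhcpd matches a host declaration against the client's identifier when the declaration specifies dhcp-client-identifier. This is a sketch under several assumptions: the struct and function are hypothetical, the uid is assumed to have already been converted from the leases file's escaped-string form into colon-separated hex, and how a given IB PXE ROM forms its client identifier is not covered here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-node entry, collected before the nodes are booted. */
struct uid_ip_map {
    const char *hostname;
    const char *uid;  /* client identifier as colon-separated hex octets */
    const char *ip;   /* address to pin to that client */
};

/* Render one ISC dhcpd host declaration into buf. Returns -1 if the entry
 * has no uid; otherwise returns snprintf's result, i.e. the full stanza
 * length (which may exceed len - 1 if buf was too small) or a negative
 * value on an encoding error. */
int format_host_stanza(char *buf, size_t len, const struct uid_ip_map *m)
{
    if (m->uid == NULL || strlen(m->uid) == 0)
        return -1;
    return snprintf(buf, len,
                    "host %s {\n"
                    "  option dhcp-client-identifier %s;\n"
                    "  fixed-address %s;\n"
                    "}\n",
                    m->hostname, m->uid, m->ip);
}
```

Emitting one stanza per map entry into dhcpd.conf gives each node its fixed address regardless of power-on order, which is the alternative to bringing nodes up one at a time.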
From: Aneesh K. <ane...@gm...> - 2006-06-23 17:46:12
|
On 6/23/06, Smith, Stan <sta...@in...> wrote:
> >
> > typedef union {
> > +#if defined(CONFIG_ICS_IP)
> > tcp_cli_llhandle_t tcpcli_llhandle;
> > +#elif defined(CONFIG_ICS_IB_VERBS)
> > + ib_cli_llhandle_t ibcli_llhandle;
> > +#else
> > +#error "No define for cli_ll_handle"
> > +#endif
> > +
> > } cli_llhandle_t;
> >
> I suspect the union is historical due to previous ICS devices which are
> no longer supported?
> My goal was to make minimal src tree impact.
> Removing the union sounds good although there will be work cleaning up
> #define's in tcp_sock.h. Please advise when you push these changes as I
> will need to do similar cleanup in ics/ics_ib_private.h as it was based
> on tcp_sock.h.

I have pushed some changes to my ICS cleanup git repository. I think layering the data structures that way will make the overall layering much cleaner. Let me know what you think.

http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=commitdiff;h=8af09cfbe133893503b0d42aef13fd5e232e5d9d

I have tested this code base using the cluster-tools-ng code base.

-aneesh |
From: Roger T. <rog...@gm...> - 2006-09-27 01:51:44
|
Hi all,

Just wondering on the status of ICS + IB integration and if it is ready to be released along with SSI-1.9.3 for FC3.

Roger

_______________________________________________
ssic-linux-devel mailing list
ssi...@li...
https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel
|