Thread: [SSI] re: Compaq launches Open SSI Cluster Projects
From: Greg F. <freemyer-ml@NorcrossGroup.com> - 2001-07-03 04:12:50
Bruce,

I have just read your paper at http://bjbrew.org/cpq/ssic_linux/montreal/sld001.htm and in particular the summary page at http://bjbrew.org/cpq/ssic_linux/montreal/sld053.htm

I must say that I am blown away with what you are doing.

Once your goals are accomplished, it looks to me as if you will have the most advanced UNIX-level HA/HPC clustering solution available. (I am including the commercial products like TruClusters and Veritas. I don't know enough about Beowulf or Mosix to comment.)

Would you agree with that?

Without fully understanding the pros/cons, I hope you are successful in garnering interest for getting this into the standard Linux kernel. (Unfortunately, I am not a player in the Linux world, so my support won't mean anything.)

The one negative I see is that you seem to have been developing this in a vacuum from the Linux community's perspective. The only Linux technology I see in the presentation is GFS.

Are there any other pre-existing Linux HA/HP cluster technologies you are incorporating?

Greg
=======
Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com

>> Compaq has launched two open source technology projects
>> under the GPL license. They are briefly described below
>> and can be found through www.opensource.compaq.com.
>>
>> We are actively looking for technology partners,
>> contributors, consultants and general kibitzers to
>> participate via the email lists set up for each project.
>> Those that just want to monitor the projects are welcome
>> as well.
>>
>> Cluster Infrastructure for Linux (CI)
>> The goal of this project is to develop a common
>> infrastructure for many if not all forms of Linux
>> clustering by extending the Cluster Membership and
>> Inter-node Communication Subsystems from Compaq's
>> NonStop Clusters for Unixware code base. This project
>> also provides the basis for the Open SSI Clusters for
>> Linux project.
>>
>> A developers download is available via
>> www.opensource.compaq.com for Intel-32, along
>> with build, boot, hook, interface and api documentation.
>> We will put the CVS repository on the web when we can.
>> A port to the alpha chip has already succeeded and
>> patches for that are available.
>>
>> Open Single System Image (SSI) Clusters for Linux Project
>> The Open SSI project leverages both Compaq's NonStop
>> Clusters for Unixware technology and other open source
>> technology to provide a full, highly available SSI
>> environment for Linux. Goals for SSI Clusters include
>> availability, scalability and manageability, built from
>> standard servers. Technology pieces will include:
>> membership, single root and single init, cluster filesystems
>> and DLM, single process space and process migration, load
>> leveling, availability monitors and failover, single namespace
>> and shared access for all forms of IPC, devices and networking,
>> and a single management space. The SSI project will leverage
>> the Cluster Infrastructure for Linux project.
>>
>> Source beyond the CI base is not yet available. We are
>> aiming for a developers release of much of the functionality in
>> July. In the meantime there is a presentation on SSI
>> Clustering on the web. An initial list of component requirements
>> will soon be posted for discussion and refinement.
>>
>> Join the mail alias via www.opensource.compaq.com
>> to stay updated.
>>
>> bruce walker
>> SSI Cluster Architect
>> Linux Program Office
>> Compaq Computers
>>
>> Linux-cluster: generic cluster infrastructure for Linux
>> Archive: http://mail.nl.linux.org/linux-cluster/
From: Alan R. <al...@un...> - 2001-07-03 06:01:13
Greg Freemyer wrote:
> you seem to have been developing this in a vacuum from the Linux
> community's perspective.
>
> The only Linux technology I see in the presentation is GFS.
>
> Are there any other pre-existing Linux HA/HP cluster technologies you
> are incorporating?

Hi Greg,

We've certainly encouraged them to work together with us to make their software fit into the planned community clustering infrastructure project. This would be of benefit to them, to the Linux community, and to potential users of clustering infrastructure.

We're still waiting to hear if they're interested.

-- Alan Robertson
   al...@un...
From: David B. <Dav...@or...> - 2001-07-03 06:16:00
They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO Unix cluster stuff, pretty much as deployed on that platform.

Personally, I am more interested in the CI part of the project, which along with the IBM DLM would provide a reasonable GFS platform. It seems it would come together faster than GFS-on-DLM-on-heartbeat would.

I'm less interested in the very-large-scope SSI work, maybe because I don't understand the implications of the failure domains. It has appeared to me before that failure of a node is likely to have deeper ripples in the SSI scheme than it does in clusters with less tightly coupled nodes. And it is all very intrusive, in ways one wonders if Linus would ever accept. It's a noble attempt, but it's gonna be a lot harder to accept.

In a perfect world, many of the components would plug together; I don't know how the CI stuff maps into the heartbeat model.

-dB

Alan Robertson wrote:
> Greg Freemyer wrote:
> > you seem to have been developing this in a vacuum from the Linux
> > community's perspective.
> >
> > The only Linux technology I see in the presentation is GFS.
> >
> > Are there any other pre-existing Linux HA/HP cluster technologies you
> > are incorporating?
>
> Hi Greg,
>
> We've certainly encouraged them to work together with us to make their
> software fit into the planned community clustering infrastructure project.
> This would be of benefit to them, to the Linux community, and to potential
> users of clustering infrastructure.
>
> We're still waiting to hear if they're interested.
From: Alan R. <al...@un...> - 2001-07-03 07:00:13
David Brower wrote:
> They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO
> Unix cluster stuff, pretty much as deployed on that platform.
>
> Personally, I am more interested in the CI part of the project, which
> along with the IBM DLM would provide a reasonable GFS platform. It seems
> it would come together faster than GFS-on-DLM-on-heartbeat would.

Got any Round Tuits you can send me? Any grad students?

> In a perfect world, many of the components would plug together; I don't
> know how the CI stuff maps into the heartbeat model.

Hopefully, heartbeat will map into the framework model, not the other way around ;-) It is one of the influences that enters into the framework model, and there are a few things it does nicely, and we'll preserve those. But it will just be one component of many.

-- Alan Robertson
   al...@un...
From: Lyle B. <lbi...@bi...> - 2001-07-03 07:25:12
The good news is that this code contains very important intellectual property - and I am amazed that Compaq has released it under GPL2. (I've been working with Tandem/Compaq for several years on highly scalable, highly reliable SSI and CI architectures.)

I haven't looked at all the code (some is yet to be released...), but if it is the code I'm familiar with, it allows for process migration, process pairs, etc. for "serious" HA for business applications and databases.

David Brower wrote:
> They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO
> Unix cluster stuff, pretty much as deployed on that platform.

Yup.

> Personally, I am more interested in the CI part of the project, which
> along with the IBM DLM would provide a reasonable GFS platform. It seems
> it would come together faster than GFS-on-DLM-on-heartbeat would.
>
> I'm less interested in the very-large-scope SSI work, maybe because I
> don't understand the implications of the failure domains. It has appeared
> to me before that failure of a node is likely to have deeper ripples in
> the SSI scheme than it does in clusters with less tightly coupled nodes.

Not so. With process pairs and process migration, one can have apps that are almost impervious to hardware failure. (Take a look at Jim Gray's book - Transaction Processing: Concepts and Techniques, section 3.7 - Fault Model and Software Fault Masking.)

> And it is all very intrusive, in ways one wonders if Linus would ever
> accept. It's a noble attempt, but it's gonna be a lot harder to accept.

By definition, SSI clusters ARE intrusive - at least from the kernel standpoint. It's a question of "degree". If "hooks" are used, then it would seem that those "hooks" may well prove beneficial to all CI work(?).

At the application level, intrusiveness can be somewhat "hidden". For instance, if a JVM is made to be "fault tolerant", applications running on that JVM "inherit" SOME of the qualities of that fault tolerance. For an application (say a database) to be fully fault tolerant (hardware and software), that app must be architected for fault tolerance (process pairs, etc.).

> In a perfect world, many of the components would plug together; I don't
> know how the CI stuff maps into the heartbeat model.

Again, I'll have to look at the code - but "typically" all clustering dealing with HA fault domains must have "heartbeat". As I recollect from some of my earlier IP work, HP and Tandem hold the original patents on "heartbeat".

---- snip ---- snip ----

Cheers,
Lyle

> ------------------------------------------------------------------------------
> Linux HA Web Site: http://linux-ha.org/
> Linux HA HOWTO:
> http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
> ------------------------------------------------------------------------------

--
Lyle Bickley | Bickley Consulting West Inc.
lbi...@ac... | lbi...@bi... | V 650-428-0621
http://bickleywest.com/ | F 650-428-0599

"Black holes exist where GOD is dividing by zero"
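The "heartbeat" Lyle describes is, at its core, a timeout-based failure detector: each node periodically announces liveness, and a peer that stays silent too long is presumed dead. A minimal sketch of that idea follows; the class, node names, and intervals are all illustrative assumptions, not code from heartbeat, CI, or any Tandem product.

```python
# Minimal heartbeat-style failure detector: record the last time each
# peer's heartbeat was seen; declare a peer dead after DEADTIME seconds
# of silence. All names and timing values are illustrative only.
DEADTIME = 3.0  # seconds of silence before a node is presumed failed

class FailureDetector:
    def __init__(self, nodes, now):
        # Assume every configured node was alive at startup time `now`.
        self.last_seen = {n: now for n in nodes}

    def heartbeat(self, node, now):
        # Called whenever a heartbeat packet arrives from `node`.
        self.last_seen[node] = now

    def dead_nodes(self, now):
        # Nodes silent for longer than DEADTIME are presumed failed.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > DEADTIME)

fd = FailureDetector(["node1", "node2", "node3"], now=0.0)
fd.heartbeat("node1", now=2.5)  # node1 keeps beating
fd.heartbeat("node2", now=1.0)  # node2 last heard at t=1.0
print(fd.dead_nodes(now=5.0))   # node2 (4.0s silent) and node3 (5.0s silent)
```

Real implementations layer retransmission, multiple media, and membership agreement on top of this; the timeout check itself is the common kernel of all of them.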
From: Peter B. <tab...@ya...> - 2001-07-03 11:54:59
--- Lyle Bickley <lbi...@bi...> wrote:
<snip>
> Again, I'll have to look at the code - but "typically" all clustering
> dealing with HA fault domains must have "heartbeat". As I recollect
> from some of my earlier IP work, HP and Tandem hold the original patents
> on "heartbeat".

Lyle,

Could you expand on this point about HP and Tandem holding original patents on 'heartbeat'? Do you know the patent numbers, or are there more specifics you could provide? There are many implementations of "heartbeat", including one very notable Open Source one ;) and I'd be quite curious to understand what these patents may cover.

Peter

=====
These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
wo...@us.../tab...@ya...
and in no way should be construed as official opinion of IBM, Corp.
From: Bruce W. <br...@ka...> - 2001-07-03 08:02:51
David,

> They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO
> Unix cluster stuff, pretty much as deployed on that platform.

As with any significant contribution to Linux, we started with something (didn't GFS, DLM, failsafe and many others start that way?). We are open sourcing the complete NonStop Cluster technology. Our goal is to allow the community to leverage that technology, along with other open source technology (GFS, LVS, DLM, failsafe, etc.), to build the best clustering product around.

> Personally, I am more interested in the CI part of the project, which
> along with the IBM DLM would provide a reasonable GFS platform. It seems
> it would come together faster than GFS-on-DLM-on-heartbeat would.

We broke apart the CI components specifically for this need. Because IBM only released a small subset of their clustering, the DLM did not have a sufficiently rich membership service to layer on. We felt we had something they (and thus the community) could use.

> I'm less interested in the very-large-scope SSI work, maybe because I
> don't understand the implications of the failure domains. It has appeared
> to me before that failure of a node is likely to have deeper ripples in
> the SSI scheme than it does in clusters with less tightly coupled nodes.
> And it is all very intrusive, in ways one wonders if Linus would ever
> accept. It's a noble attempt, but it's gonna be a lot harder to accept.
>
> In a perfect world, many of the components would plug together; I don't
> know how the CI stuff maps into the heartbeat model.
>
> -dB
>
> Alan Robertson wrote:
> > Greg Freemyer wrote:
> > > you seem to have been developing this in a vacuum from the Linux
> > > community's perspective.
> > >
> > > The only Linux technology I see in the presentation is GFS.
> > >
> > > Are there any other pre-existing Linux HA/HP cluster technologies you
> > > are incorporating?
> >
> > Hi Greg,
> >
> > We've certainly encouraged them to work together with us to make their
> > software fit into the planned community clustering infrastructure
> > project. This would be of benefit to them, to the Linux community, and
> > to potential users of clustering infrastructure.
> >
> > We're still waiting to hear if they're interested.
From: Bruce W. <br...@ka...> - 2001-07-03 07:35:05
Alan,

We are very interested in working with any and all Linux cluster groups. I haven't sent a response to your earlier message about your framework because I haven't studied it enough to react intelligently. The desire to produce an SSI cluster may add requirements to your framework.

As for APIs, a set of proposed membership APIs is included in the Cluster Infrastructure project (along with all the code to build, boot and play with, for clusters that may scale up to 64 nodes - haven't tested that big yet, though). I would very much like to start a discussion on membership APIs (since, in our experience, this is the first place an application might become cluster aware). I'll send the URL where you can review them.

bru...@co...

> Hi Greg,
>
> We've certainly encouraged them to work together with us to make their
> software fit into the planned community clustering infrastructure project.
> This would be of benefit to them, to the Linux community, and to potential
> users of clustering infrastructure.
>
> We're still waiting to hear if they're interested.
>
> -- Alan Robertson
>    al...@un...
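To make the membership-API discussion concrete, here is a toy sketch of the shape such interfaces commonly take: a monotonically increasing incarnation (or "generation") number, the current node set, and callbacks by which a cluster-aware application hears about transitions. Every name here is invented for illustration; these are not the CI project's actual proposed APIs, which had not yet been posted when this was written.

```python
# Toy membership service: each membership change bumps an incarnation
# number and notifies registered callbacks with the new node set.
# All names are invented for illustration, NOT the CI project's APIs.

class MembershipService:
    def __init__(self, initial_nodes):
        self.incarnation = 1
        self.members = set(initial_nodes)
        self.callbacks = []

    def register_callback(self, fn):
        # Cluster-aware applications register to hear membership changes.
        self.callbacks.append(fn)

    def _transition(self):
        self.incarnation += 1
        for fn in self.callbacks:
            fn(self.incarnation, frozenset(self.members))

    def node_down(self, node):
        self.members.discard(node)
        self._transition()

    def node_up(self, node):
        self.members.add(node)
        self._transition()

events = []
ms = MembershipService(["n1", "n2", "n3"])
ms.register_callback(lambda inc, members: events.append((inc, sorted(members))))
ms.node_down("n2")
ms.node_up("n4")
print(events)  # [(2, ['n1', 'n3']), (3, ['n1', 'n3', 'n4'])]
```

The incarnation number is what lets an application discard stale information: any state tagged with an older incarnation than the current one refers to a membership view that no longer holds.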
From: Alan R. <al...@un...> - 2001-07-03 12:57:56
Bruce Walker wrote:
> Alan,
> We are very interested in working with any and all Linux cluster groups.
> I haven't sent a response to your earlier message about your framework
> because I haven't studied it enough to react intelligently.

I suppose you could have taken the alanr-approach, and commented on it without feeling constrained by a lack of knowledge ;-)

Please keep in mind that the document is being written as we speak, and is very much subject to change. I've made significant changes as a result of feedback from the linux-cluster list, and hope to continue to do so. Don't hesitate to ask questions about the things which will inevitably be unclear (or even wrong!) in it. There are not yet any details on any APIs in the document. There is an incomplete list of API sets.

Having helped another large computer company open source some HA software in the past, I know how time-consuming and frustrating this initial phase of the project can be. I'm sure you've come to know and love your lawyers ;-)

Speaking of this -- has COMPAQ committed to a license for the complete set of software you're going to provide? Of course, the GPL/LGPL would be the most harmonious choice, since every other OSS project uses them.

If you're interested in some background on how I have approached HA designs in the past, or my personal leanings, you might find it helpful to read the heartbeat design document. You might also try it if you have insomnia and don't like pills ;-) It's here:

http://www.linuxshowcase.org/2000/2000papers/papers/robertson/

> The desire to produce an SSI cluster may add requirements to your
> framework.

Of course. Producing an "X" cluster (for any "X") generally requires more capabilities, and more APIs. My big concern about APIs in this area is that the base set of APIs not be encumbered by the desire to add a set of APIs for an optional feature for certain types of clusters (like SSI).

The set of APIs is not fixed, nor is it ever intended to be completely fixed. The idea is that you should be able to assemble a cluster out of the components you need -- and leave out those you don't need. That will necessarily leave out certain APIs. The set of APIs needs to be well thought-out, well-designed and harmonious - but I don't see that it has to be bounded by any particular hard boundary. The set of libraries on a Linux system isn't intended to be bounded, nor is the set of plugins for the GIMP.

> As for APIs, a set of proposed membership APIs is included in the
> Cluster Infrastructure project (along with all the code to build, boot
> and play with, for clusters that may scale up to 64 nodes - haven't
> tested that big yet, though).

Where can I find those proposed APIs?

> I would very much like to start a discussion on membership APIs (since,
> in our experience, this is the first place an application might become
> cluster aware).

My two favorite areas are membership and basic cluster messaging. By the way, keep in mind that in the framework, APIs are those things that are exposed to the user *or* other cluster components. So, basic messaging will be of interest to other cluster components soon.

> I'll send the URL where you can review them.

Great! On this note, I just added some general, semi-philosophical thoughts on APIs to the document. Looking forward to the URL...

-- Alan Robertson
   al...@un...
From: Alan R. <al...@un...> - 2001-07-03 13:04:32
Alan Robertson wrote:
> Speaking of this -- has COMPAQ committed to a license for the complete
> set of software you're going to provide?
>
> Of course, the GPL/LGPL would be the most harmonious choice, since every
> other OSS project uses them.

OOPS! I meant "every other OSS HA project". Sorry...

-- Alan Robertson
   al...@un...
From: Bruce W. <br...@ka...> - 2001-07-03 07:13:37
Greg,

Thanks for the interest in full SSI clustering. Full SSI clustering is not as familiar to most people as HA clustering or HPC clustering. As you noted, it is very ambitious. Fortunately, we have been at it for many years (working on different Unix bases).

A key component is a single root. However, a terse list of some of the components we believe are needed for an SSI cluster shows that a single root filesystem is just part of one of them (list provided below).

The plan for the project is to start with a discussion of the component areas and of requirements for the component areas. For many of the component areas we have Linux code, which was reworked from the code we had on the Unixware base. Two of the components (membership and internode communication) are already open sourced via the CI project (Cluster Infrastructure, available via the www.opensource.compaq.com link). We plan to have an initial integrated developers release of many of the other components later this month.

Areas where we hope and expect to leverage existing Linux projects and technology include:

a. filesystems (we will release a cluster filesystem we have developed, but hope to involve and incorporate any that come around, starting with GFS)
b. all aspects of application monitoring and restart (many different Linux projects to work from here)
c. load leveling (both connection load leveling like LVS and process migration load leveling like Mosix)
d. devfs (we have enhancements to the basic devfs to provide a transparent clusterwide device view and clusterwide device access)
e. DLM (our cluster filesystem didn't need one, but many others do, and we are working to fold the open sourced DLM into CI)

The goals of SSI clustering are simple - simultaneously provide high availability, scalability and manageability. If we are successful, SSI clusters will not only be the HA clusters of the future but may also be the load leveling and high performance clusters as well.
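The connection load leveling mentioned in item (c) can be sketched very simply: a cluster virtual IP hands each incoming connection to the next real server in rotation. The sketch below illustrates only the round-robin idea, in the spirit of what LVS does; the class and names are invented for illustration and are not LVS's actual code or interfaces.

```python
# Toy round-robin connection load leveler: each incoming connection is
# assigned to the next back-end server in turn, as a cluster virtual IP
# front end might do. Illustrative only -- not LVS's actual scheduler.
import itertools

class RoundRobinLeveler:
    def __init__(self, servers):
        # Cycle endlessly through the configured real servers.
        self._cycle = itertools.cycle(servers)

    def assign(self, connection):
        # Pair this connection with the next server in rotation.
        return (connection, next(self._cycle))

lb = RoundRobinLeveler(["node1", "node2", "node3"])
print([lb.assign(c) for c in ("c1", "c2", "c3", "c4")])
# [('c1', 'node1'), ('c2', 'node2'), ('c3', 'node3'), ('c4', 'node1')]
```

Production schedulers add weights, connection counts, and failover of the virtual IP itself, but the dispatch loop is the same shape.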
Here is a very terse list of SSI component areas:

1. Membership - kernel boot time; APIs; coordinate kernel cleanup; split brain; STOMITH
2. Internode Communication Subsystem - kernel boot time; channels; flow control; transports
3. Filesystem - single root; single mount tree; access to all filesystems; offset coherency
4. Processes - single namespace and full access to all from all; arbitrary node failure; /proc
5. Devices - single namespace for all; access to all from anywhere; persistence; parallel access, ...
6. Interprocess Comm - single namespace; access to all SysV IPC, pipes, fifos, ptys, Unix sockets, Inet sockets
7. TCP/IP networking - single set of devices; single port space; cluster virtual IP (CVIP); connection load leveling; IP failover or CVIP failover
8. Paging/Swap - single set of devices; borrow space if needed
9. Kernel data replication service - maintain consistency; populate new nodes
10. Cluster Volume Manager
11. HA shared storage
12. HA interconnect
13. DLM
14. SSI system mgmt - very small enhancements to single-machine Linux tools
15. Single HA init; cluster booting and run levels
16. HA applications and system daemons - simplified versions (due to SSI) of standard HA tools
17. Timesync
18. Load leveling
19. Packaging and installation
20. Object location interfaces and object movement interfaces - moving pipes and sockets etc. from node to node

Soon there will be an annotated presentation on SSI, followed by an initial list of requirements, component by component.

> Bruce,
>
> I have just read your paper at
> http://bjbrew.org/cpq/ssic_linux/montreal/sld001.htm
> and in particular the summary page at
> http://bjbrew.org/cpq/ssic_linux/montreal/sld053.htm
>
> I must say that I am blown away with what you are doing.
>
> Once your goals are accomplished, it looks to me as if you will have the
> most advanced UNIX-level HA/HPC clustering solution available. (I am
> including the commercial products like TruClusters and Veritas. I don't
> know enough about Beowulf or Mosix to comment.)
>
> Would you agree with that?
>
> Without fully understanding the pros/cons, I hope you are successful in
> garnering interest for getting this into the standard Linux kernel.
> (Unfortunately, I am not a player in the Linux world, so my support won't
> mean anything.)
>
> The one negative I see is that you seem to have been developing this in a
> vacuum from the Linux community's perspective.
>
> The only Linux technology I see in the presentation is GFS.
>
> Are there any other pre-existing Linux HA/HP cluster technologies you are
> incorporating?
>
> Greg
> =======
> Greg Freemyer
> Internet Engineer
> Deployment and Integration Specialist
> The Norcross Group
> www.NorcrossGroup.com
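The membership item in the list above mentions split brain and STOMITH. A common way to resolve a split is majority quorum: after a partition, only the side holding a strict majority of the configured nodes continues, and the minority side is stopped (fenced). The check below is a toy illustration of that rule only; it is not the CI or SSI projects' actual algorithm, and real clusters add vote weights and tie-breakers.

```python
# Toy majority-quorum check for split-brain resolution: a partition
# survives only if it holds a strict majority of the configured nodes;
# the rest must stop (e.g. be fenced via STOMITH). Illustrative only.

def has_quorum(partition, cluster_size):
    """Return True if this partition holds a strict majority of nodes."""
    return len(partition) > cluster_size // 2

cluster = ["n1", "n2", "n3", "n4", "n5"]
# A network split leaves two partitions:
part_a = ["n1", "n2", "n3"]
part_b = ["n4", "n5"]

print(has_quorum(part_a, len(cluster)))  # True  -- 3 of 5, continues
print(has_quorum(part_b, len(cluster)))  # False -- 2 of 5, gets fenced
```

Note that with an even node count a clean 50/50 split leaves neither side with quorum, which is why even-sized clusters usually add a tie-breaker (a quorum disk or an extra vote).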