From: Nicholas H. <he...@se...> - 2003-06-25 22:12:36
On Wed, 2003-06-25 at 17:39, Gregory Shakhnarovich wrote:
> Hi,
>
> We are working on our new BProc-based cluster. The initial setup has been
> nice and smooth, but one major hole we still have is the scheduler/queuing
> system (which is quite important for the intended use of the cluster).
>
> I am aware of Clubmask, but as people here have pointed out, the need to
> do full node install makes that solution very undesirable. It looks like
> the only (other) immediately available solution is BJS. So, I am trying to
> figure out the following (and will appreciate any tips):

<clubmask author>
Not anymore. I am working on a release now that does away with Clubmask as an entire cluster installation/management/feed-your-dog solution. I think it would be pretty easy to put Clubmask on a Clustermatic cluster, as Clubmask is just a simple RPM now. The only requirement we have for the nodes is that they run a custom mond (from supermon), which can simply be started from node_up.

That said, the release is honestly a month off, as I have a ton of documentation to write, but the software itself is currently running and working fine on 3 separate clusters, and we are installing the rest of our clusters with it during July.

I would be more than happy to try to get you running Clubmask on your Clustermatic setup, and I will be working with a Clustermatic cluster here in the near future. Feel free to email me if you would be willing to put in a bit of legwork.

The issues I see cropping up are:

1) The kernel needs to be patched with a few symbol exports to make supermon happy. We can do without this, but you will not get the supermon2ganglia translator functionality. (Supermon2ganglia is a 'fake' gmond that translates supermon data into Ganglia XML so that you can view the data using the standard Ganglia web interface. See http://www.liniac.upenn.edu/ganglia for a live example.) The patching would be pretty easy, as we have all of the SRPMs and patches that should be necessary.
2) Recompiling the ZODB, IndexedCatalog, Clubmask, Python 2.2.2, etc. SRPMs for your target platform. Not really an issue, but it would need to be done.

3) Sanity checking. Well, I guess this goes for any software.

Now that I am done with the scary stuff :P, here are a few questions for you:

1) Would you need ssh access to, or control of, the nodes?
2) What platform would you be running on? RH 9? 8?
3) What is your timeframe?

Cheers!
Nic

-- 
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania