ssic-linux-users Mailing List for OpenSSI Clusters for Linux (Page 7)
Brought to you by:
brucewalker,
rogertsang
From: John H. <jo...@Ca...> - 2010-08-11 11:17:07
|
Cumberland, Lonnie wrote:
> Now I am looking at the disk drives that are located on each node for mapping into a single filespace so that all space appears to be a single drive to the users.

You can't really do this.

What you could do is have each node mount its disk on a different directory - possibly under /cluster/nodeX/ - then you'd have a whole mass of space available to all nodes.

The downside would be that each node would then be a single point of failure. You could reduce the chances of failure by using pairs of nodes, setting up DRBD between pairs, then making failover CFS configurations. Better have a lot of bandwidth between the nodes. |
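A minimal sketch of the per-node layout suggested above. It assumes the node= mount option used in the fstab posted elsewhere in this thread, and it assumes every node exposes its local data partition as /dev/sda2 formatted as ext3 (both borrowed from that fstab, not from this message). The function name gen_node_mounts is made up for illustration. The script only prints candidate fstab lines; it mounts nothing.

```shell
#!/bin/sh
# Print one fstab line per node, placing each node's local partition
# under /cluster/node<N>. The node= option is assumed to work as in the
# fstab quoted elsewhere in this thread.
gen_node_mounts() {
    device="$1"   # local partition name, assumed identical on every node
    first="$2"    # first node number
    last="$3"     # last node number
    n="$first"
    while [ "$n" -le "$last" ]; do
        printf '%s /cluster/node%s ext3 rw,node=%s 0 2\n' "$device" "$n" "$n"
        n=$((n + 1))
    done
}

# Example: candidate lines for nodes 2 through 5.
gen_node_mounts /dev/sda2 2 5
```

The output would still have to be reviewed by hand before being appended to /etc/fstab, and the mount point directories would have to exist. As noted above, each directory then depends on a single node being up.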
|
From: John H. <jo...@Ca...> - 2010-08-11 11:11:43
|
Some explanation of how OpenSSI interacts with filesystems and so on.

When talking about using various filesystems with OpenSSI it's worth keeping two ideas in mind:

1. Cross node access
2. Failover

By "cross node access" I mean the ability for a process on one node to access files on another node. This is good for making the cluster look like one machine and necessary for migration of processes from one node to another.

Cross node access can be made possible in two ways:

1. CFS - the I/O requests to a filesystem are handled on behalf of all nodes by a server node.
2. Parallel mounts - the filesystem is mounted on all nodes.

CFS is easy: the node that mounts the actual Linux filesystem stacks a CFS layer on top of it, and all other nodes send their I/O requests to the CFS server.

Parallel mounts need a "physical" filesystem that can be accessed by multiple nodes. A simple example is NFS - each OpenSSI node directly accesses the remote NFS server. More complicated examples of parallel mounts are Lustre and so on.

CFS failover is necessary if we want to use CFS in a fault tolerant cluster. If the CFS server node goes down some other node has to take over its job. In order for this to work the other node needs to have physical access to the disks the filesystem was stored on - either by having used DRBD to make the data available to both the CFS primary and secondary nodes, or by actually having a physical path from both nodes to the disks (SAS, SCSI, iSCSI, FC or whatever). Also the filesystem under the CFS mount needs to be journaled, or the CFS failover will be forced to wait for an fsck before the filesystem is available again.

It's failover that makes handling things like RAID and LVM exciting. Both the primary and secondary node need to access the RAID/LVM setup, but they need to co-ordinate this access very carefully. For LVM there is CLVM (cluster LVM), which could probably be ported to OpenSSI. For RAID you'd need to modify OpenSSI to activate the RAID volumes on the secondary node during the failover. It should be possible, but it's not going to be an easy job.

I'd really spend some time with the basic system, trying various failure scenarios, seeing how things work before taking on a big job like this. |
|
From: Mulyadi S. <mul...@gm...> - 2010-08-10 18:41:14
|
On Tue, Aug 10, 2010 at 23:52, Cumberland, Lonnie <lon...@ni...> wrote:
> Hello John, and All,
>
> I am looking at various solutions and implementations to test and have come across mhddfs (which uses the FUSE libraries) as a stackable system on mount points:

IIRC, Plan 9 was recently merged into the mainline kernel. However, OpenSSI is still based on an old kernel, so I am not sure whether Plan 9 would work with it. But feel free to read:

http://en.wikipedia.org/wiki/V9fs

--
regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com |
|
From: Cumberland, L. <lon...@ni...> - 2010-08-10 16:52:45
|
Hello John, and All,

I am looking at various solutions and implementations to test and have come across mhddfs (which uses the FUSE libraries) as a stackable system on mount points:

http://romanrm.ru/en/mhddfs
http://svn.uvw.ru/mhddfs/trunk/README

I was trying to load it onto the main server but I get errors, and as I am afraid of crashing the cluster, I thought that I would ask the list if anyone had some ideas.

-------------------------------------------------
spartan:/mnt# apt-get install mhddfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  fuse-utils
The following NEW packages will be installed:
  fuse-utils mhddfs
0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
Need to get 0B/38.2kB of archives.
After this operation, 213kB of additional disk space will be used.
Do you want to continue [Y/n]?
WARNING: The following packages cannot be authenticated!
  fuse-utils mhddfs
Authentication warning overridden.
Selecting previously deselected package fuse-utils.
(Reading database ... 102736 files and directories currently installed.)
Unpacking fuse-utils (from .../fuse-utils_2.7.4-1.1+lenny1_i386.deb) ...
Selecting previously deselected package mhddfs.
Unpacking mhddfs (from .../mhddfs_0.1.12-1_i386.deb) ...
Processing triggers for man-db ...
Setting up fuse-utils (2.7.4-1.1+lenny1) ...
creating fuse group...
udev active, skipping device node creation.
invoke-rc.d: WARNING: Service udev has no entry in rc.nodeinfo
invoke-rc.d: Starting only on initnode
Usage: /etc/init.d/udev {start|stop|restart|force-reload}
invoke-rc.d: initscript udev, action "reload" failed.
dpkg: error processing fuse-utils (--configure):
 subprocess post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of mhddfs:
 mhddfs depends on fuse-utils; however:
  Package fuse-utils is not configured yet.
dpkg: error processing mhddfs (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 fuse-utils
 mhddfs
E: Sub-process /usr/bin/dpkg returned an error code (1)
spartan:/mnt#
-------------------------------------------------

I would like to load the FUSE libraries unless the OpenSSI components that are already loaded can handle this idea in some way, since it seems that I would otherwise have to install DRBD, and I do not know enough about CFS to work with it heavily yet.

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

> -----Original Message-----
> From: John Hughes [mailto:jo...@Ca...]
> Sent: Tuesday, August 10, 2010 12:31 PM
> To: Cumberland, Lonnie
> Cc: Scott Walters; Openssi users
> Subject: Re: [SSI-users] Debian Lenny OpenSSI with LVM and RAID 1
>
> Cumberland, Lonnie wrote:
> > Now I am looking at the disk drives that are located on each node for mapping into a single filespace so that all space appears to be a single drive to the users. For this, I was reading that OpenSSI appears to be using DRBD (if I understand correctly) which allows it to mount the filespace on each node. If this is so then I am guessing that perhaps I could use LVM to bring all of the drives together into a volume.
> >
> > If that is not possible then perhaps I can utilize a stackable file system from FUSE like XtreemFS, gfs, or some other. Any ideas?
> >
> > I am also going to be adding RAID 1 to the system so that the main drive has a complete mirror and failover, so that in case the main drive goes down the secondary will pick up.
> >
> Your problem with linux raid and lvm will be making sure that things get failed over from node to node in the event of a node crash.
>
> This works:
>
> use DRBD to mirror between the nodes
> use CFS to make the filesystem on the DRBD device available to all the nodes.
>
> This also works:
>
> Use shared disk hardware (SAS, SCSI, Fibre Channel, whatever) to make the disks available to all the nodes.
> Use CFS on the disks to make the filesystem available to all the nodes.
>
> This has worked in the past, I have no practical experience:
>
> Use a cluster aware filesystem (Lustre, even NFS) and mount the filesystem on all the nodes.
> |
|
From: John H. <jo...@Ca...> - 2010-08-10 16:31:12
|
Cumberland, Lonnie wrote:
> Now I am looking at the disk drives that are located on each node for mapping into a single filespace so that all space appears to be a single drive to the users. For this, I was reading that OpenSSI appears to be using DRBD (if I understand correctly) which allows it to mount the filespace on each node. If this is so then I am guessing that perhaps I could use LVM to bring all of the drives together into a volume.
>
> If that is not possible then perhaps I can utilize a stackable file system from FUSE like XtreemFS, gfs, or some other. Any ideas?
>
> I am also going to be adding RAID 1 to the system so that the main drive has a complete mirror and failover, so that in case the main drive goes down the secondary will pick up.
>

Your problem with linux raid and lvm will be making sure that things get failed over from node to node in the event of a node crash.

This works:

use DRBD to mirror between the nodes
use CFS to make the filesystem on the DRBD device available to all the nodes.

This also works:

Use shared disk hardware (SAS, SCSI, Fibre Channel, whatever) to make the disks available to all the nodes.
Use CFS on the disks to make the filesystem available to all the nodes.

This has worked in the past, I have no practical experience:

Use a cluster aware filesystem (Lustre, even NFS) and mount the filesystem on all the nodes. |
|
From: Cumberland, L. <lon...@ni...> - 2010-08-10 13:57:15
|
Thanks to everyone for the wonderful information. I am making really good progress here at the moment.

I have just modified my fstab to mount the swaps and drives for each node.

-----------------------------------------------------
# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults,node=* 0 0
UUID=333de891-33d7-46eb-8aff-640abd6133fc / ext3 chard,errors=remount-ro,node=1 0 1
/dev/sda5 none swap sw,node=1 0 0
/dev/sda1 none swap sw,node=2 0 0
/dev/sda1 none swap sw,node=3 0 0
/dev/sda1 none swap sw,node=4 0 0
/dev/sda1 none swap sw,node=5 0 0
/dev/sda2 /etc/mnt/mnt02 ext3 rw,node=2 0 2
/dev/sda2 /etc/mnt/mnt03 ext3 rw,node=3 0 2
/dev/sda2 /etc/mnt/mnt04 ext3 rw,node=4 0 2
/dev/sda2 /etc/mnt/mnt05 ext3 rw,node=5 0 2
/dev/scd0 /media/cdrom0 udf,iso9660 user,noauto,node=1 0 0
/dev/fd0 /media/floppy0 auto rw,user,noauto,node=* 0 0
---------------------------------------------------

I also noticed that doing a "mount -a" from the head node does not seem to pick up the new fstab and mount the drives. I had to manually do:

onnode 2 mount /dev/sda2 /mnt/mnt02

for each node. Why is that?

I will be moving things around a bit, but for now I just wanted to test out some things.

Also, if I can mount things through the fstab on the head node, then why do I need the command that is discussed in the documentation?

onnode 2 ./clusterfstab --nodenum=2 --mountpoint=<some mount point>

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

> -----Original Message-----
> From: Scott Walters [mailto:sc...@sl...]
> Sent: Tuesday, August 10, 2010 5:33 AM
> To: Mulyadi Santosa
> Cc: Openssi users
> Subject: Re: [SSI-users] Debian Lenny OpenSSI with LVM and RAID 1
>
> > @Lonnie: perhaps, just an idea, you would like to consider Dragonfly BSD too? AFAIK they implement distributed fs too.
> >
> > Mulyadi Santosa
>
> Dragonfly is still reworking kernel facilities and filesystem primitives to be distributed friendly. They have a good deal of work left. I'm hoping for them.
>
> Harddrives in each machine to create one large filesystem might just not be realistic right now. A NAS design might be the way to go.
>
> -scott
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by
>
> Make an app they can't live without
> Enter the BlackBerry Developer Challenge
> http://p.sf.net/sfu/RIM-dev2dev
> _______________________________________________
> Ssic-linux-users mailing list
> Ssi...@li...
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-users |
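When an fstab carries per-node entries like the one in this message, it can help to list exactly which lines apply to a given node before debugging "mount -a". The following is a small sketch, assuming only the field layout shown above (options in the fourth column, node=<N> or node=* among them). The function name entries_for_node is made up for illustration and is not an OpenSSI tool.

```shell
#!/bin/sh
# List the fstab entries whose option field contains node=<N> or node=*.
# Usage (hypothetical): entries_for_node 2 /etc/fstab
entries_for_node() {
    node="$1"   # node number to look up
    file="$2"   # fstab-style file to read
    awk -v node="$node" '
        # Skip comment lines and lines without an options column.
        /^[[:space:]]*#/ || NF < 4 { next }
        {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                # Exact match, so node=1 does not match node=12.
                if (opts[i] == ("node=" node) || opts[i] == "node=*") {
                    print
                    break
                }
            }
        }' "$file"
}
```

Comparing the mount points printed for node 2 against the directory used in the manual mount command is one quick consistency check.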
|
From: Scott W. <sc...@sl...> - 2010-08-10 09:33:10
|
> @Lonnie: perhaps, just an idea, you would like to consider Dragonfly BSD too? AFAIK they implement distributed fs too.
>
> Mulyadi Santosa

Dragonfly is still reworking kernel facilities and filesystem primitives to be distributed friendly. They have a good deal of work left. I'm hoping for them.

Harddrives in each machine to create one large filesystem might just not be realistic right now. A NAS design might be the way to go.

-scott |
|
From: Mulyadi S. <mul...@gm...> - 2010-08-10 09:09:46
|
Hi Greg

Didn't realize you were here too...glad to know...

On Tue, Aug 10, 2010 at 02:55, Greg Freemyer <gre...@gm...> wrote:
> Hey Mulyadi,
>
> I think you're ahead of the current status of OpenSSI / Lustre support:
>
> http://wiki.openssi.org/go/Lustre
>
> It's not clear to me if full Lustre support is part of the roadmap for 1.9:
>
> http://wiki.openssi.org/go/Roadmap

Whoops, sorry, seems like I didn't update myself properly. But anyway, distributed fs is really fascinating today.

@Lonnie: perhaps, just an idea, you would like to consider Dragonfly BSD too? AFAIK they implement distributed fs too.

--
regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com |
|
From: Greg F. <gre...@gm...> - 2010-08-09 19:55:59
|
On Mon, Aug 9, 2010 at 1:54 PM, Mulyadi Santosa <mul...@gm...> wrote:
>
> Hi...
>
> On Mon, Aug 9, 2010 at 22:13, Cumberland, Lonnie <lon...@ni...> wrote:
> > Now I am looking at the disk drives that are located on each node for mapping into a single filespace so that all space appears to be a single drive to the users.
>
> Sounds like the job for Lustre or PVFS...
>

Hey Mulyadi,

I think you're ahead of the current status of OpenSSI / Lustre support:

http://wiki.openssi.org/go/Lustre

It's not clear to me if full Lustre support is part of the roadmap for 1.9:

http://wiki.openssi.org/go/Roadmap

AFAIK, CFS is your only real option currently if you want the storage to be managed inside the cluster. And then you can add DRBD functionality for failover within the cluster. And neither CFS nor DRBD provides the requested functionality.

In theory you can create a separate Lustre cluster to export storage to your OpenSSI cluster, but that's not the requested functionality either.

fyi: I haven't built an OpenSSI cluster, but I follow it to some extent. I find the technology very interesting, and way back when I worked on Tru64 clusters, which were similar.

Good Luck
Greg |
|
From: Mulyadi S. <mul...@gm...> - 2010-08-09 17:54:45
|
Hi...

On Mon, Aug 9, 2010 at 22:13, Cumberland, Lonnie <lon...@ni...> wrote:
> Now I am looking at the disk drives that are located on each node for mapping into a single filespace so that all space appears to be a single drive to the users.

Sounds like the job for Lustre or PVFS...

--
regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com |
|
From: Cumberland, L. <lon...@ni...> - 2010-08-09 15:13:58
|
Hello All,

I hope that you are doing well today.

I have abandoned the FreeNX idea in favor of a much simpler one for our particular need. It uses XRDP, which can be installed with apt-get (also available from http://xrdp.sourceforge.net/), and as most of our users will be connecting from Windows machines, this will make things easier. So far it works well, but I still have to adjust the keymap.

Now I am looking at the disk drives that are located on each node for mapping into a single filespace so that all space appears to be a single drive to the users. For this, I was reading that OpenSSI appears to be using DRBD (if I understand correctly) which allows it to mount the filespace on each node. If this is so then I am guessing that perhaps I could use LVM to bring all of the drives together into a volume.

If that is not possible then perhaps I can utilize a stackable file system from FUSE like XtreemFS, gfs, or some other. Any ideas?

I am also going to be adding RAID 1 to the system so that the main drive has a complete mirror and failover, so that in case the main drive goes down the secondary will pick up.

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

National Institute of Standards and Technology
Ionizing Radiation Division (846)
Radiation Physics Group (245), Room C106
ADDRESS:
100 Bureau Drive, Stop 8462
Gaithersburg, MD 20899-8462

EMAIL: lon...@ni...
http://physics.nist.gov/Divisions/Div846/div846.html |
|
From: Christopher H. <cha...@bp...> - 2010-08-06 19:48:11
|
The last time I ran a mosix based cluster I used the client for the SETI@home project as a demo. With a 10 node cluster you could start 10 instances of the client, and each process would migrate to an empty server. It was a good workout and a cool demo, but each process needed its own scratch space and it took a little work to set it up.

Chris

----- "Lonnie Cumberland" <lon...@ni...> wrote:
> Hello All,
>
> Are there any OpenSSI demo applications that can really show the cluster in action?
>
> I would like to see and test performance on our cluster as we add more and more nodes.
>
> Thanks and have a great day,
> Lonnie Cumberland, Prof.
> Physicist
>
> National Institute of Standards and Technology
> Ionizing Radiation Division (846)
> Radiation Physics Group (245), Room C106
> ADDRESS:
> 100 Bureau Drive, Stop 8462
> Gaithersburg, MD 20899-8462
>
> EMAIL: lon...@ni...
> http://physics.nist.gov/Divisions/Div846/div846.html |
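The "one scratch space per instance" setup described above might look roughly like the sketch below. It is only an illustration: run_instances is a made-up helper, the directory names are arbitrary, and whether the copies actually spread across nodes depends on the cluster's own process migration or load leveling being enabled, which this script does not control.

```shell
#!/bin/sh
# Start <count> copies of a command, each in its own scratch directory
# under <base>, and wait for all of them to finish.
# Usage (hypothetical): run_instances 10 /tmp/demo ./my_cpu_bound_job
run_instances() {
    count="$1"
    base="$2"
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        dir="$base/instance$i"
        mkdir -p "$dir" || return 1
        # Run in a subshell so each copy gets its own working directory
        # and its own output.log inside that directory.
        ( cd "$dir" && "$@" > output.log 2>&1 ) &
        i=$((i + 1))
    done
    wait   # block until every background copy has exited
}
```

While the copies are running, a cluster-aware ps or top should show where each one ended up.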
|
From: Scott W. <sc...@sl...> - 2010-08-06 18:46:49
|
Unix made it safe and easy to install multiple versions of applications and libraries. Then Debian came along and made it unsafe to install even one version of an application. Welcome to Debian.

I don't have any suggestions for recovering what you had. Fixing apt muck-ups like that involves a lot of iterative fussing and has no assurance of success. A lot of people check /usr, the appropriate parts of /var, etc. into version control so they can undo apt muck-ups. I just run full backups before attempting any apt operation.

In the future, use the --no-upgrade flag as much as you can get away with. apt loves to try to upgrade the universe every time you want to install one simple app. That tells it not to.

If you did want to try to muck around with it, I suggest getting on #debian on FreeNode IRC and asking for help.

Here's a list of the versions (apt-cache showpkg output) for my packages matching xorg:

http://slowass.net/~scott/tmp/xorg.versions.txt

I'm not even going to talk about how to downgrade packages as I don't want to be responsible. Doing anything with apt is playing with fire.

Wish I had better news for ya.

Regards,
-scott

On 0, "Cumberland, Lonnie" <lon...@ni...> wrote:
> Hello All,
>
> I was finally able to get our cluster up and running this morning with 5 nodes online for testing.
>
> All was going well until I tried to install FreeNX server which seems to have replaced some libraries and my Xserver is no longer working from what I can tell.
>
> I used the Aptitude install freenx.
>
> Is there a way to roll-back those changes to get the Xserver back up and running as the cluster still seems to be active, but just no Xserver running.
>
> Thanks and have a great day,
> Lonnie Cumberland, Prof.
> Physicist
>
> National Institute of Standards and Technology
> Ionizing Radiation Division (846)
> Radiation Physics Group (245), Room C106
> ADDRESS:
> 100 Bureau Drive, Stop 8462
> Gaithersburg, MD 20899-8462
>
> EMAIL: lon...@ni...
> http://physics.nist.gov/Divisions/Div846/div846.html > |
|
From: Scott W. <sc...@sl...> - 2010-08-06 18:35:29
|
People usually talk about clusters in terms of linpack ratings:

http://www.top500.org/project/linpack
http://www.top500.org/faq/where_can_i_get_software_test_my_system_top500
http://www.netlib.org/benchmark/hpl/

... that implementation looks like it would require significant effort to adapt to anything. Still, it would be nice to have a simple process for running that on OpenSSI clusters.

Oh, wait... Debian has a package:

http://packages.debian.org/search?keywords=hpcc&searchon=names&suite=all&section=all

... it just isn't in the sources (/etc/apt/sources.list) for Debian 5.0.4. The package seems to be too new, targeting the unstable and testing branches of Debian. It could probably be made to work or else compiled from source. It looks like that comes with an MPI implementation. Running MPI on an SSI system feels a bit strange to me.

Another option would be to fire up Rack or Plack or a similar web pipeline with preforked servers and load leveling on and run apachebench against it. That's more in line with what I'm doing.

What software were you expecting to run on the thing? Figuring out the software first and the benchmark/demo second might be more appropriate. Here are some popular favorites:

http://www.nccs.gov/computing-resources/jaguar/software/

... most of those aren't cluster aware/cluster specific.

Also, here are Wiki pages on the topic specific to OpenSSI:

http://wiki.openssi.org/go/Demos
http://wiki.openssi.org/go/MySQL_Clustering
http://wiki.openssi.org/go/Features # scroll down to Middleware Support and Server Support

Most of the apps in that last list are less than exciting for demo purposes. High performance Squid proxy, anyone?

Cheers,
-scott

On 0, "Cumberland, Lonnie" <lon...@ni...> wrote:
> Hello All,
>
> Are there any OpenSSI demo applications that can really show the cluster in action?
>
> I would like to see and test performance on our cluster as we add more and more nodes.
>
> Thanks and have a great day,
> Lonnie Cumberland, Prof.
> Physicist > > National Institute of Standards and Technology > Ionizing Radiation Division (846) > Radiation Physics Group (245), Room C106 > ADDRESS: > 100 Bureau Drive, Stop 8462 > Gaithersburg, MD 20899-8462 > > EMAIL: lon...@ni... > http://physics.nist.gov/Divisions/Div846/div846.html > |
|
From: Cumberland, L. <lon...@ni...> - 2010-08-06 14:59:42
|
Hello All,

I was finally able to get our cluster up and running this morning with 5 nodes online for testing.

All was going well until I tried to install the FreeNX server, which seems to have replaced some libraries, and my X server is no longer working from what I can tell.

I used "aptitude install freenx".

Is there a way to roll back those changes to get the X server back up and running? The cluster still seems to be active, but there is just no X server running.

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

National Institute of Standards and Technology
Ionizing Radiation Division (846)
Radiation Physics Group (245), Room C106
ADDRESS:
100 Bureau Drive, Stop 8462
Gaithersburg, MD 20899-8462

EMAIL: lon...@ni...
http://physics.nist.gov/Divisions/Div846/div846.html |
|
From: Cumberland, L. <lon...@ni...> - 2010-08-06 13:29:50
|
Hello All, Are there any OpenSSI demo applications that can really show the cluster in action? I would like to see and test performance on our cluster as we add more and more nodes. Thanks and have a great day, Lonnie Cumberland, Prof. Physicist National Institute of Standards and Technology Ionizing Radiation Division (846) Radiation Physics Group (245), Room C106 ADDRESS: 100 Bureau Drive, Stop 8462 Gaithersburg, MD 20899-8462 EMAIL: lon...@ni... http://physics.nist.gov/Divisions/Div846/div846.html |
|
From: Cumberland, L. <lon...@ni...> - 2010-08-06 13:27:15
|
Greetings All, Can someone please tell me if there is a PDF document for OpenSSI since I came across the website documentation, but do not recall seeing a PDF as I would like to read it offline. Thanks and have a great day, Lonnie Cumberland, Prof. Physicist National Institute of Standards and Technology Ionizing Radiation Division (846) Radiation Physics Group (245), Room C106 ADDRESS: 100 Bureau Drive, Stop 8462 Gaithersburg, MD 20899-8462 EMAIL: lon...@ni... http://physics.nist.gov/Divisions/Div846/div846.html |
|
From: Cumberland, L. <lon...@ni...> - 2010-08-06 13:18:12
|
Hi John and Scott,

I have now succeeded in figuring out an efficient process to get more nodes up and running, and now have 5 nodes up and operational.

I do have one question about "ssi-addnode".

In our physical design, I have a server rack with 30 nodes. 15 of the nodes are connected to one switch while the other 15 nodes are connected to a second switch. Both switches are cascaded and mounted in the top of the rack.

When I run the ssi-addnode command, it only allows me to add a maximum of 15 nodes, and I suspect that it is not seeing beyond the first switch for some reason.

How can I go about making sure that ssi-addnode can see the second switch?

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

National Institute of Standards and Technology
Ionizing Radiation Division (846)
Radiation Physics Group (245), Room C106
ADDRESS:
100 Bureau Drive, Stop 8462
Gaithersburg, MD 20899-8462

EMAIL: lon...@ni...
http://physics.nist.gov/Divisions/Div846/div846.html

> -----Original Message-----
> From: John Hughes [mailto:jo...@Ca...]
> Sent: Friday, August 06, 2010 5:10 AM
> To: Cumberland, Lonnie
> Cc: Scott Walters; Openssi users
> Subject: Re: [SSI-users] trying to build OpenSSI [with 2.6.14 kernel] from source
>
> Cumberland, Lonnie wrote:
> > Thanks for getting back to me Scott,
> >
> > Just tried your suggestion and am still getting the exact same messages to where Node 2 starts the kernel and then go to the point of halting like before.
> >
> > Not sure what is happening here.
> >
> > Do you think that I should have to start all over again with a fresh install of Debian "Lenny" on Node 1 and to a complete re-install?
> >
> > Node 1 seems to be working from what I can tell and even the "cluster -v" shows it as "UP", but no luck with getting Node 2 all of the way up yet.
> >
>
> Wait, node1 is up?
>
> With the e1000 driver and all?
>
> It shows up in the output of "ifconfig all"? |
|
From: John H. <jo...@Ca...> - 2010-08-06 08:48:46
|
Cumberland, Lonnie wrote: > Thanks for getting back to me Scott, > > Just tried your suggestion and am still getting the exact same messages to where Node 2 starts the kernel and then go to the point of halting like before. > > Not sure what is happening here. > > Do you think that I should have to start all over again with a fresh install of Debian "Lenny" on Node 1 and to a complete re-install? > > Node 1 seems to be working from what I can tell and even the "cluster -v" shows it as "UP", but no luck with getting Node 2 all of the way up yet. > Wait, node1 is up? With the e1000 driver and all? It shows up in the output of "ifconfig all"? |
|
From: John H. <jo...@Ca...> - 2010-08-06 08:46:44
|
Cumberland, Lonnie wrote:
> What's strange is that the e1000 driver is included in the ramdisk image as I mounted it to take a look.
>

Ok, this is better. I could not understand why on earth the e1000 driver would not be in the ramdisk (other than by user error).

So now it's looking like the version of the e1000 driver we have doesn't understand your card.

Could you please try dropping into the shell and seeing:

1. whether the e1000 module is loaded into memory
2. whether the card shows up in ifconfig
3. what dmesg shows. |
|
From: Scott W. <sc...@sl...> - 2010-08-05 20:00:29
|
Hi Lonnie,

Ahhh, very good to hear!

I'm experimenting with software RAID on one of the nodes, thinking of standardizing each worker node as having a 1TB RAID 1, but I haven't gotten past the point of manually initializing and mounting the md. So, I need to go research the same thing. Manually mounting partitions cluster-wide is fun and easy.

Getting additional nodes booted into the cluster really is the hard part. Everything is much nicer after that point.

One thing that's been biting me is that it seems like when a node goes down, all of the unix-domain sockets on the clusterfs vanish. No one can get to their screen session anymore and databases are inaccessible.

There's the clusterview web app available from OpenSSI's page on sourceforge, but so far I've just been using top and ps, both of which are cluster-aware. It's also easy to incorporate 'onall' into shell commands such as with:

onall w | grep average

> I REALLY appreciate yours and John's help to get the cluster up and running.
> Gaithersburg, MD 20899-8462

If I find myself on that side of the continent, I'll drop you an email and ask you about the possibility of a tour of the facility, or alternatively, coffee. Regardless, you're quite welcome.

Cheers,
-scott

On 0, "Cumberland, Lonnie" <lon...@ni...> wrote:
> Hi Scott,
>
> Looks like SUCCESS!!!!!
>
> The re-run of the mkinitrd seems to have done the trick.
>
> Now when I do a "cluster -v" it shows Nodes 1 & 2 as UP.
>
> I need to locate the documentation for the cluster tools since I would like to see where it shows all of the cluster total CPU's and RAM.
>
> For my next challenge, I have 80Gig drives on each node (each with a swap and main empty partition) and want to add in the swap space for each node while also adding in the empty main partitions for each node into the collective cluster space.
>
> Do you know how to do the above mapping in of each node drive partition and swaps space?
|
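As a possible refinement of the `onall w | grep average` one-liner mentioned in this message, here is a minimal sketch that keeps only the three load-average figures from each header line. The function name `load_averages` is made up for illustration, and the sketch assumes the usual Linux `w`/`uptime` header format containing `load average:`. How `onall` labels the output of each node is not covered here.

```shell
# Hypothetical helper (not part of OpenSSI): read w/uptime header lines on
# stdin and print only the text that follows "load average:".
# Assumed header format, e.g.
#   " 11:17:07 up 3 days,  2 users,  load average: 0.10, 0.20, 0.30"
# Lines without a "load average:" field (such as the per-user rows that
# w prints) are dropped, because sed -n only prints substituted lines.
load_averages() {
    sed -n 's/^.*load average: *//p'
}
```

On a cluster node this could presumably be used as `onall w | load_averages`, giving one line of figures per node that answers.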
|
From: Cumberland, L. <lon...@ni...> - 2010-08-05 18:49:35
|
Hi Scott,

Looks like SUCCESS!!!!!

The re-run of the mkinitrd seems to have done the trick.

Now when I do a "cluster -v" it shows Nodes 1 & 2 as UP.

I need to locate the documentation for the cluster tools since I would like to see where it shows the cluster's total CPUs and RAM.

For my next challenge, I have 80 GB drives on each node (each with a swap and main empty partition) and want to add in the swap space for each node while also adding in the empty main partitions for each node into the collective cluster space.

Do you know how to do the above mapping of each node's drive partition and swap space?

I will keep each node as a PXE bootable node and not have local booting for each node for this current cluster.

I REALLY appreciate your and John's help to get the cluster up and running.

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

(301) 975-6869 (Office)
(313) 333-2935 (Cell)
(301) 926-7416 (Fax)

National Institute of Standards and Technology
Ionizing Radiation Division (846)
Radiation Physics Group (245), Room C106
ADDRESS:
100 Bureau Drive, Stop 8462
Gaithersburg, MD 20899-8462

EMAIL: lon...@ni...
http://physics.nist.gov/Divisions/Div846/div846.html

> -----Original Message-----
> From: Scott Walters [mailto:sc...@sl...]
> Sent: Thursday, August 05, 2010 1:55 PM
> To: Cumberland, Lonnie
> Cc: John Hughes; Openssi users
> Subject: Re: [SSI-users] trying to build OpenSSI [with 2.6.14 kernel] from source
>
> I'd do one more thing first...
>
> mkinitrd -o /boot/initrd.img-2.6.14-ssi-686-smp 2.6.14-ssi-686-smp
>
> ... and then the ssi-ksync (making sure that new boot materials are placed into /tftpboot).
>
> Cheers,
> -scott
>
> On 0, "Cumberland, Lonnie" <lon...@ni...> wrote:
> > Thanks for getting back to me Scott,
> >
> > Just tried your suggestion and am still getting the exact same messages to where Node 2 starts the kernel and then goes to the point of halting like before.
> >
> > Not sure what is happening here.
|
|
From: Scott W. <sc...@sl...> - 2010-08-05 17:55:11
|
I'd do one more thing first...

mkinitrd -o /boot/initrd.img-2.6.14-ssi-686-smp 2.6.14-ssi-686-smp

... and then the ssi-ksync (making sure that new boot materials are placed into /tftpboot).

Cheers,
-scott

On 0, "Cumberland, Lonnie" <lon...@ni...> wrote:
> Thanks for getting back to me Scott,
>
> Just tried your suggestion and am still getting the exact same messages to where Node 2 starts the kernel and then goes to the point of halting like before.
>
> Not sure what is happening here.
>
> Do you think that I should have to start all over again with a fresh install of Debian "Lenny" on Node 1 and do a complete re-install?
>
> Node 1 seems to be working from what I can tell and even the "cluster -v" shows it as "UP", but no luck with getting Node 2 all of the way up yet.
>
> Thanks and have a great day,
> Lonnie Cumberland, Prof.
> Physicist
>
> (301) 975-6869 (Office)
> (313) 333-2935 (Cell)
> (301) 926-7416 (Fax)
>
> National Institute of Standards and Technology
> Ionizing Radiation Division (846)
> Radiation Physics Group (245), Room C106
> ADDRESS:
> 100 Bureau Drive, Stop 8462
> Gaithersburg, MD 20899-8462
>
> EMAIL: lon...@ni...
> http://physics.nist.gov/Divisions/Div846/div846.html
>
> > -----Original Message-----
> > From: Scott Walters [mailto:sc...@sl...]
> > Sent: Thursday, August 05, 2010 1:14 PM
> > To: Cumberland, Lonnie
> > Cc: John Hughes; Openssi users
> > Subject: Re: [SSI-users] trying to build OpenSSI [with 2.6.14 kernel] from source
> >
> > Shouldn't have to explicitly add dhcp.
> >
> > It sounds to me -- taking a guess here -- that your hand-rolled mkinitrd has survived and has never been replaced by OpenSSI's. Node 2 gets the right kernel but the wrong initrd.
> >
> > Remove /tftpboot/kernel and initrd first just to make sure that they're being rebuilt and replaced with fresh copies. ssi-ksync-network is a shell script and error reporting is never great in shell scripts.
|
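The advice in this message, to make sure that new boot materials really land in /tftpboot, can be checked roughly as follows. This is only a sketch under stated assumptions: `stale_since` is a hypothetical helper name, and the idea of dropping a marker file before rebuilding is an illustration rather than anything OpenSSI provides. It uses nothing beyond standard `find`.

```shell
# Hypothetical helper: list regular files under directory $1 that were NOT
# modified after the marker file $2. Any file it prints was left untouched
# since the marker was created.
stale_since() {
    find "$1" -type f ! -newer "$2"
}

# Intended use on the init node (commands taken from the thread, shown as
# comments because they only exist on an OpenSSI system):
#   marker=$(mktemp)
#   mkinitrd -o /boot/initrd.img-2.6.14-ssi-686-smp 2.6.14-ssi-686-smp
#   ssi-ksync
#   stale_since /tftpboot "$marker"
```

A listed file is only a hint, not proof of a problem: a copy that preserves an older timestamp (for example an unchanged kernel image) would also show up.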
|
From: Cumberland, L. <lon...@ni...> - 2010-08-05 17:37:21
|
Thanks for getting back to me Scott,

Just tried your suggestion and am still getting the exact same messages to where Node 2 starts the kernel and then goes to the point of halting like before.

Not sure what is happening here.

Do you think that I should have to start all over again with a fresh install of Debian "Lenny" on Node 1 and do a complete re-install?

Node 1 seems to be working from what I can tell and even the "cluster -v" shows it as "UP", but no luck with getting Node 2 all of the way up yet.

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

(301) 975-6869 (Office)
(313) 333-2935 (Cell)
(301) 926-7416 (Fax)

National Institute of Standards and Technology
Ionizing Radiation Division (846)
Radiation Physics Group (245), Room C106
ADDRESS:
100 Bureau Drive, Stop 8462
Gaithersburg, MD 20899-8462

EMAIL: lon...@ni...
http://physics.nist.gov/Divisions/Div846/div846.html

> -----Original Message-----
> From: Scott Walters [mailto:sc...@sl...]
> Sent: Thursday, August 05, 2010 1:14 PM
> To: Cumberland, Lonnie
> Cc: John Hughes; Openssi users
> Subject: Re: [SSI-users] trying to build OpenSSI [with 2.6.14 kernel] from source
>
> Shouldn't have to explicitly add dhcp.
>
> It sounds to me -- taking a guess here -- that your hand-rolled mkinitrd has survived and has never been replaced by OpenSSI's. Node 2 gets the right kernel but the wrong initrd.
>
> Remove /tftpboot/kernel and initrd first just to make sure that they're being rebuilt and replaced with fresh copies. ssi-ksync-network is a shell script and error reporting is never great in shell scripts. I remember having to tweak things a bit in my work of building and installing a patched kernel. The initnode happily booted the patched kernel but then the other nodes came up on the old, original kernel until I cleaned out that directory and muddled with things until I could get ssi-ksync-network to go. ssi-ksync calls ssi-ksync-network. If stuff in /tftpboot isn't rebuilt, step through the script one line at a time (perhaps just run commands at the prompt) and make sure nothing errors out or comes up with null data where it shouldn't. Though honestly my problems were probably related to not using the prescribed bootloader =)
>
> Cheers,
> -scott
>
> On 0, "Cumberland, Lonnie" <lon...@ni...> wrote:
> > Hello All,
> >
> > I just found the "Node Hang at boot" over at:
> >
> > http://wiki.openssi.org/go/Debian
> >
> > which describes part of my problem but even though I tried it as well as the solution in the "Node hang at boot, variant 2", I still get the same message.
> >
> > What's strange is that the e1000 driver is included in the ramdisk image as I mounted it to take a look.
> >
> > I also added the /sbin/dhclient to the /etc/mkinitrd/exe before I used "ssi-ksync" to rebuild the ramdisk image.
> >
> > No luck so far on getting Node 2 to complete the booting even though it receives the kernel from the tftpd server on Node 1.
> >
> > Any ideas?
> >
> > Thanks and have a great day,
> > Lonnie Cumberland, Prof.
> > Physicist
> >
> > National Institute of Standards and Technology
> > Ionizing Radiation Division (846)
> > Radiation Physics Group (245), Room C106
> > ADDRESS:
> > 100 Bureau Drive, Stop 8462
> > Gaithersburg, MD 20899-8462
> >
> > EMAIL: lon...@ni...
> > http://physics.nist.gov/Divisions/Div846/div846.html
> >
> > > -----Original Message-----
> > > From: Cumberland, Lonnie [mailto:lon...@ni...]
> > > Sent: Thursday, August 05, 2010 8:42 AM
> > > To: John Hughes
> > > Cc: Openssi users
> > > Subject: Re: [SSI-users] trying to build OpenSSI [with 2.6.14 kernel] from source
> > >
> > > Greetings All,
> > >
> > > I just tried to use "ssi-ksync" on the main node to sync up things for the ramimage that is being sent over via PXE boot to the nodes, but it seems that the dhcp client is missing from the ram image.
|
|
From: Cumberland, L. <lon...@ni...> - 2010-08-05 15:16:27
|
Hello All,

I just found the "Node Hang at boot" entry over at:

http://wiki.openssi.org/go/Debian

which describes part of my problem, but even though I tried it as well as the solution in "Node hang at boot, variant 2", I still get the same message.

What's strange is that the e1000 driver is included in the ramdisk image; I mounted the image to take a look.

I also added /sbin/dhclient to /etc/mkinitrd/exe before I used "ssi-ksync" to rebuild the ramdisk image.

No luck so far on getting Node 2 to complete booting, even though it receives the kernel from the tftpd server on Node 1.

Any ideas?

Thanks and have a great day,
Lonnie Cumberland, Prof.
Physicist

National Institute of Standards and Technology
Ionizing Radiation Division (846)
Radiation Physics Group (245), Room C106
ADDRESS:
100 Bureau Drive, Stop 8462
Gaithersburg, MD 20899-8462

EMAIL: lon...@ni...
http://physics.nist.gov/Divisions/Div846/div846.html

> -----Original Message-----
> From: Cumberland, Lonnie [mailto:lon...@ni...]
> Sent: Thursday, August 05, 2010 8:42 AM
> To: John Hughes
> Cc: Openssi users
> Subject: Re: [SSI-users] trying to build OpenSSI [with 2.6.14 kernel] from source
>
> Greetings All,
>
> I just tried to use "ssi-ksync" on the main node to sync up things for the ramimage that is being sent over via PXE boot to the nodes, but it seems that the dhcp client is missing from the ram image.
>
> The message that I am getting on node 2 is:
>
> "Gathering Cluster info
>
> DHCP client application not found
>
> Add dhcp client application to /etc/mkinitrd/exe and rebuild ramdisk image
>
> ERROR: Could not find a NIC with node configuration. Halting."
>
> I think that I need to edit the ramdisk so that it has the dhcp client application, but am not sure how to do that.
>
> Any advice would be greatly appreciated.
>
> Thanks and have a great day,
> Lonnie Cumberland, Prof.
> Physicist
>
> National Institute of Standards and Technology
> Ionizing Radiation Division (846)
> Radiation Physics Group (245), Room C106
> ADDRESS:
> 100 Bureau Drive, Stop 8462
> Gaithersburg, MD 20899-8462
>
> EMAIL: lon...@ni...
> http://physics.nist.gov/Divisions/Div846/div846.html
>
> > -----Original Message-----
> > From: John Hughes [mailto:jo...@Ca...]
> > Sent: Thursday, August 05, 2010 5:19 AM
> > To: Cumberland, Lonnie
> > Cc: Scott Walters; Openssi users
> > Subject: Re: [SSI-users] trying to build OpenSSI [with 2.6.14 kernel] from source
> >
> > Cumberland, Lonnie wrote:
> > > I have to ask what exact procedure you or others have used and if you have the log files from an install that works since I am at a loss as to why this procedure is not working as it seems like it should.
> >
> > The log files are the exact procedure I followed.
> >
> > I'm sorry that I haven't been able to find any time to work on this for the moment, I'll try again this weekend.
> >
> > (Current tasks, not necessarily in priority order:
> >
> > Port a huge application from SCO UnixWare to Debian Linux
> > Finish remodeling daughter's bedroom
> > Replace broken fridge
> > Clean up mess left after cutting down tree
> > Work on OpenSSI
> > Hobby project - Porting software to obsolete ICL mainframe computer.
> > (I don't have enough space for a model railway set)). |
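For anyone retracing the step described in the message above (listing the DHCP client in /etc/mkinitrd/exe, then rebuilding the ramdisk with ssi-ksync), here is a rough shell sketch. It is not taken from the OpenSSI documentation. The helper name add_to_exe_list is made up for illustration, and the sketch assumes the file holds one executable path per line; check the file on your own system before relying on that.

```shell
#!/bin/sh
# Sketch only. add_to_exe_list is a hypothetical helper, not an OpenSSI tool.
# Assumption: the list file holds one executable path per line.

# Usage: add_to_exe_list LIST_FILE BINARY_PATH
# Appends BINARY_PATH to LIST_FILE unless an identical line already exists,
# so running it repeatedly does not create duplicate entries.
add_to_exe_list() {
    # grep -x matches whole lines only, -F treats the path as a fixed string.
    if ! grep -qxF "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Intended use on the init node, as root (left commented out on purpose;
# the paths and the ssi-ksync step are the ones named in the message above):
#   add_to_exe_list /etc/mkinitrd/exe /sbin/dhclient && ssi-ksync
```

The idempotent append matters here because the list is likely to be edited more than once while debugging, and a duplicate entry would only add noise. As the message above notes, having the client listed was not sufficient on its own in this case, so treat this as one step of the diagnosis rather than a fix.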