You can subscribe to this list here.
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | 27 | 59 | 61 | 59 | 46 | 75 |
| 2002 | 70 | 106 | 76 | 87 | 85 | 125 | 334 | 218 | 166 | 228 | 143 | 130 |
| 2003 | 181 | 170 | 99 | 134 | 131 | 107 | 162 | 152 | 225 | 210 | 162 | 172 |
| 2004 | 93 | 207 | 86 | 115 | 60 | 103 | 68 | 31 | 61 | 88 | 41 | 54 |
| 2005 | 243 | 146 | 166 | 80 | 79 | 59 | 76 | 29 | 31 | 11 | 42 | 48 |
| 2006 | 65 | 82 | 97 | 99 | 53 | 122 | 88 | 59 | 25 | 62 | 90 | 61 |
| 2007 | 67 | 102 | 104 | 118 | 65 | 48 | 30 | 18 | 39 | 40 | 43 | 25 |
| 2008 | 12 | 63 | 34 | 46 | 50 | 29 | 115 | 83 | 42 | 73 | 130 | 94 |
| 2009 | 101 | 76 | 161 | 46 | 27 | 70 | 135 | 43 | 51 | 55 | 71 | 17 |
| 2010 | 9 | 15 | 6 | 40 | 12 | 9 | 35 | 23 | 24 | 5 | | 1 |
| 2011 | 19 | 137 | 27 | 6 | 7 | 13 | 9 | 13 | 12 | 2 | 4 | 5 |
| 2012 | 1 | 1 | 2 | 4 | 7 | 27 | 9 | 4 | 1 | | 7 | 12 |
| 2013 | 3 | 70 | 64 | 4 | | 8 | 1 | 14 | 15 | | 11 | 1 |
| 2014 | 8 | 3 | | | 2 | 8 | 24 | 3 | 1 | | 11 | 11 |
| 2015 | 17 | 12 | 2 | 3 | | 1 | 2 | | | | | |
| 2016 | | | | | 4 | 3 | | 5 | | | | |
| 2018 | | | | 1 | | | | | 1 | | | |
| 2022 | | | | | | 1 | | | | | | |
| 2023 | | 1 | | | | | | | | | | |
From: LAHAYE O. <oli...@ce...> - 2023-02-10 09:28:42
Hi everyone,

The IT team lent me a MacBook Pro (M1 Pro) for a week, and I took a look at porting OSCAR to the arm64v8 architecture (using a docker container). And guess what? After a few tweaks, it builds perfectly 😊

I've built packages for AlmaLinux 8 aarch64. They are available here:
http://olivier.lahaye1.free.fr/OSCAR/repos/unstable/rhel-8-aarch64/

If you're interested in giving it a try, feel free to report your experience. (The install is similar to x86_64, except for the network boot.)

Keep in mind that this is a work in progress and alpha quality.

Cheers,
Olivier.
--
Olivier LAHAYE
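For anyone who wants to try these packages, here is a minimal sketch of a yum/dnf repository definition pointing at the URL above. It rests on assumptions that the announcement does not confirm: that the directory is a regular repository with metadata, and that the packages are unsigned test builds (hence gpgcheck=0). The helper name oscar_repo_stanza and the repository id oscar-unstable-aarch64 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a yum/dnf .repo stanza for a given base URL.
# The repo id and name below are illustrative, not official OSCAR names.
oscar_repo_stanza() {
    baseurl=$1
    cat <<EOF
[oscar-unstable-aarch64]
name=OSCAR unstable (AlmaLinux 8 aarch64, alpha quality)
baseurl=$baseurl
enabled=1
gpgcheck=0
EOF
}

# The packages target aarch64, so warn when run on another architecture.
arch=$(uname -m)
if [ "$arch" != "aarch64" ]; then
    echo "note: this host is $arch, the packages are built for aarch64" >&2
fi

# Print the stanza; as root, redirect it to a file under /etc/yum.repos.d/
# to make the repository visible to dnf.
oscar_repo_stanza "http://olivier.lahaye1.free.fr/OSCAR/repos/unstable/rhel-8-aarch64/"
```

Printing the stanza instead of writing the file directly keeps the sketch harmless to run and leaves the choice of file name to the administrator.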
From: LAHAYE O. <oli...@ce...> - 2022-06-23 20:50:49
Hi,

A short notice to inform you that OSCAR Cluster is being ported to modern distros. The last issues will be fixed soon.

Debian 10, 11 and RHEL 8 and 9: Oscar-selector needs a rewrite, as it requires perl-Qt4, which is obsolete and incompatible with current Qt versions. It will soon be replaced with GUIDeFATE.

RHEL-9: python3-twisted is not yet available for this distro, so apitest is not installable and therefore OSCAR can't be installed. Once it is available, the issue will be fixed.

The quick start guide available here is not yet up to date:
https://oscar-cluster.github.io/oscar/wiki/quick_start_guide_for_rhel

The official repository is hosted here for now:
http://olivier.lahaye1.free.fr/OSCAR/repos/unstable/

Cheers,
Olivier.
--
Olivier LAHAYE
From: adli h. <adl...@ho...> - 2018-09-28 11:07:22
Dear all,

Attached are the oscar-config --bootstrap log and the /etc/hosts file.
From: LAHAYE O. <oli...@ce...> - 2018-04-19 14:56:04
Dear OSCAR cluster users,

Great progress has been made on OSCAR lately.

First of all, we now have a truly robust, modern systemimager that works on CentOS-6, CentOS-7, Fedora-27 and OpenSuSE-42.3. (It should work on Debian, but the packaging has not yet been ported to the new systemimager; it needs a full rewrite.)

See it in action here:
http://olivier.lahaye1.free.fr/SystemImager/Videos/20180403_SystemImager_directboot.webm

The great feature is that it can install and run the OS without rebooting after the install (the requirement is that the initramfs kernel version has its modules on the imaged system). This greatly reduces install time (especially on systems that have tons of RAM to count).

This systemimager only supports the rsync protocol for the moment, but many more are planned (scp, nfs, bittorrent, http, ...).

A post-install script is no longer needed to install the boot loader. The main install script is now optional.

The disk layout for systemimager has changed:
- Main install scripts are now optional and must be moved into /var/lib/systemimager/scripts/main-install
- /var/lib/systemimager/images/<image_name>/etc/systemimager/autoinstallscript.conf must be moved to /var/lib/systemimager/scripts/disks-layouts/<image_name>.xml

More info to come in a quick start guide.

Build in progress for CentOS-7 and Fedora-27 (CentOS-6 is already built).

For those who want to build their own OSCAR rpms, docker files can be found here:
https://github.com/oscar-cluster/oscar/tree/master/support_files
This is an excellent way to become familiar with oscar-packager, the main build tool along with opkgc.
Once the resulting docker image is running, you'll find the main OSCAR packages in /tftpboot. To build the remaining packages, just type:
    oscar-packager --all unstable
(or oscar-packager --debug --all unstable)

THIS RELEASE IS FOR TESTING PURPOSES

Here are the testing repositories. Just install the oscar-release package and you have access to them.
Note: if a URL is invalid, just go to http://svn.oscar.openclustergroup.org/repos/unstable and then select the distro and choose the new oscar-release rpm.

CentOS-6:
    yum install http://svn.oscar.openclustergroup.org/repos/unstable/rhel-6-x86_64/oscar-release-6.1.3-0.20180416.el6.noarch.rpm
CentOS-7:
    yum install http://svn.oscar.openclustergroup.org/repos/unstable/rhel-7-x86_64/oscar-release-6.1.3-0.20180416.el7.centos.noarch.rpm
Fedora-27:
    dnf -y install http://svn.oscar.openclustergroup.org/repos/unstable/fc-27-x86_64/oscar-release-6.1.3-0.20180416.fc27.noarch.rpm

Happy testing.

--
Olivier LAHAYE
CEA DRT/LIST/DIR
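The layout change described in this message can be sketched as a small shell function. This is only an illustration under assumptions: the function name migrate_image_layout and the SI_ROOT variable are hypothetical, the base directory is assumed to be /var/lib/systemimager, and only the autoinstallscript.conf move mentioned in the message is handled. It is worth trying on a copy of the directory first.

```shell
#!/bin/sh
# Base directory of systemimager data; overridable for testing on a copy.
SI_ROOT=${SI_ROOT:-/var/lib/systemimager}

# Hypothetical helper: move an image's autoinstallscript.conf to the new
# scripts/disks-layouts/<image_name>.xml location described in the message.
migrate_image_layout() {
    image=$1
    old="$SI_ROOT/images/$image/etc/systemimager/autoinstallscript.conf"
    new="$SI_ROOT/scripts/disks-layouts/$image.xml"

    if [ ! -f "$old" ]; then
        echo "skip $image: $old not found" >&2
        return 1
    fi
    if [ -e "$new" ]; then
        echo "skip $image: $new already exists" >&2
        return 1
    fi
    # Create the new directories (main-install is where the optional main
    # install scripts now live), then move the layout file.
    mkdir -p "$SI_ROOT/scripts/disks-layouts" "$SI_ROOT/scripts/main-install"
    mv "$old" "$new"
    echo "moved $old -> $new"
}
```

A call such as "migrate_image_layout <image_name>" handles one image. The function refuses to overwrite an existing layout file, so running it twice is harmless.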
From: Parag K. <pa...@ci...> - 2016-08-18 11:32:25
Hi,

Can anyone tell me whether OSCAR supports a graphical interface for cluster management?

Regards,
Parag
+91 8308806004
From: LAHAYE O. <oli...@ce...> - 2016-08-02 12:48:50
Docker is really cool stuff. It often (not all the time) replaces the need for a true VM, and in our case it is an even more convenient way to build OSCAR.
The video training courses are really great.

I hope you can play around with these tools and OSCAR.

Last note: oscar-packager may report some build failures. That's not a big deal as long as the failing package is not a mandatory one.
systemimager is a special case. It will fail, because our version is a hacked version that was built by hand (it fails to build normally). In your case, you'll have to get a 32-bit version from the systemimager site or use a 32-bit package from the old OSCAR repos.
systemimager should evolve in the near future and become buildable again. It's now part of xCAT, and Brian Finley, its author, is working on it again. I'm also working on it a little in my spare time (but right now, it's broken).

Finally,

--
Olivier LAHAYE
CEA DRT/LIST/DIR

________________________________________
From: Steven Evans [steven.evans@BoolEngine.com]
Sent: Tuesday, 2 August 2016 14:35
To: osc...@li...
Subject: Re: [Oscar-users] Building a Beowulf Cluster with Obsolete Hardware

Thanks so much for the helpful information, Olivier. I had never even heard of docker, until you mentioned it. I'll definitely have to check it out.

--
http://www.BoolEngine.com
A Tool for the Technical Community

------------------------------------------------------------------------------
_______________________________________________
Oscar-users mailing list
Osc...@li...
https://lists.sourceforge.net/lists/listinfo/oscar-users
From: Steven E. <steven.evans@BoolEngine.com> - 2016-08-02 12:36:05
Thanks so much for the helpful information, Olivier. I had never even heard of docker, until you mentioned it. I'll definitely have to check it out.

--
http://www.BoolEngine.com
A Tool for the Technical Community
From: LAHAYE O. <oli...@ce...> - 2016-08-02 10:04:39
Hi Steven,

To be honest, I can't tell whether older versions of OSCAR still work, as I have only worked on OSCAR 6 on CentOS-6 and CentOS-7.
32-bit hardware is pretty old, and I doubt you can find a maintained Linux distro that runs in 32-bit mode.
For the time being, only the testing version is "maintained" and, unfortunately, it's only built on x86_64 hardware (http://svn.oscar.openclustergroup.org/repos/unstable/).
If you're familiar with docker, you can try to build a 32-bit version (not tested, though) for your own purposes.
The docker files are available here: http://svn.oscar.openclustergroup.org/pkgs/downloads/docker/
The quick start guide is available here: http://svn.oscar.openclustergroup.org/trac/oscar/wiki/quick_start_guide_for_rhel
But before starting the guide, you should build the 32-bit packages and set up a repository on your network.
Then, in step 4 of the guide, you'll have to update the epel package. For OSCAR, you can install the noarch oscar-release package and thereafter edit /etc/yum.repos.d/OSCAR.repo to point to your repository.
As your hardware is "weak", I would suggest installing a minimalistic cluster (no ganglia, naemon and other stuff; only the basics and the batch queuing system, for which I would recommend slurm).

How to use docker to build OSCAR packages:

Choose which distro you want to play with and then:

0/ Install docker on your Linux host.
   yum -y install docker
   dnf -y install docker
   urpmi --auto docker
   apt-get -y install docker
   or whatever is suitable for your distro.

1/ Build the docker image:
   sudo docker build -t <yourname>/oscar_<distro>:<version> -f Dockerfile.<distro> .
   example:
   sudo docker build -t john/oscar_co7:1.0 -f Dockerfile.centos7 .

2/ Run the docker image you've built interactively:
   sudo docker run -it <yourname>/oscar_<distro>:<version>
   example:
   sudo docker run -it john/oscar_co7:1.0

3/ Now that you're running your image, start playing with oscar-packager.
   oscar-packager --all oda

   => The result will end up in /tftpboot/oscar/<oscar-distro-tag>/

   To build all packages:
   oscar-packager --all unstable
   You can add --verbose or even --debug for more output.

   Now you can stop/quit your docker image by exiting the bash shell (exit / ^D).

4/ Once done playing, you can choose to keep track of the state of the docker image you've just quit.
   To do so:
   sudo docker ps -a
   Note the id of the container that you want to keep track of (the most recent one).
   sudo docker commit <container_id> john/oscar_co7:1.1

   Note that the version has increased. This is to avoid overwriting the 1.0 version. You could, however, have used 1.0 to store the current image status if you don't need to keep the original 1.0 version.

More info on how to use docker here:
https://training.docker.com/self-paced-training

Cheers,

Olivier.

--
Olivier LAHAYE
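The docker steps from this how-to can be strung together in one script. This is a sketch, not part of the original message: the run wrapper and the DRY_RUN, IMAGE and DOCKERFILE variables are assumptions added for illustration, and by default the script only prints the docker commands so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Image name and Dockerfile taken from the example in the message.
IMAGE=${IMAGE:-john/oscar_co7}
DOCKERFILE=${DOCKERFILE:-Dockerfile.centos7}

# Hypothetical wrapper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: build the image from the chosen Dockerfile.
build_image() {
    run sudo docker build -t "$IMAGE:1.0" -f "$DOCKERFILE" .
}

# Step 2: start an interactive shell in the image; inside it, run
# "oscar-packager --all unstable" by hand (step 3 of the message).
run_image() {
    run sudo docker run -it "$IMAGE:1.0"
}

# Step 4: save the state of a stopped container under a new version tag.
# The container id comes from "sudo docker ps -a".
commit_container() {
    run sudo docker commit "$1" "$IMAGE:1.1"
}

build_image
run_image
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands. commit_container is left for a manual call, since the container id is only known after the interactive session ends.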
From: Steven E. <steven.evans@BoolEngine.com> - 2016-08-01 17:33:27
Oscar Users,

I'm interested in building a Beowulf cluster by using a bunch of old IBM desktops that I own. The machines are far too old to meet the minimum system requirements of the latest version of Oscar, so I'm trying to use an older version that fits the generation of my computers (Oscar 4.2). Is this a feasible endeavor, or is Oscar not suitable for something like this?

As you know, using an older version of Oscar also means using an old/obsolete operating system. I suspect that support for both will be nonexistent, so I'm interested in getting the advice of those who have built clusters. Is this something that lots of people do (build a cluster with obsolete software), or is it rare?

These computers have Pentium II processors with clock speeds of around 350 MHz. The maximum RAM for each machine is 768 MB. Most of my hard drives are 4.2 GB IDE drives, although I have a few 6 GB drives and one that's 40 GB.

I've never built a cluster, but have noticed that most books on the subject are at least 10 years old. They also rarely give enough detail to actually build a cluster. That's why I've started looking at Oscar.

I'm mostly trying to find out whether or not this is practical before potentially pursuing a fool's errand. Any advice from experienced cluster builders is greatly appreciated.

Thanks for your attention,

Steven Evans

--
http://www.BoolEngine.com
A Tool for the Technical Community
From: Richard Y. <Ric...@us...> - 2016-06-02 05:18:26
DongInn

Thanks, I have gone through the configuration of both and it seems there was a misconfiguration in the queue settings, in particular walltime, that was stopping jobs from running.

Thanks again.

---------------------------------------------------------------------
Richard A. Young
ICT Services
Email: Ric...@us...
Phone: (07) 46315557
Mob: 0437544370 Fax: (07) 46312798
---------------------------------------------------------------------

-----Original Message-----
From: Kim, DongInn [mailto:di...@in...]
Sent: Wednesday, 1 June 2016 12:51 PM
To: osc...@li...
Subject: Re: [Oscar-users] Jobs not running on reconfigured cluster

Hi Richard,

I think that this is a torque+maui configuration issue on your cluster.
Can you please make sure that your torque and maui configurations are set up properly? I hope that you can find the torque and maui admin manuals on Google.

One thing that I would like to try is to see what log messages are generated on the server and client sides when a new job is submitted. That would give many hints about your problem.

Regards,

--
- DongInn

> On May 31, 2016, at 8:40 PM, Richard Young <Ric...@us...> wrote:
>
> Lahaye
> - No, I can't see the ganglia web interface on either the public or private interfaces; it says "you have no permission".
> - The admin node is set up as a forwarding DNS server and lookups seem to work correctly.
> - The firewall/iptables services have been stopped, with only an iptables rule set from the command line to forward and NAT traffic.
> - The nscd cache has been turned off.
> - munge is running.
> - The torque/maui packages did get updated; the configurations have been checked to make certain they are the same as before the update.
>
> Thanks
>
> -----Original Message-----
> From: LAHAYE Olivier [mailto:oli...@ce...]
> Sent: Tuesday, 31 May 2016 6:12 PM
> To: osc...@li...
> Subject: Re: [Oscar-users] Jobs not running on reconfigured cluster
>
> Hi Richard,
>
> - Can you see the ganglia web interface?
> - Are you using a DNS for your cluster?
> - Are the firewalld / iptables services stopped?
> - Has the nscd cache been reset?
> - Is munge running?
> - I'm not using torque/maui anymore, so I can't check on my side to see if there is some specific config to check...
> - Did the torque / maui packages get updated during the process?
>
> Olivier.
> --
> Olivier LAHAYE
> CEA DRT/LIST/DIR
>
> ________________________________________
> From: Richard Young [Ric...@us...]
> Sent: Tuesday, 31 May 2016 06:29
> To: 'osc...@li...'
> Subject: Re: [Oscar-users] Jobs not running on reconfigured cluster
>
> DongInn
> I did check these before, but I re-checked as below:
> 1. /etc/hosts is the same across the cluster.
> 2. I can ssh to a node and back without any problems or password. The known_hosts file has been updated and copied across the cluster.
> 3. Checked nagios/nrpe and it is set up to allow the admin node to collect details.
> 4. ganglia/gmond is set up to talk to the admin node.
> 5. pbs_server and maui on the admin node have been restarted with no reported errors in the log files.
> 6. pbs_mom on the nodes has been restarted with no reported errors in the log files.
> 7. A search through /etc and /var/lib/torque for the ip-address of the server doesn't find anything other than old log entries.
> 8. /etc/dhcp/dhcpd.conf has been updated.
> 9. /etc/ntp.conf has been updated across the cluster.
>
> Thanks
>
> -----Original Message-----
> From: Kim, DongInn [mailto:di...@in...]
> Sent: Tuesday, 31 May 2016 12:05 PM
> To: Users OSCAR
> Subject: Re: [Oscar-users] Jobs not running on reconfigured cluster
>
> Hi Richard,
>
> I would like to double check the following items if I were you.
>
> 1. /etc/hosts, ssh keys, nagios/nrpe, gmetad/gmond are all synced across all the nodes.
> 2. Make sure that the root user can ssh into all the nodes back and forth without a password.
> 3. All the daemons for job submission are running on all the nodes:
>    (torque-server and torque-mom on the head node, torque-mom on the client nodes, and maui on the head node)
>    I assume that you are using torque as the RM and maui as the scheduler.
>
> Regards,
>
> --
> - DongInn
>
>> On May 30, 2016, at 7:25 PM, Richard Young <Ric...@us...> wrote:
>>
>> I was hoping somebody would be able to help me with the following problem.
>>
>> Recently I have applied updates and done some reconfiguration on a RHEL6.8 cluster running Oscar. The major change was changing the ipaddress of the oscar_server; this was required because of changes to the network structure. The ipaddress has been applied to /etc/hosts, ssh keys, nagios/nrpe, gmetad/gmond etc. However, I have missed something, because no jobs will now run on the cluster. The jobs basically sit in the queue and then get cancelled because they have hit their walltime.
>>
>> Has anybody come across this problem before and could supply some insight into how to fix the problem(s)?
>>
>> Thanks
>>
>> ---------------------------------------------------------------------
>> Richard A. Young
>> ICT Services
>> HPC Systems Engineer
>> University of Southern Queensland
>> Toowoomba, Queensland 4350
>> Australia
>> Email: Ric...@us...
>> Phone: (07) 46315557
>> Mob: 0437544370 Fax: (07) 46312798
>> ---------------------------------------------------------------------
>>
>> _____________________________________________________________
>> This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email.
>>
>> The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt.
>>
>> The University of Southern Queensland is a registered provider of education with the Australian Government.
>> (CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )
From: Kim, D. <di...@in...> - 2016-06-01 02:51:03
|
Hi Richard,

I think that this is a torque+maui configuration issue on your cluster. Can you please make sure that your torque and maui configurations are set up properly? I hope that you can find the torque and maui admin manuals on Google.

One thing that I would try is to see what log messages are generated on the server and client sides when a new job is submitted. That should give many hints about your problem.

Regards,

--
- DongInn
|
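The suggestion above (watch what the server and client logs say when a job is submitted) can be sketched as a small filter. This is only an illustration under assumptions: the function names are hypothetical, the job ID format in the sample is made up, and the location and layout of the Torque log files on a given cluster are not taken from this thread.

```python
from pathlib import Path


def lines_mentioning_job(log_text: str, job_id: str) -> list[str]:
    """Return the log lines that contain job_id, in their original order."""
    return [line for line in log_text.splitlines() if job_id in line]


def scan_log_file(path: str, job_id: str) -> list[str]:
    """Read one log file and keep only the lines that mention job_id.

    Undecodable bytes are replaced rather than raising, since log files
    are not guaranteed to be clean text.
    """
    text = Path(path).read_text(errors="replace")
    return lines_mentioning_job(text, job_id)
```

Running the same filter against the server-side log and the log on a compute node for one freshly submitted job would show how far the job gets before it stalls.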
From: Richard Y. <Ric...@us...> - 2016-06-01 00:40:54
|
Lahaye,

- No, I can't see the ganglia web interface on either the public or private interfaces; it says "you have no permission".
- The admin node is set up as a forwarding DNS server and lookups seem to work correctly.
- The firewall/iptables services have been stopped, except for an iptables rule set entered from the command line to forward and NAT traffic.
- The nscd cache has been turned off.
- munge is running.
- The torque/maui packages did get updated; the configurations have been checked to make certain they are the same as before the update.

Thanks

---------------------------------------------------------------------
Richard A. Young
ICT Services
Email: Ric...@us... Phone: (07) 46315557
Mob: 0437544370 Fax: (07) 46312798
---------------------------------------------------------------------
|
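The last point in the reply above (configurations checked to be the same as before the package update) is the kind of claim a diff can back up. A minimal sketch, assuming a copy of the pre-update file was kept somewhere; the function name and the sample setting are hypothetical and not real torque or maui parameters.

```python
import difflib


def config_diff(before: str, after: str, name: str = "config") -> list[str]:
    """Return unified-diff lines between two versions of a config file.

    An empty list means the two texts are identical line for line.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{name}.before",
            tofile=f"{name}.after",
            lineterm="",
        )
    )
```

A package update can also replace a file and leave the old one under a different name, so comparing the file that the daemon actually reads is the part that matters.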
From: LAHAYE O. <oli...@ce...> - 2016-05-31 08:11:54
|
Hi Richard,

- Can you see the ganglia web interface?
- Are you using DNS for your cluster?
- Are the firewalld / iptables services stopped?
- Has the nscd cache been reset?
- Is munge running?
- I'm not using torque/maui anymore, so I can't check on my side whether there is some specific config to check...
- Did the torque / maui packages get updated during the process?

Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
|
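The DNS question above can be turned into a quick consistency check: does every name listed in /etc/hosts resolve to the address that file gives for it? The sketch below is an assumption-laden illustration; the function names are hypothetical, and the default resolver (socket.gethostbyname) may itself consult /etc/hosts, so a DNS-only resolver would have to be passed in to test the DNS server on its own.

```python
import socket
from typing import Callable, Optional


def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name in /etc/hosts-style text to its first listed IP."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) >= 2:
            ip, names = fields[0], fields[1:]
            for name in names:
                table.setdefault(name, ip)
    return table


def mismatches(
    hosts_text: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> dict[str, tuple[str, Optional[str]]]:
    """Return {name: (expected_ip, resolved_ip)} for names that disagree.

    resolved_ip is None when the resolver fails for that name.
    """
    bad: dict[str, tuple[str, Optional[str]]] = {}
    for name, expected in parse_hosts(hosts_text).items():
        try:
            actual: Optional[str] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            actual = None
        if actual != expected:
            bad[name] = (expected, actual)
    return bad
```

After an address change on the head node, a name that still resolves to the previous address on some node would show up here as a mismatch.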
From: Richard Y. <Ric...@us...> - 2016-05-31 04:30:04
|
DongInn

I did check these before, but I re-checked as below:

1. /etc/hosts is the same across the cluster.
2. I can ssh to a node and back without any problems or password. The known_hosts file has been updated and copied across the cluster.
3. Checked nagios/nrpe; it is set up to allow the admin node to collect details.
4. ganglia/gmond is set up to talk to the admin node.
5. pbs_server and maui on the admin node have been restarted with no reported errors in the log files.
6. pbs_mom on the nodes has been restarted with no reported errors in the log files.
7. A search through /etc and /var/lib/torque for the IP address of the server doesn't find anything other than old log entries.
8. /etc/dhcp/dhcpd.conf has been updated.
9. /etc/ntp.conf has been updated across the cluster.

Thanks

---------------------------------------------------------------------
Richard A. Young
ICT Services
Email: Ric...@us... Phone: (07) 46315557
Mob: 0437544370 Fax: (07) 46312798
---------------------------------------------------------------------

-----Original Message-----
From: Kim, DongInn [mailto:di...@in...]
Sent: Tuesday, 31 May 2016 12:05 PM
To: Users OSCAR
Subject: Re: [Oscar-users] Jobs not running on reconfigured cluster

Hi Richard,

I would like to double check the following items if I were you.

1. /etc/hosts, ssh keys, nagios/nrpe, gmetad/gmond are all synced through all the nodes.
2. Make sure that the root user can ssh into all the nodes back and forth without password.
3. All the daemons of the job submission are running on all the nodes (torque-server and torque-mom on the head node, torque-mom on the client nodes, and maui on the head node).

I assume that you are using torque as RM and maui as a scheduler.

Regards,
--
- DongInn

> On May 30, 2016, at 7:25 PM, Richard Young <Ric...@us...> wrote:
>
> I was hoping somebody would be able to help me with the following problem.
>
> Recently I have applied updates and done some reconfiguration on a RHEL6.8 cluster running Oscar. The major change was changing the IP address of the oscar_server, which was required because of changes to the network structure. The IP address has been applied to /etc/hosts, ssh keys, nagios/nrpe, gmetad/gmond etc. However, I have missed something because no jobs will now run on the cluster. The jobs basically sit in the queue and then get cancelled because they have hit their walltime.
>
> Has anybody come across this problem before, and can you supply some insight into how to fix the problem(s)?
>
> Thanks
>
> ---------------------------------------------------------------------
> Richard A. Young
> ICT Services
> HPC Systems Engineer
> University of Southern Queensland
> Toowoomba, Queensland 4350
> Australia
> Email: Ric...@us... Phone: (07) 46315557
> Mob: 0437544370 Fax: (07) 46312798
> ---------------------------------------------------------------------
>
> _____________________________________________________________
> This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email.
>
> The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt.
>
> The University of Southern Queensland is a registered provider of education with the Australian Government.
> (CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )
>
> _______________________________________________
> Oscar-users mailing list
> Osc...@li...
> https://lists.sourceforge.net/lists/listinfo/oscar-users
|
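Item 7 in the reply above (searching /etc and /var/lib/torque for the server's address) can be sketched as a small shell function. This is only an illustrative sketch, not something posted in the thread: the function name `find_stale_ip` and the example address are hypothetical, and it assumes a `grep` that supports `-r` (GNU, BSD and busybox all do).

```shell
#!/bin/sh
# find_stale_ip OLD_IP DIR...
# Hypothetical helper (not part of OSCAR or Torque): list every file under
# the given directories that still contains OLD_IP.
find_stale_ip() {
    old_ip=$1
    shift
    # -r  recurse into directories
    # -l  print file names only
    # -w  whole-word match, so 192.0.2.1 does not match 192.0.2.10
    # -F  fixed string, so the dots are not treated as regex wildcards
    grep -rlwF -- "$old_ip" "$@" 2>/dev/null
}

# Example (placeholder address):
#   find_stale_ip 192.0.2.10 /etc /var/lib/torque
```

Old log files will still show up in the output, as Richard notes; anything else in the list is a candidate for a missed configuration file.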
From: Kim, D. <di...@in...> - 2016-05-31 02:04:54
|
Hi Richard,

I would like to double check the following items if I were you.

1. /etc/hosts, ssh keys, nagios/nrpe, gmetad/gmond are all synced through all the nodes.
2. Make sure that the root user can ssh into all the nodes back and forth without password.
3. All the daemons of the job submission are running on all the nodes (torque-server and torque-mom on the head node, torque-mom on the client nodes, and maui on the head node).

I assume that you are using torque as RM and maui as a scheduler.

Regards,
--
- DongInn

> On May 30, 2016, at 7:25 PM, Richard Young <Ric...@us...> wrote:
>
> I was hoping somebody would be able to help me with the following problem.
>
> Recently I have applied updates and done some reconfiguration on a RHEL6.8 cluster running Oscar. The major change was changing the IP address of the oscar_server, which was required because of changes to the network structure. The IP address has been applied to /etc/hosts, ssh keys, nagios/nrpe, gmetad/gmond etc. However, I have missed something because no jobs will now run on the cluster. The jobs basically sit in the queue and then get cancelled because they have hit their walltime.
>
> Has anybody come across this problem before, and can you supply some insight into how to fix the problem(s)?
>
> Thanks
>
> ---------------------------------------------------------------------
> Richard A. Young
> ICT Services
> HPC Systems Engineer
> University of Southern Queensland
> Toowoomba, Queensland 4350
> Australia
> Email: Ric...@us... Phone: (07) 46315557
> Mob: 0437544370 Fax: (07) 46312798
> ---------------------------------------------------------------------
|
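The first item of the checklist above (the same /etc/hosts everywhere) can be spot-checked by asking each machine which address it maps the head node's name to. The following is a minimal sketch under stated assumptions: `hosts_ip_for` is a hypothetical helper, and `oscar_server` is used as the head-node name only because that is what the original post calls it.

```shell
#!/bin/sh
# hosts_ip_for NAME [FILE]
# Hypothetical helper: print the address that a hosts file (default
# /etc/hosts) maps NAME to, or nothing if NAME is not listed.
hosts_ip_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }      # strip comments; awk re-splits the fields
        {
            # field 1 is the address, the remaining fields are host names
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; exit }
        }
    ' "$file"
}

# Example: hosts_ip_for oscar_server
```

Running the same lookup on every node (for instance `ssh node1 getent hosts oscar_server`, where `node1` is a placeholder) and comparing the answers with the head node's new address would show whether any node still resolves the old one.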
From: Richard Y. <Ric...@us...> - 2016-05-30 23:47:43
|
I was hoping somebody would be able to help me with the following problem.

Recently I have applied updates and done some reconfiguration on a RHEL6.8 cluster running Oscar. The major change was changing the IP address of the oscar_server, which was required because of changes to the network structure. The IP address has been applied to /etc/hosts, ssh keys, nagios/nrpe, gmetad/gmond etc. However, I have missed something because no jobs will now run on the cluster. The jobs basically sit in the queue and then get cancelled because they have hit their walltime.

Has anybody come across this problem before, and can you supply some insight into how to fix the problem(s)?

Thanks

---------------------------------------------------------------------
Richard A. Young
ICT Services
HPC Systems Engineer
University of Southern Queensland
Toowoomba, Queensland 4350
Australia
Email: Ric...@us... Phone: (07) 46315557
Mob: 0437544370 Fax: (07) 46312798
---------------------------------------------------------------------
|
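When jobs stay queued until they hit their walltime, one thing worth ruling out is that the Torque server sees its nodes as down or offline. The sketch below is an illustration, not a diagnosis from the thread: `nodes_unavailable` is a hypothetical filter, and it assumes the usual `pbsnodes -a` layout of an unindented node name followed by indented `attribute = value` lines.

```shell
#!/bin/sh
# nodes_unavailable
# Hypothetical filter: read `pbsnodes -a` output on stdin and print
# "node: state" for every node whose state mentions down, offline or unknown.
nodes_unavailable() {
    awk '
        /^[^ \t]/ { node = $1 }     # unindented line: start of a node record
        $1 == "state" && $2 == "=" && $3 ~ /down|offline|unknown/ {
            print node ": " $3
        }
    '
}

# Usage on the head node:
#   pbsnodes -a | nodes_unavailable
```

If every node is reported free, `qstat -f <jobid>` (Torque) and `checkjob <jobid>` (Maui) are the next places to look for the reason a job is not being started.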
From: LAHAYE O. <oli...@ce...> - 2015-07-10 11:42:43
|
Hi,

The correct procedure to build oscar-modules is:

  wget http://svn.oscar.openclustergroup.org/repos/unstable/rhel-7-x86_64/oscar-release-6.1.2r11039-1.el7.centos.noarch.rpm
  yum -y install oscar-release-6.1.2r11039-1.el7.centos.noarch.rpm
  yum -y install oscar-packager
  oscar-config -t rhel-7-x86_64
  oscar-packager -d --all modules-oscar

Otherwise, if you want to build it by hand, look inside modules-oscar.cfg in the svn and you'll find that the required source is available here:

  wget http://svn.oscar.openclustergroup.org/pkgs/downloads/modules-oscar/{modules-3.3.a.tar.bz2,Modules-Paper.doc,Modules-Paper.pdf}

Best regards.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________
From: PREUKSCHAS, FABIEN [fab...@at...]
Sent: Thursday, 9 July 2015 18:32
To: osc...@li...
Subject: [Oscar-users] modules-3.3.a.tar.bz2

Hi all,

I'm trying to rebuild oscar-modules using the .spec from svn, and in this file a modules-3.3.a.tar.bz2 is required to build the rpm. I can't find this archive anywhere on the internet... Am I missing something?

Thanks for the answer.
|
From: PREUKSCHAS, F. <fab...@at...> - 2015-07-09 16:33:02
|
Hi all,

I'm trying to rebuild oscar-modules using the .spec from svn, and in this file a modules-3.3.a.tar.bz2 is required to build the rpm. I can't find this archive anywhere on the internet... Am I missing something?

Thanks for the answer.
|
From: Marcin D. <md...@dt...> - 2015-06-01 17:37:25
|
Hi,

I'm trying to set up a test CentOS 6.6 x86_64 server with oscar-release-6.1.2r11040-1.el6.noarch on eth1, but it fails in mysql:

==> server: Successfully initialized ODA
==> server: Database Initialization...
==> server: Restarting the database service...
==> server: mysqld (pid 6589) is running...
==> server: Stopping mysqld:
==> server: [
==> server: OK ]
==> server: Starting mysqld:
==> server: [ OK ]
==> server: Database_status: 1
==> server: DBD::mysql::db do failed: Cannot add or update a child row: a foreign key constraint fails (`oscar`.`Nodes`, CONSTRAINT `Nodes_ibfk_2` FOREIGN KEY (`group_name`) REFERENCES `Groups` (`name`) ON DELETE CASCADE ON UPDATE CASCADE) at /usr/share/perl5/vendor_perl/OSCAR/oda.pm line 713.
==> server: DB_DEBUG>/usr/bin/create_and_populate_basic_node_info:
==> server: ====> in Database::do_insert SQL : INSERT INTO Nodes (cluster_id, hostname, name, group_name) SELECT 1, 'oscar-server', 'oscar-server', 'oscar-server'
==> server: Error message: Failed to insert values into Nodes table in database <oscar>: Cannot add or update a child row: a foreign key constraint fails (`oscar`.`Nodes`, CONSTRAINT `Nodes_ibfk_2` FOREIGN KEY (`group_name`) REFERENCES `Groups` (`name`) ON DELETE CASCADE ON UPDATE CASCADE) at /usr/share/perl5/vendor_perl/OSCAR/oda.pm line 719.
==> server: /usr/bin/create_and_populate_basic_node_info: SQL command that failed was: <INSERT INTO Nodes (cluster_id, hostname, name, group_name) SELECT 1, 'oscar-server', 'oscar-server', 'oscar-server'> at /usr/share/perl5/vendor_perl/OSCAR/oda.pm line 719.
==> server: DB_DEBUG>/usr/bin/create_and_populate_basic_node_info:
==> server: ====>Failed to insert values via << INSERT INTO Nodes (cluster_id, hostname, name, group_name) SELECT 1, 'oscar-server', 'oscar-server', 'oscar-server' >> at /usr/bin/create_and_populate_basic_node_info line 86
==> server: ERROR: Impossible to set headnode information in the database at /usr/bin/create_and_populate_basic_node_info line 90.
==> server: Checking for database existence of node oscar-server ...
==> server: [ERROR - oscar-config] Failed to bootstrap OSCAR
==> server: [ERROR - oscar-config] Unable to bootstrap OSCAR

I've found several similar problems on the list, but no solution. My Vagrantfile (for vagrant-1.7.2) is attached. I've also tried oscar-release-6.1.2r10588-1.noarch.rpm, with the same result.

Best regards,
Marcin
|
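The error above says that the row being inserted into `Nodes` uses a `group_name` ('oscar-server') that must already exist as a `name` in `Groups`. A quick way to confirm that reading is to count the matching rows in `Groups`. The helper below is a hedged sketch, not an OSCAR tool: `group_exists_sql` is a hypothetical name, and the way the `mysql` client is invoked (database name `oscar`, no credentials shown) is an assumption about the local setup.

```shell
#!/bin/sh
# group_exists_sql NAME
# Hypothetical helper: print a query that counts the rows in the Groups
# table whose name equals NAME (table and column names come from the
# error message above).
group_exists_sql() {
    # Double any single quote so NAME is safe inside an SQL string literal
    name=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "SELECT COUNT(*) FROM \`Groups\` WHERE name = '%s';\n" "$name"
}

# Usage (assumes the mysql client can already reach the oscar database):
#   group_exists_sql oscar-server | mysql oscar
```

A count of 0 would be consistent with the foreign key failure; why the group row was not created during bootstrap is not answered in this thread.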
From: LAHAYE O. <oli...@ce...> - 2015-04-10 12:39:18
|
99all.harmless_example_script is run, so the "all" scripts should run as well.

What is your image node distro? Anyway, as you found a way to run the scripts, it should be easy to fix, as the code inside is simple. You could check by inserting a "-x" in the #!/bin/sh script header to see what is going on.

What is the content of /var/lib/systemimager/scripts/post-install (ls -la)? I have:

$ ls -la
total 48
drwxr-xr-x. 2 root root 4096 6 août 2014 .
drwxr-xr-x. 4 root root 4096 9 mars 17:11 ..
-rw-r--r--. 1 root root 2730 30 juil. 2014 10all.fix_swap_uuids
-rw-r--r--. 1 root root 1135 30 juil. 2014 11all.replace_byid_device
-rw-r--r--. 1 root root 219 14 juin 2013 13all.keyboard_fr
-rw-r--r--. 1 root root 1895 25 févr. 2014 14all.grub2_install
-rw-r--r--. 1 root root 842 21 févr. 2014 15all.grub_install
-rw-r--r--. 1 root root 1504 21 févr. 2014 16all.network_config
-rw-r--r--. 1 root root 5131 30 juil. 2014 95all.monitord_rebooted
-rw-r--r--. 1 root root 171 30 juil. 2014 99all.harmless_example_script
-rw-r--r--. 1 root root 2212 30 juil. 2014 README
$ pwd
/var/lib/systemimager/scripts/post-install

Note: centos-6 is a grub1 OS. centos-7 or fedora-18+ is a grub2 system.

Best regards,
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________
From: - - [an...@co...]
Sent: Thursday, 9 April 2015 20:05
To: LAHAYE Olivier
Cc: osc...@li...
Subject: Re: RE:Re: [Oscar-users] - Image won't boot after build completes

Interesting and weird! So, I did a wget on the scripts and placed them in the folder, but they would not work. See screenshot.

As you can see from the screenshot, only the 90DELL610add.nfs script worked (DELL610 is my image name). So I decided to rename all the scripts to ##DELL610 and then they ran! So I think it has an issue with #all for some reason.

There is a problem though :( it doesn't think my system is a Grub or Grub 2 system??? So neither boot loader gets applied.

After I renamed the files to ##DELL610 I get this...
I live in /var/lib/systemimager/scripts/post-install.
See: /var/lib/systemimager/scripts/post-install/README for details.
>>> 10DELL610.fix_swap_uuids
>>> 11DELL610.replace_byid_device
>>> 14DELL610.grub2_install
Not a grub 2 system, exitting.
>>> 15DELL610.grub_install
This is not a grub1 system. Exitting...
>>> 16DELL610.network_config
Setting up network configuration for test1

I wonder why it doesn't think it's a grub system? I built everything through the GUI, so no special installs of any type. The array is first configured through the PERC controller as a mirrored set, and I make sure the megaraid_sas module is part of the UYOK. That's about it....

On April 9, 2015 at 11:34 AM LAHAYE Olivier <oli...@ce...> wrote:

Hi andy,

You need to add the following postinstall scripts in your /var/lib/systemimager/scripts/post-install directory on the server.
http://svn.oscar.openclustergroup.org/pkgs/downloads/sis_postinstall/

This should fix your problem (System Imager is not grub and NetworkManager aware, and thus you need those postinstall scripts to finish the job). This is a workaround that I'm using daily until I find time to fix SystemImager and system-configurator.

This is step 25 of the quick start guide available here:
http://svn.oscar.openclustergroup.org/trac/oscar/wiki/quick_start_guide_for_rhel

Best regards.
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________
From: - - [an...@co...]
Sent: Wednesday, 8 April 2015 23:38
To: LAHAYE Olivier
Subject: Fwd: Re: [Oscar-users] - Image won't boot after build completes

Olivier,

I found this while googling :) It seems to be the same issue, but I'm sure y'all fixed it already?
https://www.mail-archive.com/sis...@li.../msg05563.html

---------- Original Message ----------
From: - - <an...@co...>
To: LAHAYE Olivier <oli...@ce...>
Cc: oscar-users <osc...@li...>
Date: April 8, 2015 at 5:18 PM
Subject: Re: [Oscar-users] - Image won't boot after build completes

Here is some of the log file of the build process.
Thanks again for the assistance.

get_arch
enumerate_disks
sda
DISKS=1
Partitioning /dev/sda...
Old partition table for /dev/sda:
Model: DELL PERC 6/i (scsi)
Disk /dev/sda: 146GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start   End    Size    Type      File system     Flags
 1      1049kB  101MB  99.6MB  primary   ext2
 2      101MB   146GB  146GB   extended                  lba
 5      102MB   614MB  513MB   logical   linux-swap(v1)
 6      615MB   146GB  146GB   logical   ext3
dd if=/dev/zero of=/dev/sda bs=512 count=1 || shellout
1+0 records in
1+0 records out
512 bytes (512 B) copied, 3.5902e-05 s, 14.3 MB/s
blockdev --rereadpt /dev/sda
parted -s -- /dev/sda mklabel msdos || shellout
Creating partition /dev/sda1.
parted -s -- /dev/sda mkpart primary 1 101 || shellout
parted -s -- /dev/sda set 1 boot on || shellout
parted -s -- /dev/sda set 1 boot on
Creating partition /dev/sda2.
parted -s -- /dev/sda mkpart extended 101 146163 || shellout
Creating partition /dev/sda5.
Creating partition /dev/sda6.
parted -s -- /dev/sda mkpart logical 615 146163 || shellout
Warning: The resulting partition is not properly aligned for best performance.
New partition table for /dev/sda:
parted -s -- /dev/sda print
Model: DELL PERC 6/i (scsi)
Disk /dev/sda: 146GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start   End    Size    Type      File system     Flags
 1      1049kB  101MB  99.6MB  primary   ext2
 2      101MB   146GB  146GB   extended                  lba
 5      102MB   614MB  513MB   logical   linux-swap(v1)
 6      615MB   146GB  146GB   logical   ext3
Load software RAID modules.
Load device mapper driver (for LVM).
Load additional filesystem drivers.
modprobe: module jfs not found in modules.dep
modprobe: module reiserfs not found in modules.dep
mkswap -v1 /dev/sda5 || shellout
mkswap: /dev/sda5: warning: wiping old swap signature.
Setting up swapspace version 1, size = 500732 KiB
no label, UUID=34be0ba5-ea37-42c2-9f8c-dafbc2bf3f97
swapon /dev/sda5 || shellout
mke2fs -q -t ext3 /dev/sda6 || shellout
mkdir -p /a/ || shellout
mount /dev/sda6 /a/ -t ext3 -o defaults || shellout
mke2fs -q -t ext2 /dev/sda1 || shellout
mkdir -p /a/boot || shellout
mount /dev/sda1 /a/boot -t ext2 -o defaults || shellout
mkdir -p /a/proc || shellout
mount proc /a/proc -t proc -o defaults || shellout
mkdir -p /a/sys || shellout
Evaluating image size...
--> Image size = 1312MiB
Report task started.
Quietly installing image...
rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids 192.168.3.254::DELL610/ /a/
Report task stopped.
rsync -av --numeric-ids 192.168.3.254::overrides/DELL610/ /a/
rsync -av --numeric-ids 192.168.3.254::overrides/test1/ /a/
rsync: change_dir "/test1" (in overrides) failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1538) [receiver=3.0.0]
Override directory test1 doesn't seem to exist, but that may be OK.
Editing files for actual disk configuration...
/dev/sda -> /dev/sda
/etc/fstab
/etc/systemconfig/systemconfig.conf
/boot/grub/menu.lst

run_post_install_scripts
>>> 10all.fix_swap_uuids
>>> 11all.replace_byid_device
>>> 95all.monitord_rebooted
>>> 99all.debug
======= BEGIN_DEBUG
-rwxr-xr-x 1 root root 18417 Oct 15 14:47 /sbin/grub-install
=======
/boot/grub
/boot/grub/splash.xpm.gz
/boot/grub/menu.lst
/boot/grub/menu.lst.image
=======
cat: /boot/grub/device.map: No such file or directory
=======
package grub* is not installed
======= END_DEBUG
>>> 99all.harmless_example_script

On April 8, 2015 at 5:15 PM - - <an...@co...> wrote:

Hi Olivier,

I never got a chance to get back to this due to other projects, but I'm back at it again :) I tried your latest build and I'm still stuck at the same area. Everything works and it builds, but the grub is never created, it seems. I tried what you said to do below, but the package is already installed.
[root@oscarcluster images]# yum --installroot=/var/lib/systemimager/images/DELL610 install grub
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
base | 3.7 kB 00:00
base/primary_db | 4.6 MB 00:02
extras | 3.4 kB 00:00
extras/primary_db | 30 kB 00:00
updates | 3.4 kB 00:00
updates/primary_db | 2.7 MB 00:01
Package 1:grub-0.97-93.el6.x86_64 already installed and latest version
Nothing to do

I'm not sure how to get this thing fixed. Any ideas?

On January 13, 2015 at 11:59 AM LAHAYE Olivier <oli...@ce...> wrote:

chroot /var/lib/systemimager/images/<yourimage> rpm -qa|grep -i grub

I think your image is incomplete. I would first try to rebuild an image and check again whether grub is installed. If it is still not installed (strange?), I would try to reinstall grub in the image:

yum --installroot=/path/to/image install grub

If grub is missing, I'm pretty sure that some other packages are missing.

Best regards,
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________________
From: - - [an...@co...]
Sent: Saturday, 10 January 2015 01:43
To: LAHAYE Olivier
Subject: Re: RE:RE:[Oscar-users] stuck...

debug below. Thanks!

======= BEGIN_DEBUG
-rwxr-xr-x 1 root root 18417 Oct 15 14:47 /sbin/grub-install
=======
/boot/grub
/boot/grub/splash.xpm.gz
/boot/grub/menu.lst
/boot/grub/menu.lst.image
=======
cat: /boot/grub/device.map: No such file or directory
=======
package grub* is not installed
======= END_DEBUG

> On January 9, 2015 at 4:10 AM LAHAYE Olivier <oli...@ce...> wrote:
>
> Hi,
>
> It looks like your imaging is not correct.
> At least one of the 2 following files is missing:
> - /sbin/grub-install
> - /boot/grub/device.map (this should contain something like:)
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
> # this device map was generated by anaconda
> (hd0) /dev/sda
>
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
>
> In /var/lib/systemimager/scripts/post-install/ try to add a 99all.debug
> script that would contain:
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
> #!/bin/bash
> echo ======= BEGIN_DEBUG
> ls -l /sbin/grub-install
> echo =======
> find /boot/grub
> echo =======
> cat /boot/grub/device.map
> echo =======
> rpm -q grub*
> echo ======= END_DEBUG
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
>
> And try to reimage. For some reason something is missing and I need to
> understand why.
>
> I need the info between BEGIN_DEBUG and END_DEBUG.
> In the meantime, I'm rebuilding the full oscar dist for centos-6 just in case
> I missed something.
>
> Cheers,
>
> Olivier.
>
> --
> Olivier LAHAYE
> CEA DRT/LIST/DIR
>
> From: - - [an...@co...]
> Sent: Friday, 9 January 2015 00:43
> To: oscar-users; LAHAYE Olivier
> Subject: Re: RE:[Oscar-users] stuck...
>
> Olivier,
>
> Xming for some reason does not copy/paste well. I think it's a buffer size
> issue. Below is the more important stuff, and I think I see the issue. As you
> can see from the S14 and S15 scripts, the system thinks it's not a grub1 or 2
> system?? This is CentOS 6.6, so I'm wondering if something changed maybe to
> make it think this? Thanks as always!!
>
> 512 bytes (512 B) copied, 7.6133e-05 s, 6.7 MB/s
> blockdev --rereadpt /dev/sda
> parted -s -- /dev/sda mklabel msdos || shellout
> Creating partition /dev/sda1.
> parted -s -- /dev/sda mkpart primary 1 101 || shellout
> parted -s -- /dev/sda set 1 boot on || shellout
> parted -s -- /dev/sda set 1 boot on
> Creating partition /dev/sda2.
> parted -s -- /dev/sda mkpart extended 101 72746 || shellout
> Creating partition /dev/sda5.
> parted -s -- /dev/sda mkpart logical linux-swap 102 614 || shellout
> Creating partition /dev/sda6.
> parted -s -- /dev/sda mkpart logical 615 72746 || shellout
> Warning: The resulting partition is not properly aligned for best performance.
> New partition table for /dev/sda:
> parted -s -- /dev/sda print
> Model: DELL PERC 5/i (scsi)
> Disk /dev/sda: 72.7GB
> Sector size (logical/physical): 512B/512B
> Partition Table: msdos
> Disk Flags:
> Number  Start   End     Size    Type      File system     Flags
>  1      1049kB  101MB   99.6MB  primary   ext2
>  2      101MB   72.7GB  72.6GB  extended                  lba
>  5      102MB   614MB   513MB   logical   linux-swap(v1)
>  6      615MB   72.7GB  72.1GB  logical   ext3
> Load software RAID modules.
> Load device mapper driver (for LVM).
> Load additional filesystem drivers.
> modprobe: module jfs not found in modules.dep
> modprobe: module reiserfs not found in modules.dep
> mkswap -v1 /dev/sda5 || shellout
> mkswap: /dev/sda5: warning: wiping old swap signature.
> Setting up swapspace version 1, size = 500732 KiB
> no label, UUID=c81c1d5a-e062-45a8-8442-24e1d8e1828d
> swapon /dev/sda5 || shellout
> mke2fs -q -t ext3 /dev/sda6 || shellout
> mkdir -p /a/ || shellout
> mount /dev/sda6 /a/ -t ext3 -o defaults || shellout
> mke2fs -q -t ext2 /dev/sda1 || shellout
> mkdir -p /a/boot || shellout
> mount /dev/sda1 /a/boot -t ext2 -o defaults || shellout
> mkdir -p /a/proc || shellout
> mount proc /a/proc -t proc -o defaults || shellout
> mkdir -p /a/sys || shellout
> mount sysfs /a/sys -t sysfs -o defaults || shellout
> Evaluating image size...
> --> Image size = 1304MiB
> Report task started.
> Quietly installing image...
> rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids 192.168.3.101::test2/ /a/
> Report task stopped.
> rsync -av --numeric-ids 192.168.3.101::overrides/test2/ /a/
> rsync -av --numeric-ids 192.168.3.101::overrides/test23/ /a/
> rsync: change_dir "/test23" (in overrides) failed: No such file or directory (2)
> rsync error: some files could not be transferred (code 23) at main.c(1538) [receiver=3.0.0]
> Override directory test23 doesn't seem to exist, but that may be OK.
> Editing files for actual disk configuration...
> /dev/sda -> /dev/sda
> /etc/fstab
> /etc/systemconfig/systemconfig.conf
> /boot/grub/menu.lst
>
> run_post_install_scripts
> >>> 10all.fix_swap_uuids
> >>> 11all.replace_byid_device
> >>> 14all.grub2_install
> Not a grub 2 system, exitting.
> >>> 15all.grub_install
> This is not a grub1 system. Exitting...
> >>> 95all.monitord_rebooted
> >>> 99all.harmless_example_script
> I live in /var/lib/systemimager/scripts/post-install.
> See: /var/lib/systemimager/scripts/post-install/README for details.
> >>> 90test2.add_nfs
> umount /a/sys || mount -no remount,ro /a//sys || shellout
> umount /a/proc || mount -no remount,ro /a//proc || shellout
> umount /a/boot || mount -no remount,ro /a//boot || shellout
> umount /a/ || mount -no remount,ro /a// || shellout
> umount: /a: target is busy.
> (In some cases useful info about processes that use
> the device is found by lsof(8) or fuser(1))
> Imaging completed
>
> > On January 8, 2015 at 8:32 AM LAHAYE Olivier <oli...@ce...> wrote:
> >
> > Ho, ok,
> >
> > then, can you send me the log of the install of a node. You can save it to
> > a file from the deployment monitor. Just double click on the host during
> > its imaging and when it is finished, use save from the drop down menu.
> >
> > Olivier.
> >
> > --
> > Olivier LAHAYE
> > CEA DRT/LIST/DIR
> >
> > From: an...@co... [an...@co...]
> > Sent: Thursday, 8 January 2015 14:14
> > To: LAHAYE Olivier; osc...@li...
> > Subject: Re: [Oscar-users] stuck...
> >
> > I'm running 6.6
> >
> > /\ndy
> >
> > ----- Reply message -----
> > From: "LAHAYE Olivier" <oli...@ce...>
> > To: "- -" <an...@co...>, "oscar-users" <osc...@li...>
> > Subject: [Oscar-users] stuck...
> > Date: Thu, Jan 8, 2015 2:48 AM
> >
> > Hi,
> >
> > If you're running CentOS7, make sure you're using the grub2 post-install
> > script: 14all.grub2_install
> > 15all.grub_install is for centos6.
> >
> > Cheers,
> >
> > Olivier.
> >
> > --
> > Olivier LAHAYE
> > CEA DRT/LIST/DIR
> >
> > From: - - [an...@co...]
> > Sent: Thursday, 8 January 2015 04:34
> > To: oscar-users; LAHAYE Olivier
> > Subject: Re: [Oscar-users] stuck...
> >
> > Hi Olivier,
> >
> > Well, I was able to get the system to build at least, but I can't get it to
> > boot afterwards. It doesn't make it to the bootloader, and I did implement
> > your S15 fix script that installs the grub bootloader, but still no go. Any
> > idea what else I could look into? I'm not sure on this issue.
|
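The thread above hinges on whether the image is recognised as a grub1 or a grub2 system. The check below is a rough sketch for inspecting an image tree by hand. It is an assumption-laden illustration, not the logic of 14all.grub2_install or 15all.grub_install (their contents are not shown in the thread): `detect_grub` is a hypothetical helper, and the marker files are simply the usual locations on RHEL/CentOS 6 (grub1) and 7 (grub2).

```shell
#!/bin/sh
# detect_grub ROOT
# Hypothetical helper: print "grub2", "grub1" or "none" depending on which
# boot loader files exist under the image root ROOT.
detect_grub() {
    root=${1%/}     # drop one trailing slash so paths join cleanly
    if [ -e "$root/boot/grub2/grub.cfg" ] || [ -x "$root/usr/sbin/grub2-install" ]; then
        echo grub2
    elif [ -e "$root/boot/grub/menu.lst" ] || [ -e "$root/boot/grub/grub.conf" ]; then
        echo grub1
    else
        echo none
    fi
}

# Example: detect_grub /var/lib/systemimager/images/DELL610
```

One caveat when reading the debug output quoted in the thread: `rpm -q` takes exact package names and does not expand wildcards, so "package grub* is not installed" does not by itself show that grub is missing from the image. `rpm -qa 'grub*'` does pattern matching, and the later yum output reports grub-0.97-93.el6 as installed.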
From: LAHAYE O. <oli...@ce...> - 2015-04-09 15:34:40
|
Hi andy,

You need to add the following postinstall scripts in your /var/lib/systemimager/scripts/post-install directory on the server.
http://svn.oscar.openclustergroup.org/pkgs/downloads/sis_postinstall/

This should fix your problem (System Imager is not grub and NetworkManager aware, and thus you need those postinstall scripts to finish the job). This is a workaround that I'm using daily until I find time to fix SystemImager and system-configurator.

This is step 25 of the quick start guide available here:
http://svn.oscar.openclustergroup.org/trac/oscar/wiki/quick_start_guide_for_rhel

Best regards.
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________
From: - - [an...@co...]
Sent: Wednesday, 8 April 2015 23:38
To: LAHAYE Olivier
Subject: Fwd: Re: [Oscar-users] - Image won't boot after build completes

Olivier,

I found this while googling :) It seems to be the same issue, but I'm sure y'all fixed it already?
https://www.mail-archive.com/sis...@li.../msg05563.html

---------- Original Message ----------
From: - - <an...@co...>
To: LAHAYE Olivier <oli...@ce...>
Cc: oscar-users <osc...@li...>
Date: April 8, 2015 at 5:18 PM
Subject: Re: [Oscar-users] - Image won't boot after build completes

Here is some of the log file of the build process. Thanks again for the assistance.

get_arch
enumerate_disks
sda
DISKS=1
Partitioning /dev/sda...
Old partition table for /dev/sda:
Model: DELL PERC 6/i (scsi)
Disk /dev/sda: 146GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start   End    Size    Type      File system     Flags
 1      1049kB  101MB  99.6MB  primary   ext2
 2      101MB   146GB  146GB   extended                  lba
 5      102MB   614MB  513MB   logical   linux-swap(v1)
 6      615MB   146GB  146GB   logical   ext3
dd if=/dev/zero of=/dev/sda bs=512 count=1 || shellout
1+0 records in
1+0 records out
512 bytes (512 B) copied, 3.5902e-05 s, 14.3 MB/s
blockdev --rereadpt /dev/sda
parted -s -- /dev/sda mklabel msdos || shellout
Creating partition /dev/sda1.
parted -s -- /dev/sda mkpart primary 1 101 || shellout
parted -s -- /dev/sda set 1 boot on || shellout
parted -s -- /dev/sda set 1 boot on
Creating partition /dev/sda2.
parted -s -- /dev/sda mkpart extended 101 146163 || shellout
Creating partition /dev/sda5.
Creating partition /dev/sda6.
parted -s -- /dev/sda mkpart logical 615 146163 || shellout
Warning: The resulting partition is not properly aligned for best performance.
New partition table for /dev/sda:
parted -s -- /dev/sda print
Model: DELL PERC 6/i (scsi)
Disk /dev/sda: 146GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start   End    Size    Type      File system     Flags
 1      1049kB  101MB  99.6MB  primary   ext2
 2      101MB   146GB  146GB   extended                  lba
 5      102MB   614MB  513MB   logical   linux-swap(v1)
 6      615MB   146GB  146GB   logical   ext3
Load software RAID modules.
Load device mapper driver (for LVM).
Load additional filesystem drivers.
modprobe: module jfs not found in modules.dep
modprobe: module reiserfs not found in modules.dep
mkswap -v1 /dev/sda5 || shellout
mkswap: /dev/sda5: warning: wiping old swap signature.
Setting up swapspace version 1, size = 500732 KiB
no label, UUID=34be0ba5-ea37-42c2-9f8c-dafbc2bf3f97
swapon /dev/sda5 || shellout
mke2fs -q -t ext3 /dev/sda6 || shellout
mkdir -p /a/ || shellout
mount /dev/sda6 /a/ -t ext3 -o defaults || shellout
mke2fs -q -t ext2 /dev/sda1 || shellout
mkdir -p /a/boot || shellout
mount /dev/sda1 /a/boot -t ext2 -o defaults || shellout
mkdir -p /a/proc || shellout
mount proc /a/proc -t proc -o defaults || shellout
mkdir -p /a/sys || shellout
Evaluating image size...
--> Image size = 1312MiB
Report task started.
Quietly installing image...
rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids 192.168.3.254::DELL610/ /a/
Report task stopped.
rsync -av --numeric-ids 192.168.3.254::overrides/DELL610/ /a/
rsync -av --numeric-ids 192.168.3.254::overrides/test1/ /a/
rsync: change_dir "/test1" (in overrides) failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1538) [receiver=3.0.0]
Override directory test1 doesn't seem to exist, but that may be OK.
Editing files for actual disk configuration...
/dev/sda -> /dev/sda
/etc/fstab
/etc/systemconfig/systemconfig.conf
/boot/grub/menu.lst

run_post_install_scripts
>>> 10all.fix_swap_uuids
>>> 11all.replace_byid_device
>>> 95all.monitord_rebooted
>>> 99all.debug
======= BEGIN_DEBUG
-rwxr-xr-x 1 root root 18417 Oct 15 14:47 /sbin/grub-install
=======
/boot/grub
/boot/grub/splash.xpm.gz
/boot/grub/menu.lst
/boot/grub/menu.lst.image
=======
cat: /boot/grub/device.map: No such file or directory
=======
package grub* is not installed
======= END_DEBUG
>>> 99all.harmless_example_script

On April 8, 2015 at 5:15 PM - - <an...@co...> wrote:

Hi Olivier,

I never got a chance to get back to this due to other projects, but I'm back at it again :) I tried your latest build and I'm still stuck at the same area. Everything works and it builds, but the grub is never created, it seems. I tried what you said to do below, but the package is already installed.

[root@oscarcluster images]# yum --installroot=/var/lib/systemimager/images/DELL610 install grub
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
base | 3.7 kB 00:00
base/primary_db | 4.6 MB 00:02
extras | 3.4 kB 00:00
extras/primary_db | 30 kB 00:00
updates | 3.4 kB 00:00
updates/primary_db | 2.7 MB 00:01
Package 1:grub-0.97-93.el6.x86_64 already installed and latest version
Nothing to do

I'm not sure how to get this thing fixed. Any ideas?

On January 13, 2015 at 11:59 AM LAHAYE Olivier <oli...@ce...> wrote:

chroot /var/lib/systemimager/images/<yourimage> rpm -qa|grep -i grub

I think your image is incomplete.
I would first try to rebuild an image and check again whether grub is installed. If it is still not installed (strange?), I would try to reinstall grub in the image:

yum --installroot=/path/to/image install grub

If grub is missing, I'm pretty sure that some other packages are missing.

Best regards,
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________________
From: - - [an...@co...]
Sent: Saturday, 10 January 2015 01:43
To: LAHAYE Olivier
Subject: Re: RE:RE:[Oscar-users] stuck...

debug below. Thanks!

======= BEGIN_DEBUG
-rwxr-xr-x 1 root root 18417 Oct 15 14:47 /sbin/grub-install
=======
/boot/grub
/boot/grub/splash.xpm.gz
/boot/grub/menu.lst
/boot/grub/menu.lst.image
=======
cat: /boot/grub/device.map: No such file or directory
=======
package grub* is not installed
======= END_DEBUG

> On January 9, 2015 at 4:10 AM LAHAYE Olivier <oli...@ce...> wrote:
>
> Hi,
>
> It looks like your imaging is not correct. At least one of the 2 following
> files is missing:
> - /sbin/grub-install
> - /boot/grub/device.map (this should contain something like:)
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
> # this device map was generated by anaconda
> (hd0) /dev/sda
>
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
>
> In /var/lib/systemimager/scripts/post-install/ try to add a 99all.debug
> script that would contain:
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
> #!/bin/bash
> echo ======= BEGIN_DEBUG
> ls -l /sbin/grub-install
> echo =======
> find /boot/grub
> echo =======
> cat /boot/grub/device.map
> echo =======
> rpm -q grub*
> echo ======= END_DEBUG
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
>
> And try to reimage. For some reason something is missing and I need to
> understand why.
>
> I need the info between BEGIN_DEBUG and END_DEBUG.
> In the meantime, I'm rebuilding the full oscar dist for centos-6 just in case > something was missed by me. > > Cheers, > > Olivier. > > > > > -- > Olivier LAHAYE > CEA DRT/LIST/DIR > > > > > > De : - - [an...@co...] > Envoyé : vendredi 9 janvier 2015 00:43 > À : oscar-users; LAHAYE Olivier > Objet : Re: RE:[Oscar-users] stuck... > > > > > > > > Olivier, > > Xming for some reason does not copy/paste well. I think its a buffer size > issue. Below is the more important stuff and I think I see the issue. as you > can see from the S14 and S15 scripts the system thinks its not a grub1 or 2 > system?? This is CentOS 6.6 so I'm wondering if something changed maybe to > make it think this? Thanks as always!! > > > > 512 bytes (512 B) copied, 7.6133e-05 s, 6.7 MB/sblockdev --rereadpt /dev/sda > parted -s -- /dev/sda mklabel msdos || shellout > Creating partition /dev/sda1. > parted -s -- /dev/sda mkpart primary 1 101 || shellout > parted -s -- /dev/sda set 1 boot on || shellout parted -s -- /dev/sda set 1 > boot on > Creating partition /dev/sda2. > parted -s -- /dev/sda mkpart extended 101 72746 || shellout > Creating partition /dev/sda5. > parted -s -- /dev/sda mkpart logical linux-swap 102 614 || shellout > Creating partition /dev/sda6. > parted -s -- /dev/sda mkpart logical 615 72746 || shellout > Warning: The resulting partition is not properly aligned for best > performance. > New partition table for /dev/sda: > parted -s -- /dev/sda print > Model: DELL PERC 5/i (scsi) > Disk /dev/sda: 72.7GB > Sector size (logical/physical): 512B/512B > Partition Table: msdos > Disk Flags: > Number Start End Size Type File system Flags > 1 1049kB 101MB 99.6MB primary ext2 > 2 101MB 72.7GB 72.6GB extended lba > 5 102MB 614MB 513MB logical linux-swap(v1) > 6 615MB 72.7GB 72.1GB logical ext3 > Load software RAID modules. > Load device mapper driver (for LVM). > Load additional filesystem drivers. 
> modprobe: module jfs not found in modules.dep > modprobe: module reiserfs not found in modules.dep > mkswap -v1 /dev/sda5 || shellout > mkswap: /dev/sda5: warning: wiping old swap signature. > Setting up swapspace version 1, size = 500732 KiB > no label, UUID=c81c1d5a-e062-45a8-8442-24e1d8e1828d > swapon /dev/sda5 || shellout > mke2fs -q -t ext3 /dev/sda6 || shellout > mkdir -p /a/ || shellout > mount /dev/sda6 /a/ -t ext3 -o defaults || shellout > mke2fs -q -t ext2 /dev/sda1 || shellout > mkdir -p /a/boot || shellout > mount /dev/sda1 /a/boot -t ext2 -o defaults || shellout > mkdir -p /a/proc || shellout > mount proc /a/proc -t proc -o defaults || shellout > mkdir -p /a/sys || shellout > mount sysfs /a/sys -t sysfs -o defaults || shellout > Evaluating image size... > --> Image size = 1304MiB > Report task started. > Quietly installing image... > rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids > 192.168.3.101::test2/ /a/ > Report task stopped. > rsync -av --numeric-ids 192.168.3.101::overrides/test2/ /a/ > rsync -av --numeric-ids 192.168.3.101::overrides/test23/ /a/ > rsync: change_dir "/test23" (in overrides) failed: No such file or directory > (2) > rsync error: some files could not be transferred (code 23) at main.c(1538) > [receiver=3.0.0] > Override directory test23 doesn't seem to exist, but that may be OK. > Editing files for actual disk configuration... > /dev/sda -> /dev/sda > /etc/fstab > /etc/systemconfig/systemconfig.conf > /boot/grub/menu.lst > > run_post_install_scripts > >>> 10all.fix_swap_uuids > >>> 11all.replace_byid_device > >>> 14all.grub2_install > Not a grub 2 system, exitting. > >>> 15all.grub_install > This is not a grub1 system. Exitting... > >>> 95all.monitord_rebooted > >>> 99all.harmless_example_script > I live in /var/lib/systemimager/scripts/post-install. > See: /var/lib/systemimager/scripts/post-install/README for details. 
> >>> 90test2.add_nfs > umount /a/sys || mount -no remount,ro /a//sys || shellout > umount /a/proc || mount -no remount,ro /a//proc || shellout > umount /a/boot || mount -no remount,ro /a//boot || shellout > umount /a/ || mount -no remount,ro /a// || shellout > umount: /a: target is busy. > (In some cases useful info about processes that use > the device is found by lsof(8) or fuser(1)) > Imaging completed > > > > On January 8, 2015 at 8:32 AM LAHAYE Olivier <oli...@ce...> wrote: > > > > > > > > Ho, ok, > > > > then, can you send me the log of the install of a node. You can save it to > > a file from the deployment monitor. Just double click on the host during > > its imaging and when it is finished, use save from the drop down menu. > > > > Olivier. > > > > > > > > > > > > > > -- > > Olivier LAHAYE > > CEA DRT/LIST/DIR > > > > > > > > > > > > De : an...@co... [an...@co...] > > Envoyé : jeudi 8 janvier 2015 14:14 > > À : LAHAYE Olivier; osc...@li... > > Objet : Re: [Oscar-users] stuck... > > > > > > > > > > I'm running 6.6 > > > > /\ndy > > > > > > ----- Reply message ----- > > From: "LAHAYE Olivier" <oli...@ce...> > > To: "- -" <an...@co...>, "oscar-users" > > <osc...@li...> > > Subject: [Oscar-users] stuck... > > Date: Thu, Jan 8, 2015 2:48 AM > > > > > > > > Hi, > > > > If you're running CentOS7, make sure you're using the grub2 post-install > > script: 14all.grub2_install > > 15all.grub_install is for centos6. > > > > Cheers, > > > > Olivier. > > > > > > > > > > > > > > -- > > Olivier LAHAYE > > CEA DRT/LIST/DIR > > > > > > > > > > > > De : - - [an...@co...] > > Envoyé : jeudi 8 janvier 2015 04:34 > > À : oscar-users; LAHAYE Olivier > > Objet : Re: [Oscar-users] stuck... > > > > > > > > Hi Olivier, > > > > Well, I was able to get the system to build at least but I can't get it to > > boot afterwards. It doesn't make it to the bootloader and I did implement > > your S15 fix script that installs the grub bootloader but still no go. 
Any > > idea what else I could look into? I'm not sure on this issue. |
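A side note on the debug output in this thread: `rpm -q` expects exact package names and does not match patterns, so `rpm -q grub*` answers "package grub* is not installed" whether or not GRUB is present. `rpm -qa 'grub*'` does accept a glob. The line that clearly shows something missing is therefore the `device.map` one. Below is a minimal sketch of a 99all.debug variant along those lines. The function name `grub_debug_report` and its optional root-directory argument are assumptions made for illustration; they are not part of SystemImager or OSCAR.

```shell
#!/bin/bash
# Sketch of a 99all.debug variant. The optional argument is a hypothetical
# root directory (default "/") so the same checks can be pointed at an
# image tree on the head node as well as at a freshly imaged node.
grub_debug_report() {
    local root="${1:-/}"
    root="${root%/}"    # drop a trailing slash so paths join cleanly
    echo "======= BEGIN_DEBUG"
    ls -l "$root/sbin/grub-install" 2>&1
    echo "======="
    find "$root/boot/grub" 2>&1
    echo "======="
    if [ -f "$root/boot/grub/device.map" ]; then
        cat "$root/boot/grub/device.map"
    else
        echo "device.map: MISSING"
    fi
    echo "======="
    if command -v rpm >/dev/null 2>&1; then
        # -qa accepts a glob; quote it so the shell does not expand it first.
        rpm --root "${root:-/}" -qa 'grub*' 2>&1
    else
        echo "rpm: not available"
    fi
    echo "======= END_DEBUG"
}
```

In a post-install script the last line would simply be `grub_debug_report`; on the head node it could be called as `grub_debug_report /var/lib/systemimager/images/DELL610` to inspect the image itself.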
From: - - <an...@co...> - 2015-04-08 21:32:26
|
Here is some of the log file of the build process. Thanks again for the assistance.

get_arch
enumerate_disks
sda
DISKS=1

Partitioning /dev/sda...
Old partition table for /dev/sda:
Model: DELL PERC 6/i (scsi)
Disk /dev/sda: 146GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system     Flags
 1      1049kB  101MB   99.6MB  primary   ext2
 2      101MB   146GB   146GB   extended                  lba
 5      102MB   614MB   513MB   logical   linux-swap(v1)
 6      615MB   146GB   146GB   logical   ext3

dd if=/dev/zero of=/dev/sda bs=512 count=1 || shellout
1+0 records in
1+0 records out
512 bytes (512 B) copied, 3.5902e-05 s, 14.3 MB/s
blockdev --rereadpt /dev/sda
parted -s -- /dev/sda mklabel msdos || shellout
Creating partition /dev/sda1.
parted -s -- /dev/sda mkpart primary 1 101 || shellout
parted -s -- /dev/sda set 1 boot on || shellout parted -s -- /dev/sda set 1 boot on
Creating partition /dev/sda2.
parted -s -- /dev/sda mkpart extended 101 146163 || shellout
Creating partition /dev/sda5.
Creating partition /dev/sda6.
parted -s -- /dev/sda mkpart logical 615 146163 || shellout
Warning: The resulting partition is not properly aligned for best performance.
New partition table for /dev/sda:
parted -s -- /dev/sda print
Model: DELL PERC 6/i (scsi)
Disk /dev/sda: 146GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start   End     Size    Type      File system     Flags
 1      1049kB  101MB   99.6MB  primary   ext2
 2      101MB   146GB   146GB   extended                  lba
 5      102MB   614MB   513MB   logical   linux-swap(v1)
 6      615MB   146GB   146GB   logical   ext3
Load software RAID modules.
Load device mapper driver (for LVM).
Load additional filesystem drivers.
modprobe: module jfs not found in modules.dep
modprobe: module reiserfs not found in modules.dep
mkswap -v1 /dev/sda5 || shellout
mkswap: /dev/sda5: warning: wiping old swap signature.
Setting up swapspace version 1, size = 500732 KiB
no label, UUID=34be0ba5-ea37-42c2-9f8c-dafbc2bf3f97
swapon /dev/sda5 || shellout
mke2fs -q -t ext3 /dev/sda6 || shellout
mkdir -p /a/ || shellout
mount /dev/sda6 /a/ -t ext3 -o defaults || shellout
mke2fs -q -t ext2 /dev/sda1 || shellout
mkdir -p /a/boot || shellout
mount /dev/sda1 /a/boot -t ext2 -o defaults || shellout
mkdir -p /a/proc || shellout
mount proc /a/proc -t proc -o defaults || shellout
mkdir -p /a/sys || shellout
Evaluating image size...
--> Image size = 1312MiB
Report task started.
Quietly installing image...
rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids 192.168.3.254::DELL610/ /a/
Report task stopped.
rsync -av --numeric-ids 192.168.3.254::overrides/DELL610/ /a/
rsync -av --numeric-ids 192.168.3.254::overrides/test1/ /a/
rsync: change_dir "/test1" (in overrides) failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1538) [receiver=3.0.0]
Override directory test1 doesn't seem to exist, but that may be OK.
Editing files for actual disk configuration...
/dev/sda -> /dev/sda
/etc/fstab
/etc/systemconfig/systemconfig.conf
/boot/grub/menu.lst
run_post_install_scripts
>>> 10all.fix_swap_uuids
>>> 11all.replace_byid_device
>>> 95all.monitord_rebooted
>>> 99all.debug
======= BEGIN_DEBUG
-rwxr-xr-x 1 root root 18417 Oct 15 14:47 /sbin/grub-install
=======
/boot/grub
/boot/grub/splash.xpm.gz
/boot/grub/menu.lst
/boot/grub/menu.lst.image
=======
cat: /boot/grub/device.map: No such file or directory
=======
package grub* is not installed
======= END_DEBUG
>>> 99all.harmless_example_script

On April 8, 2015 at 5:15 PM - - <an...@co...> wrote:

Hi Olivier,

I never got a chance to get back to this because of other projects, but I'm back at it again :) I tried your latest build and I'm still stuck at the same point. Everything works and the image builds, but GRUB never seems to get installed. I tried what you suggested below, but the package is already installed.

[root@oscarcluster images]# yum --installroot=/var/lib/systemimager/images/DELL610 install grub
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
base | 3.7 kB 00:00
base/primary_db | 4.6 MB 00:02
extras | 3.4 kB 00:00
extras/primary_db | 30 kB 00:00
updates | 3.4 kB 00:00
updates/primary_db | 2.7 MB 00:01
Package 1:grub-0.97-93.el6.x86_64 already installed and latest version
Nothing to do

I'm not sure how to get this fixed. Any ideas?

On January 13, 2015 at 11:59 AM LAHAYE Olivier <oli...@ce...> wrote:

chroot /var/lib/systemimager/images/<yourimage>
rpm -qa|grep -i grub

I think your image is incomplete.

I would first try to rebuild the image and check again whether grub is installed.

If it is still not installed (strange?), I would try to reinstall grub in the image:

yum --installroot=/path/to/image install grub

If grub is missing, I'm pretty sure that some other packages are missing too.

Best regards,

Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR

________________________________________
From: - - [an...@co...]
Sent: Saturday, 10 January 2015 01:43
To: LAHAYE Olivier
Subject: Re: RE:RE:[Oscar-users] stuck...

debug below. Thanks!

======= BEGIN_DEBUG
-rwxr-xr-x 1 root root 18417 Oct 15 14:47 /sbin/grub-install
=======
/boot/grub
/boot/grub/splash.xpm.gz
/boot/grub/menu.lst
/boot/grub/menu.lst.image
=======
cat: /boot/grub/device.map: No such file or directory
=======
package grub* is not installed
======= END_DEBUG

> On January 9, 2015 at 4:10 AM LAHAYE Olivier <oli...@ce...> wrote:
>
> Hi,
>
> It looks like your imaging is not correct. At least one of the 2 following
> files is missing:
> - /sbin/grub-install
> - /boot/grub/device.map (this should contain something like:)
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
> # this device map was generated by anaconda
> (hd0) /dev/sda
>
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
>
> In /var/lib/systemimager/scripts/post-install/ try to add a 99all.debug
> script that would contain:
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
> #!/bin/bash
> echo ======= BEGIN_DEBUG
> ls -l /sbin/grub-install
> echo =======
> find /boot/grub
> echo =======
> cat /boot/grub/device.map
> echo =======
> rpm -q grub*
> echo ======= END_DEBUG
> -----------8<-----------8<-----------8<-----------8<-----------8<-----------
>
> And try to reimage. For some reason something is missing and I need to
> understand why.
>
> I need the info between BEGIN_DEBUG and END_DEBUG.
> In the meantime, I'm rebuilding the full oscar dist for centos-6 just in case
> I missed something.
>
> Cheers,
>
> Olivier.
>
> --
> Olivier LAHAYE
> CEA DRT/LIST/DIR
>
> From: - - [an...@co...]
> Sent: Friday, 9 January 2015 00:43
> To: oscar-users; LAHAYE Olivier
> Subject: Re: RE:[Oscar-users] stuck...
>
> Olivier,
>
> Xming for some reason does not copy/paste well. I think it's a buffer size
> issue. Below is the more important stuff, and I think I see the issue. As you
> can see from the S14 and S15 scripts, the system thinks it's not a grub1 or
> grub2 system?? This is CentOS 6.6, so I'm wondering if something changed to
> make it think this. Thanks as always!!
>
> 512 bytes (512 B) copied, 7.6133e-05 s, 6.7 MB/s
> blockdev --rereadpt /dev/sda
> parted -s -- /dev/sda mklabel msdos || shellout
> Creating partition /dev/sda1.
> parted -s -- /dev/sda mkpart primary 1 101 || shellout
> parted -s -- /dev/sda set 1 boot on || shellout parted -s -- /dev/sda set 1 boot on
> Creating partition /dev/sda2.
> parted -s -- /dev/sda mkpart extended 101 72746 || shellout
> Creating partition /dev/sda5.
> parted -s -- /dev/sda mkpart logical linux-swap 102 614 || shellout
> Creating partition /dev/sda6.
> parted -s -- /dev/sda mkpart logical 615 72746 || shellout
> Warning: The resulting partition is not properly aligned for best performance.
> New partition table for /dev/sda:
> parted -s -- /dev/sda print
> Model: DELL PERC 5/i (scsi)
> Disk /dev/sda: 72.7GB
> Sector size (logical/physical): 512B/512B
> Partition Table: msdos
> Disk Flags:
> Number  Start   End     Size    Type      File system     Flags
>  1      1049kB  101MB   99.6MB  primary   ext2
>  2      101MB   72.7GB  72.6GB  extended                  lba
>  5      102MB   614MB   513MB   logical   linux-swap(v1)
>  6      615MB   72.7GB  72.1GB  logical   ext3
> Load software RAID modules.
> Load device mapper driver (for LVM).
> Load additional filesystem drivers.
> modprobe: module jfs not found in modules.dep
> modprobe: module reiserfs not found in modules.dep
> mkswap -v1 /dev/sda5 || shellout
> mkswap: /dev/sda5: warning: wiping old swap signature.
> Setting up swapspace version 1, size = 500732 KiB
> no label, UUID=c81c1d5a-e062-45a8-8442-24e1d8e1828d
> swapon /dev/sda5 || shellout
> mke2fs -q -t ext3 /dev/sda6 || shellout
> mkdir -p /a/ || shellout
> mount /dev/sda6 /a/ -t ext3 -o defaults || shellout
> mke2fs -q -t ext2 /dev/sda1 || shellout
> mkdir -p /a/boot || shellout
> mount /dev/sda1 /a/boot -t ext2 -o defaults || shellout
> mkdir -p /a/proc || shellout
> mount proc /a/proc -t proc -o defaults || shellout
> mkdir -p /a/sys || shellout
> mount sysfs /a/sys -t sysfs -o defaults || shellout
> Evaluating image size...
> --> Image size = 1304MiB
> Report task started.
> Quietly installing image...
> rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids 192.168.3.101::test2/ /a/
> Report task stopped.
> rsync -av --numeric-ids 192.168.3.101::overrides/test2/ /a/
> rsync -av --numeric-ids 192.168.3.101::overrides/test23/ /a/
> rsync: change_dir "/test23" (in overrides) failed: No such file or directory (2)
> rsync error: some files could not be transferred (code 23) at main.c(1538) [receiver=3.0.0]
> Override directory test23 doesn't seem to exist, but that may be OK.
> Editing files for actual disk configuration...
> /dev/sda -> /dev/sda
> /etc/fstab
> /etc/systemconfig/systemconfig.conf
> /boot/grub/menu.lst
>
> run_post_install_scripts
> >>> 10all.fix_swap_uuids
> >>> 11all.replace_byid_device
> >>> 14all.grub2_install
> Not a grub 2 system, exitting.
> >>> 15all.grub_install
> This is not a grub1 system. Exitting...
> >>> 95all.monitord_rebooted
> >>> 99all.harmless_example_script
> I live in /var/lib/systemimager/scripts/post-install.
> See: /var/lib/systemimager/scripts/post-install/README for details.
> >>> 90test2.add_nfs
> umount /a/sys || mount -no remount,ro /a//sys || shellout
> umount /a/proc || mount -no remount,ro /a//proc || shellout
> umount /a/boot || mount -no remount,ro /a//boot || shellout
> umount /a/ || mount -no remount,ro /a// || shellout
> umount: /a: target is busy.
> (In some cases useful info about processes that use
> the device is found by lsof(8) or fuser(1))
> Imaging completed
>
> > On January 8, 2015 at 8:32 AM LAHAYE Olivier <oli...@ce...> wrote:
> >
> > Oh, OK,
> >
> > then can you send me the log of the install of a node? You can save it to
> > a file from the deployment monitor. Just double-click on the host during
> > its imaging and, when it is finished, use save from the drop-down menu.
> >
> > Olivier.
> >
> > --
> > Olivier LAHAYE
> > CEA DRT/LIST/DIR
> >
> > From: an...@co... [an...@co...]
> > Sent: Thursday, 8 January 2015 14:14
> > To: LAHAYE Olivier; osc...@li...
> > Subject: Re: [Oscar-users] stuck...
> >
> > I'm running 6.6
> >
> > /\ndy
> >
> > ----- Reply message -----
> > From: "LAHAYE Olivier" <oli...@ce...>
> > To: "- -" <an...@co...>, "oscar-users" <osc...@li...>
> > Subject: [Oscar-users] stuck...
> > Date: Thu, Jan 8, 2015 2:48 AM
> >
> > Hi,
> >
> > If you're running CentOS7, make sure you're using the grub2 post-install
> > script: 14all.grub2_install
> > 15all.grub_install is for centos6.
> >
> > Cheers,
> >
> > Olivier.
> >
> > --
> > Olivier LAHAYE
> > CEA DRT/LIST/DIR
> >
> > From: - - [an...@co...]
> > Sent: Thursday, 8 January 2015 04:34
> > To: oscar-users; LAHAYE Olivier
> > Subject: Re: [Oscar-users] stuck...
> >
> > Hi Olivier,
> >
> > Well, I was able to get the system to build at least, but I can't get it to
> > boot afterwards. It doesn't make it to the bootloader. I did implement
> > your S15 fix script that installs the grub bootloader, but still no go. Any
> > idea what else I could look into? I'm not sure about this issue.
|
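Since the debug output shows `/boot/grub/device.map` missing, and the thread says it should hold a line like `(hd0) /dev/sda`, one thing to try is creating the file from a post-install script that runs before 15all.grub_install. The sketch below is only that. The helper name `ensure_device_map` and both of its parameters are hypothetical, and nothing in the thread confirms that a present device.map is enough for 15all.grub_install to accept the system as grub1.

```shell
# Hypothetical helper: write a one-disk GRUB legacy device.map under ROOT
# if none exists yet. ROOT and DISK are illustrative parameters, not
# SystemImager variables.
ensure_device_map() {
    local root="${1%/}" disk="${2:-/dev/sda}"
    local map="$root/boot/grub/device.map"
    if [ -s "$map" ]; then
        echo "keeping existing $map"
        return 0
    fi
    mkdir -p "$root/boot/grub" || return 1
    printf '(hd0) %s\n' "$disk" > "$map" || return 1
    echo "wrote $map"
}
```

Inside a post-install script, which runs with the new system as its root, the call would be `ensure_device_map "" /dev/sda`. An existing non-empty file is left alone so a map that came with the image is never overwritten.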
From: LAHAYE O. <oli...@ce...> - 2015-03-03 19:28:34
|
The ganglia package in unstable OSCAR had a problem displaying RRD graphs. This should be fixed by this updated version.

Note: on CentOS 7 at least, and maybe on other distros, the Apache configuration does not work (access denied). This is because the security compat module (mod_access_compat) is no longer loaded in Apache 2.4+. I'm working on the opkg so that it generates a correct config file with 2.4 syntax.

Best regards.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
|
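For anyone who wants to check their own config in the meantime: Apache 2.2 access control uses `Order`, `Allow from` and `Deny from`, which on 2.4 only work with mod_access_compat loaded. The 2.4 replacements are `Require all granted` (for `Order allow,deny` plus `Allow from all`) and `Require all denied`. A small sketch that lists the old-style lines in a config file follows; the function name is made up for illustration, and the file path in the usage note is an assumption about where the ganglia config lives.

```shell
# Sketch: print (with line numbers) the Apache 2.2-style access-control
# directives found in a config file. Exit status is 0 if any were found.
find_legacy_access_directives() {
    grep -nEi '^[[:space:]]*(Order[[:space:]]+(allow|deny)|(Allow|Deny)[[:space:]]+from)' "$1"
}
```

Usage would look like `find_legacy_access_directives /etc/httpd/conf.d/ganglia.conf`. Any line it prints needs either mod_access_compat or a rewrite to the `Require` syntax.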
From: Jan H. <jan...@iw...> - 2015-03-03 10:45:23
|
OK, I'm stuck here. I still get the same error when I run

oscar-config --bootstrap

...
====> in Database::update_table
SQL: UPDATE Clusters SET server_distribution='rhel', headnode_interface='enp2s0f0', installation_date=NOW(), server_architecture='x86_64', oscar_version='6.1.2svn03032015', server_distribution_version='7'
Checking for database existence of node oscar_server ...
The node oscar_server is already in the database
Updating the hostname field in the oscar_server node to <joe4.iwm.fraunhofer.de> ...
ERROR: Impossible to detect attached networks at /usr/bin/set_node_nics line 241.
ERROR: Impossible to store data about the head NIC used for cluster deployment at /usr/bin/create_and_populate_basic_node_info line 146.
[ERROR - oscar-config] Failed to bootstrap OSCAR
[ERROR - oscar-config] Unable to bootstrap OSCAR

You can see that it finds the headnode interface and the oscar_server correctly.

What is happening in /usr/bin/set_node_nics, and what do I need the "surmised networks" for at this point?

The code that doesn't give the wanted results is:

    # find out what we think the attached networks are based on
    # the network interface settings for this machine
    my %surmised_networks = ();
    my $public_network_index = 1;
    my $private_network_index = 1;
    foreach my $nic_name ( keys %$ifconfig_nics_ref ) {
        my $nic_ref = $$ifconfig_nics_ref{$nic_name};

        # figure out a reasonable network name
        my $network_name;
        if ( exists $$nic_ref{rfc1918} && $$nic_ref{rfc1918} ) {
            if ( $num_private > 1 ) {
                $network_name = "private$private_network_index";
                $private_network_index++;
            } else {
                $network_name = "private";
            }
        } else {
            if ( $num_public > 1 ) {
                $network_name = "public$public_network_index";
                $public_network_index++;
            } else {
                $network_name = "public";
            }
        }

        # set up the command to create this network
        my %network = (
            'name'    => $network_name,
            'base_ip' => $$nic_ref{base_ip},
            'high_ip' => $$nic_ref{high_ip},
            'netmask' => $$nic_ref{netmask},
            'rfc1918' => $$nic_ref{rfc1918}
        );
        $network{ broadcast } = $$nic_ref{ broadcast }
            if exists $$nic_ref{ broadcast };
        $network{ gateway } = $$nic_ref{ gateway }
            if exists $$nic_ref{ gateway };
        $surmised_networks{ $network_name } = \%network;
    }

    if (keys(%surmised_networks) == 0) {
        die "ERROR: Impossible to detect attached networks";
    }

Can I just comment out the line that gives the error?

Thanks for any hints,
Jan.

On 02/23/2015 06:48 PM, LAHAYE Olivier wrote:
>
> Oh, it seems that you're using a buggy oda package.
> /usr/share/oscar/prereqs/oda/etc/mysql.cfg should point to
> mariadb-server for the centos- entry.
> Thanks to you I've seen some typos in this file, so I'm
> updating the package now. It should be available in less than 30 minutes.
>
> Could you try doing:
> yum update
> and check that /usr/share/oscar/prereqs/oda/etc/mysql.cfg is now correct.
>
> As for the other issue, it's strange.
>
> There is a known bug in the /etc/hosts update routine that puts the
> oscar_server and nfs_oscar host aliases on all configured IPs (public
> and private), which is wrong. It should only put the aliases on the
> OSCAR_INTERFACE IP. (I need to correct this.)
> This file is updated more than once, so check that /etc/hosts has
> oscar_server and nfs_oscar set for only one unique IP.
>
> Aside from that, it should work even though the aliases are "bad" (the
> underscore char is not allowed in hostnames). I'll fix that in a later version.
> Feel free to add oscar-server, nfs-oscar and pbs-oscar (keeping
> oscar_server, nfs_oscar and pbs_oscar for now).
>
> --
> Olivier LAHAYE
> CEA DRT/LIST/DIR
> ------------------------------------------------------------------------
> From: Jan Huelsberg [jan...@iw...]
> Sent: Monday, 23 February 2015 13:27
> To: osc...@li...
> Subject: Re: [Oscar-users] Installing OSCAR on CentOS 7
>
> Sorry for the additional posting...
>
> There are more confusing messages during the
> oscar-config --bootstrap
>
> ...
> [INFO - install_prereq] Following packages will be installed: mariadb-galera-server
> Loaded plugins: fastestmirror, langpacks
> Determining fastest mirrors
> * base: artfiles.org
> * epel: be.mirror.eurid.eu
> * extras: artfiles.org
> * updates: centos.bio.lmu.de
> No package mariadb-galera-server available.
> Error: Nothing to do
> [ERROR - install_prereq] Install failed: No such file or directory
> Database Initialization...
> ...
>
> On 02/23/2015 11:35 AM, Jan Huelsberg wrote:
>> Hi,
>>
>> I'm stuck with installing OSCAR on CentOS 7.
>>
>> oscar-config --bootstrap finishes with:
>>
>> Checking for database existence of node oscar_server ...
>> The node oscar_server is already in the database
>> Updating the hostname field in the oscar_server node to <joe4.xxx> ...
>> ERROR: Impossible to detect attached networks at
>> /usr/bin/set_node_nics line 241.
>> ERROR: Impossible to store data about the head NIC used for cluster
>> deployment at /usr/bin/create_and_populate_basic_node_info line 146.
>> [ERROR - oscar-config] Failed to bootstrap OSCAR
>> [ERROR - oscar-config] Unable to bootstrap OSCAR
>>
>> What is going wrong?
>>
>> Additional information:
>> Two NICs, one for the local, private cluster network, which I used in
>> oscar.conf as OSCAR_NETWORK_INTERFACE.
>> Another NIC for communication with the internet.
>>
>> Download of packages worked fine, the system seems to know what to use.
>>
>> /etc/hosts:
>>
>> ...
>> 10.10.11.245 joe4d joe4 joe4.xxx oscar_server nfs_oscar pbs_oscar
>> ...
>>
>> Greetings,
>> Jan.

--

Jan Huelsberg
IT-Manager/Rechenzentrum
Fraunhofer-Institut fuer Werkstoffmechanik IWM
Woehlerstr. 11
79108 Freiburg
Telefon +49 761 5142-275
Fax +49 761 5142-110
jan...@iw...
www.iwm.fraunhofer.de
|
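One observation on the Perl excerpt above: every pass through the `foreach` loop stores an entry in `%surmised_networks`, so the `die` can only fire when the loop never ran, i.e. when `%$ifconfig_nics_ref` came back with no NICs at all. Commenting out the `die` would then just leave no network recorded. To see what the machine itself reports, and how the `rfc1918` flag would classify each address, a rough shell sketch follows. The function names are made up for illustration, and `list_networks` assumes the iproute2 `ip` command is available; neither is part of OSCAR.

```shell
# Sketch: succeed if a dotted-quad IPv4 address is in an RFC 1918 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
is_rfc1918() {
    local o1 o2
    o1=${1%%.*}                  # first octet
    o2=${1#*.}; o2=${o2%%.*}     # second octet
    case "$o1" in
        10)  return 0 ;;
        172) [ "$o2" -ge 16 ] && [ "$o2" -le 31 ] ;;
        192) [ "$o2" -eq 168 ] ;;
        *)   return 1 ;;
    esac
}

# Print "interface address private|public" for every global IPv4 address.
list_networks() {
    ip -o -4 addr show scope global |
    awk '{ split($4, a, "/"); print $2, a[1] }' |
    while read -r nic addr; do
        if is_rfc1918 "$addr"; then kind=private; else kind=public; fi
        echo "$nic $addr $kind"
    done
}
```

With the two-NIC head node described in this thread, `list_networks` would be expected to show one private and one public line. If it does while set_node_nics still finds nothing, the problem is more likely in how the script collects the NIC list than in the network setup itself.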
From: Hülsberg, J. <jan...@iw...> - 2015-02-28 10:37:51
|
Hi Olivier,

don't worry about the late answer.

Yes, for the nodes I do need some IP mapping (in the past I used shorewall for this purpose on the head node).

But I'm not able to check anything because I'm still stuck with the error I described:

oscar-config --bootstrap finishes with:

Checking for database existence of node oscar_server ...
The node oscar_server is already in the database
Updating the hostname field in the oscar_server node to <joe4.xxx> ...
ERROR: Impossible to detect attached networks at /usr/bin/set_node_nics line 241.
ERROR: Impossible to store data about the head NIC used for cluster deployment at /usr/bin/create_and_populate_basic_node_info line 146.
[ERROR - oscar-config] Failed to bootstrap OSCAR
[ERROR - oscar-config] Unable to bootstrap OSCAR

Jan.

________________________________
From: LAHAYE Olivier [oli...@ce...]
Sent: Friday, 27 February 2015 17:36
To: osc...@li...
Subject: Re: [Oscar-users] Installing OSCAR on CentOS 7

Hi,

sorry for the late answer.
You only need a proxy server if your nodes can't reach the internet without it.
squid is necessary for yum to work on nodes that are on a private network.

If you don't need squid, don't check squid in the select oscar package step.

Best regards.

Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________
From: Jan Huelsberg [jan...@iw...]
Sent: Tuesday, 24 February 2015 11:06
To: osc...@li...
Subject: Re: [Oscar-users] Installing OSCAR on CentOS 7

I see that squid.x86_64 is installed but not opkg-squid-server.noarch.

I don't think I need a proxy server. Do I?

On 02/23/2015 07:00 PM, LAHAYE Olivier wrote:

Note, if you're using the squid opkg, you should also do a

yum reinstall opkg-squid-server

which had a small fix (simming '\' in squid.conf)

--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________
From: LAHAYE Olivier
Sent: Monday, 23 February 2015 18:48
To: osc...@li...
Subject: [PROVENANCE INTERNET] Re: [Oscar-users] Installing OSCAR on CentOS 7

Oh, it seems that you're using a buggy oda package.
/usr/share/oscar/prereqs/oda/etc/mysql.cfg should point to mariadb-server for the centos- entry.
Thanks to you I've seen some typos in this file, so I'm updating the package now. It should be available in less than 30 minutes.

Could you try doing:
yum update
and check that /usr/share/oscar/prereqs/oda/etc/mysql.cfg is now correct.

As for the other issue, it's strange.

There is a known bug in the /etc/hosts update routine that puts the oscar_server and nfs_oscar host aliases on all configured IPs (public and private), which is wrong. It should only put the aliases on the OSCAR_INTERFACE IP. (I need to correct this.)
This file is updated more than once, so check that /etc/hosts has oscar_server and nfs_oscar set for only one unique IP.

Aside from that, it should work even though the aliases are "bad" (the underscore char is not allowed in hostnames). I'll fix that in a later version.
Feel free to add oscar-server, nfs-oscar and pbs-oscar (keeping oscar_server, nfs_oscar and pbs_oscar for now).

--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________
From: Jan Huelsberg [jan...@iw...]
Sent: Monday, 23 February 2015 13:27
To: osc...@li...
Subject: Re: [Oscar-users] Installing OSCAR on CentOS 7

Sorry for the additional posting...

There are more confusing messages during the
oscar-config --bootstrap

...
[INFO - install_prereq] Following packages will be installed: mariadb-galera-server
Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors
* base: artfiles.org
* epel: be.mirror.eurid.eu
* extras: artfiles.org
* updates: centos.bio.lmu.de
No package mariadb-galera-server available.
Error: Nothing to do
[ERROR - install_prereq] Install failed: No such file or directory
Database Initialization...
...

On 02/23/2015 11:35 AM, Jan Huelsberg wrote:

Hi,

I'm stuck with installing OSCAR on CentOS 7.

oscar-config --bootstrap finishes with:

Checking for database existence of node oscar_server ...
The node oscar_server is already in the database
Updating the hostname field in the oscar_server node to <joe4.xxx> ...
ERROR: Impossible to detect attached networks at /usr/bin/set_node_nics line 241.
ERROR: Impossible to store data about the head NIC used for cluster deployment at /usr/bin/create_and_populate_basic_node_info line 146.
[ERROR - oscar-config] Failed to bootstrap OSCAR
[ERROR - oscar-config] Unable to bootstrap OSCAR

What is going wrong?

Additional information:
Two NICs, one for the local, private cluster network, which I used in oscar.conf as OSCAR_NETWORK_INTERFACE.
Another NIC for communication with the internet.

Download of packages worked fine, the system seems to know what to use.

/etc/hosts:

...
10.10.11.245 joe4d joe4 joe4.xxx oscar_server nfs_oscar pbs_oscar
...

Greetings,
Jan.

--

Jan Huelsberg
IT-Manager/Rechenzentrum
Fraunhofer-Institut fuer Werkstoffmechanik IWM
Woehlerstr. 11
79108 Freiburg
Telefon +49 761 5142-275
Fax +49 761 5142-110
jan...@iw...
www.iwm.fraunhofer.de
|
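The /etc/hosts check suggested in this thread (oscar_server and nfs_oscar set for only one unique IP) can be scripted. The following is a minimal sketch; the function name `ips_for_alias` is made up for illustration and is not an OSCAR tool.

```shell
# Sketch: print the IP of every hosts-file line that carries the given
# alias, ignoring comments. More than one output line means the alias
# is attached to several addresses.
ips_for_alias() {
    awk -v alias="$2" '
        { sub(/#.*/, "") }    # strip comments (also blanks commented-out lines)
        { for (i = 2; i <= NF; i++) if ($i == alias) { print $1; break } }
    ' "$1"
}
```

For example, `ips_for_alias /etc/hosts oscar_server` should print a single address, the one on the OSCAR interface. The same call with `nfs_oscar` or `pbs_oscar` checks the other aliases.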