From: Hunt, Derek <derek.hunt@ya...> - 2003-07-31 18:33:47
|
Hello all,

I am encountering some rather strange compiling issues under OS X Server (10.2.6).

Making all in gmetad
source='rrd_helpers.c' object='rrd_helpers.o' libtool=no \
depfile='.deps/rrd_helpers.Po' tmpdepfile='.deps/rrd_helpers.TPo' \
depmode=gcc /bin/sh ../config/depcomp \
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I/sw/include -I../lib -I../gmond -I/sw/include -Wall -c `test -f 'rrd_helpers.c' || echo './'`rrd_helpers.c
In file included from /usr/include/mach/host_info.h:65,
                 from /usr/include/mach/mach_types.h:66,
                 from /usr/include/pthread.h:44,
                 from ../lib/ganglia/net.h:19,
                 from ./gmetad.h:3,
                 from rrd_helpers.c:9:
/usr/include/mach/time_value.h:62: redefinition of `struct time_value'
rrd_helpers.c: In function `my_mkdir':
rrd_helpers.c:25: warning: implicit declaration of function `err_sys'
rrd_helpers.c: In function `RRD_update':
rrd_helpers.c:57: warning: implicit declaration of function `err_msg'
rrd_helpers.c:61: warning: implicit declaration of function `debug_msg'
make[2]: *** [rrd_helpers.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

[root@... ganglia-monitor-core-2.5.3]# gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.1/specs
Thread model: posix
Apple Computer, Inc. GCC version 1175, based on gcc version 3.1 20020420 (prerelease)
[root@... ganglia-monitor-core-2.5.3]#

I have the Dec 2002 Patch applied, and I have tried compiling it under gcc 3.3 and gcc 3.1. I also have libpoll installed via fink.

Basically, it goes through and compiles fine without gmetad enabled. With gmetad enabled it dies with the above error.

FYI, the ganglia snapshot of 2.5.4 gives the same error.

Has anyone compiled gmetad under OS X successfully?

Thanks
- Derek
From: steven wagner <swagner@il...> - 2003-07-31 16:26:47
|
Dave Bradshaw wrote:
> Dear Steve,
>
> Thanks for the advice. I have done what you suggested. I have also turned off one of the gmetad daemons so it is now only running on the machine with the web frontend.
> Now when I fire up the web frontend I get the following message:
>
> Ganglia cannot find a data source. Is gmond running?
>
> Any suggestions?

My only suggestion is Boilerplate Ganglia Troubleshooting Suggestion #3: When something doesn't work, do what the daemon would be doing.

In this case, the metadaemon is trying to connect to the monitoring core specified in /etc/gmetad.conf. So, log in to that box and crank up a telnet window. What do you see? Connection refused? The connection gets accepted and then dropped? The connection gets accepted and you get a DTD but no data? You get the DTD and *malformed* data?

The answer will help tell you where you should aim in future shooting of trouble.

If the connection is refused out of hand, then either the monitoring core is not running, it's listening on the wrong port, it's configured as deaf, or something else is standing in the way of that connection being established.

If the connection gets accepted and then dropped, it's a trusted_hosts issue. Make sure you're feeding that monitoring core the right IP... check the debug output for more info.

This leads to Boilerplate Ganglia Troubleshooting Suggestion #2: When there's something strange and it don't look good, turn on debug mode in whatever's giving you problems.

Combing through pages of debug output may not be as much fun as pizza and margarita shooters, but you can usually figure out what's going wrong based on that.

I'll let you guess what Boilerplate Ganglia Troubleshooting Suggestion #1 is. :)

Good luck!
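The telnet check described above can also be scripted. The following is only a rough sketch under stated assumptions, not part of Ganglia: the function names are hypothetical, port 8649 is the default mentioned elsewhere on this list, and the test for a closing `</GANGLIA_XML>` tag assumes that is the root element of the XML dump.

```python
import socket


def classify_gmond_response(data: bytes) -> str:
    """Map the raw bytes read from a gmond XML port to one of the cases above."""
    text = data.decode("utf-8", errors="replace")
    if not text.strip():
        # Connection was accepted but nothing came back before it closed.
        return "accepted then dropped (check trusted_hosts)"
    if "<HOST " not in text:
        return "DTD but no host data"
    if "</GANGLIA_XML>" not in text:
        # Assumption: a complete dump ends with this closing tag.
        return "data looks truncated or malformed"
    return "ok"


def probe_gmond(host: str, port: int = 8649, timeout: float = 5.0) -> str:
    """Connect the way gmetad would, read until EOF, and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            chunks = []
            while True:
                chunk = sock.recv(65536)
                if not chunk:
                    break
                chunks.append(chunk)
    except ConnectionRefusedError:
        return "connection refused (gmond down, wrong port, or blocked)"
    except OSError as exc:
        return f"connection failed: {exc}"
    return classify_gmond_response(b"".join(chunks))
```

Running `probe_gmond("172.16.11.136")` from the gmetad host would then give a one-line hint about which of the cases applies; the debug output remains the authoritative source.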
From: Dave Bradshaw <david-b@mo...> - 2003-07-31 09:55:43
|
Dear Steve,

Thanks for the advice. I have done what you suggested. I have also turned off one of the gmetad daemons so it is now only running on the machine with the web frontend.

Now when I fire up the web frontend I get the following message:

Ganglia cannot find a data source. Is gmond running?

I then added this line to the gmond.conf file on both of the machines that run gmond only:

trusted_hosts 172.16.15.5

This is the IP address of the machine running the web frontend. Then I restarted the gmond daemon.

Then, on the machine running the web frontend, I put this line in gmond.conf:

trusted_hosts 172.16.200.29 172.16.11.136

These are the IP addresses of the above two machines. Then I restarted the gmond daemon. I also put the same line in gmetad.conf and restarted the gmetad daemon.

I still get the same message:

Ganglia cannot find a data source. Is gmond running?

Any suggestions?
From: steven wagner <swagner@il...> - 2003-07-30 17:43:25
|
Dave Bradshaw wrote:
> Dear All,
>
> Am I an idiot?

My rule of thumb is not to ask this sort of question on a list unless I'm asking something already covered in the docs. There's always a chance some wisecracker out there will answer it.

:)

> Where am I going wrong?

Different clusters need to transmit on different multicast IPs or ports. Otherwise, they will all hear each other's metrics.

So cluster A can use 232.4.6.11:8649, cluster B can use 232.4.6.12:8649, cluster C can use 232.4.6.12:8648 ... none of them should overlap, in that case.

The cluster name has no real effect on the way the monitoring core gathers metrics. The metadaemon may use it (in comparison to the source name in /etc/gmetad.conf), but I don't remember what the logic there looks like.

All the metadaemon does is query a monitoring core per data_source line in the config, parse the results, store them as RRDs and merge all its data sources into a single XML document for the web front-end to query.

Okay, so it does a lot, but my point is that the problem you're having is upstream.

Only one host (or, in very large installs, a couple hosts) need to run gmetad. The others just need to run the monitoring core.

Good luck!
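The "no two clusters on the same multicast IP and port" rule above is easy to check mechanically once the planned channels are written down. A minimal sketch, assuming the channels are kept in a plain dictionary; the function name and the cluster labels are hypothetical, and the addresses are the ones from the example above.

```python
from collections import defaultdict


def find_channel_clashes(clusters):
    """Return {(ip, port): [cluster names]} for channels used by more than one cluster.

    `clusters` maps a cluster name to its (multicast_ip, port) pair.
    """
    by_channel = defaultdict(list)
    for name, channel in clusters.items():
        by_channel[channel].append(name)
    return {ch: sorted(names) for ch, names in by_channel.items() if len(names) > 1}


planned = {
    "A": ("232.4.6.11", 8649),
    "B": ("232.4.6.12", 8649),
    "C": ("232.4.6.12", 8648),
}
clashes = find_channel_clashes(planned)  # empty: IP or port differs for every pair
```

If two clusters are left on the defaults, both end up under the same key, which matches the symptom of every host appearing in one cluster.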
From: Michael Robokoff <mrobo@ah...> - 2003-07-30 17:22:25
|
I believe they are sorted by the metric selected.

Metric <load> Sorted <descending>

Changing those will change the layout.

--Mike

Brent M. Clements wrote:
> How do I organize the hosts that are displayed on the web frontend?
>
> I.e., I'd like the host graphs organized incrementally by default, e.g.
>
> n1 n2 n3
> n4 n5 n6
> n7 n8 n9
> n10 n11 n12
>
> Right now, the order appears random.
From: Brent M. Clements <bclem@ri...> - 2003-07-30 17:04:22
|
How do I organize the hosts that are displayed on the web frontend?

I.e., I'd like the host graphs organized incrementally by default, e.g.

n1 n2 n3
n4 n5 n6
n7 n8 n9
n10 n11 n12

Right now, the order appears random.

Thanks,

Brent Clements
HPC Technology Specialist
Information Technology
Rice University
From: Dave Bradshaw <david-b@mo...> - 2003-07-30 16:38:42
|
Dear All,

Am I an idiot?

Where am I going wrong?

I would like to use Ganglia to monitor various machines at work, grouped together into various clusters, just like other people have done.

For simplicity, I have three machines and I want two of them to be in one cluster and the third in a separate cluster.

I have installed the Linux gmond rpm on each machine. I have then edited the /etc/gmond.conf file and put in this entry:

For machine A - name "Systems Machines"
For machine B - name "Systems Machines"
For machine C - name "Tower Machines"

Every other entry in this file is left at the defaults. I have then restarted the gmond daemon on all three machines.

On machine A I have also installed the gmetad rpm. I have edited its config file and put in the following line:

data_source "System Machines" 127.0.0.1:8649 172.16.11.136:8649

(172.16.11.136 being the IP address of machine B)

Then I restarted gmetad. Running gstat -a I get this output:

CLUSTER INFORMATION
Name: System Machines
Hosts: 2
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Wed Jul 30 17:16:50 2003

CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle]

wingnut.mpc.local
1 ( 2/ 180) [ 0.05, 0.06, 0.03] [ 1.2, 0.0, 0.5, 98.4] OFF
bromley.mpc.local
0 ( 0/ 0) [ 0.00, 0.00, 0.00] [ 0.0, 0.0, 0.0, 0.0] OFF

Then on machine C I also installed the gmetad rpm, the web frontend, the rrd package and apache. I have also edited the gmetad.conf file on machine C and put in these lines:

data_source "Tower Machines" localhost
data_source "Systems Machines" localhost
gridname "MPC"

Running gstat -a on machine C I get this information:

CLUSTER INFORMATION
Name: Tower Machines
Hosts: 3
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Wed Jul 30 17:26:25 2003

CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle]

wingnut.mpc.local
0 ( 0/ 0) [ 0.00, 0.00, 0.00] [ 0.7, 0.0, 0.6, 99.2] OFF
askja.mpc.local
1 ( 1/ 190) [ 1.04, 1.11, 0.71] [ 5.1, 0.0, 2.5, 92.5] OFF
bromley.mpc.local
0 ( 0/ 0) [ 0.06, 0.00, 0.00] [ 5.1, 0.0, 0.8, 94.0] OFF

When I fire up the web browser and point it at machineC/gangliaFrontend, I see all three machines under

Tower Machines > -- Choose a node

Not what I was hoping for.

If I change the following line in machine C's gmetad.conf file

data_source "Systems Machines" localhost

to

data_source "Systems Machines" 172.16.200.29

(172.16.200.29 being the IP address of machine A, which is also running gmetad) and then restart gmetad, nothing changes.

Where am I going wrong?

Thanks for any help offered.

Regards,

Dave.
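One thing that stands out in the gmetad.conf lines above is that both data_source entries on machine C point at the same address. Whether that contributes to the symptom is not established here, but it can be spotted with a small script. This is a sketch only: it assumes the line layout seen on this list (`data_source name [interval] host[:port] ...`), assumes 8649 when no port is given, and the function names are hypothetical.

```python
import shlex
from collections import defaultdict

DEFAULT_PORT = 8649  # assumption: used when a host has no explicit :port


def parse_data_source(line):
    """Split one data_source line into (name, polling interval or None, [(host, port)])."""
    tokens = shlex.split(line)  # shlex keeps a quoted name such as "Tower Machines" together
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]
    interval = None
    if rest[0].isdigit():  # an all-digit token right after the name is the interval
        interval, rest = int(rest[0]), rest[1:]
    hosts = []
    for item in rest:
        host, _, port = item.partition(":")
        hosts.append((host, int(port) if port else DEFAULT_PORT))
    return name, interval, hosts


def shared_sources(lines):
    """Return {(host, port): [names]} for addresses listed under more than one source."""
    seen = defaultdict(set)
    for line in lines:
        name, _, hosts = parse_data_source(line)
        for address in hosts:
            seen[address].add(name)
    return {addr: sorted(names) for addr, names in seen.items() if len(names) > 1}


machine_c = [
    'data_source "Tower Machines" localhost',
    'data_source "Systems Machines" localhost',
]
overlap = shared_sources(machine_c)
```

Here `overlap` reports `localhost` under both names, meaning both sources would be polling the same gmond.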
From: Maica Asperez <maiasperez@ho...> - 2003-07-29 09:26:48
|
Hi,

I'm working with gmetric on Tru64 v5.1A.

When I run 'gmetric --help', I get a message with the commands that I can use.

I'm running gmond on port 8659, and I run gmetric as follows:

gmetric --name "temperature" --value "63" --type int16 -p8659

Thanks

>From: Simon Dodsley <simond@...>
>To: Maica Asperez <maiasperez@...>
>Subject: Re: [Ganglia-general] Problem with gmetric
>Date: 29 Jul 2003 09:40:39 +0000
>
>Hi,
>
>I have gmetric working fine on Tru64 v5.1A and v5.1B
>
>When you run gmetric --help do you get the usage message?
>
>Also have you changed the multicast port or channel from the default
>values as they have to be sent in the gmetric command if you have?
>
>Simon
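For anyone scripting gmetric calls, the point made in the quoted reply (pass the port whenever it differs from the default) can be wrapped in a small helper. This is a sketch that only uses the flags shown in the command above; the helper name is hypothetical, and treating 8649 as gmetric's default port is an assumption based on the default mentioned elsewhere on this list.

```python
DEFAULT_PORT = 8649  # assumption: default port, so -p is only needed when it differs


def build_gmetric_command(name, value, metric_type, port=DEFAULT_PORT):
    """Build the argv list for a gmetric call, adding -p only for a non-default port."""
    if metric_type == "int16" and not -32768 <= int(value) <= 32767:
        raise ValueError(f"{value} does not fit in int16")
    argv = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if port != DEFAULT_PORT:
        argv.append(f"-p{port}")
    return argv


cmd = build_gmetric_command("temperature", 63, "int16", port=8659)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a host where gmetric is installed.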
From: Maica Asperez <maiasperez@ho...> - 2003-07-28 12:25:23
|
Hello all,

I want to use gmetric on Tru64, but it doesn't work. I have installed gmond 2.5.3 as a client.

gmetric doesn't print any error. If I run gmond with the debug option, it doesn't show any sign that it receives the data. However, if I do the same on RedHat 7.2 with a gmond server, it prints:

User-defined data: type=string name=temperature val=ls (3) units=

Thank you.
From: Lester Vecsey <bliptune@op...> - 2003-07-25 00:03:11
|
There's a mem_shared_func in machine.c when compiling ganglia 2.5.3 on linux, and it looks for the text 'MemShared:' inside of /proc/meminfo.

But it turns out that phrase no longer exists in newer linux kernels like Linux 2.6.0-test1.

gmond would segfault because the skip_token function would reach the end without ever matching the phrase, so for now I just set val.uint32 = 0 in that function for that metric to get gmond going again.

I did a brief search on linux and MemShared and found this text at groups.google.com:

--
remove-memshared.patch
Remove /proc/meminfo:MemShared
--

Here is a list of the current values that show up with that newer kernel, considerably more than in 2.4.x series production kernels:

--
cat /proc/meminfo
MemTotal: 904660 kB
MemFree: 107992 kB
Buffers: 60724 kB
Cached: 633736 kB
SwapCached: 0 kB
Active: 286952 kB
Inactive: 451448 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 904660 kB
LowFree: 107992 kB
SwapTotal: 2048248 kB
SwapFree: 2048248 kB
Dirty: 164 kB
Writeback: 0 kB
Mapped: 48608 kB
Slab: 48784 kB
Committed_AS: 238212 kB
PageTables: 1648 kB
VmallocTotal: 122824 kB
VmallocUsed: 1448 kB
VmallocChunk: 121316 kB
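The underlying issue above is a parser that assumes a key is always present. The following sketch shows the defensive pattern in Python rather than in gmond's C code: parse /proc/meminfo into a dictionary and fall back to 0 when 'MemShared' is missing. The function names are hypothetical, and this is not how machine.c is structured.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {key: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key:" prefix
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def mem_shared_kb(text):
    """Return MemShared in kB, or 0 when the kernel no longer reports it."""
    return parse_meminfo(text).get("MemShared", 0)


sample = """MemTotal: 904660 kB
MemFree: 107992 kB
Buffers: 60724 kB
"""
shared = mem_shared_kb(sample)  # key is absent in this 2.6-style sample
```

On a live system the text would come from `open("/proc/meminfo").read()`; the lookup with a default is the equivalent of the `val.uint32 = 0` workaround described above.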
From: Peter Schmid <schmid@cr...> - 2003-07-21 16:57:23
|
I've got two issues with compiling Ganglia from source. I've got the RPMs for Linux installed and configured, and they are working GREAT! The two problems are outlined below. In both cases I unpacked the tarball, set the path so that the GCC compiler comes before the vendor compilers, ran configure and make, and it ended with the following.

Thanks in advance.
Pete.

SGI Irix compile problems of 2.5.3:

gcc -I../lib -I../lib/dnet -g -O2 -Wall -D_IRIX_SOURCE -o gmond gmond.o monitor.o server.o listen.o cleanup.o machine.o cmdline.o ../lib/.libs/libganglia.a ../lib/libdnet.a ../lib/libgetopthelper.a -ldl -lnsl -lnsl -lsocket -lpthread
ld32: WARNING 84 : /usr/lib32/libdl.so is not used for resolving any symbol.
ld32: WARNING 84 : /usr/lib32/libsocket.so is not used for resolving any symbol.
ld32: ERROR 33 : Unresolved text symbol "mtu_func" -- 1st referenced by gmond.o.
Use linker option -v to see when and which objects, archives and dsos are loaded.
ld32: INFO 152: Output file removed because of error.

HP 11.0 problem with 2.5.3:

gcc -I../lib -I../lib/dnet -g -O2 -Wall -D_HPUX_SOURCE -o gmond gmond.o monitor.o server.o listen.o cleanup.o machine.o cmdline.o ../lib/.libs/libganglia.a ../lib/libdnet.a ../lib/libgetopthelper.a -lpthread -lnsl
/usr/bin/ld: Unsatisfied symbols:
inet_pton (first referenced in ../lib/libdnet.a(intf.o)) (code)
collect2: ld returned 1 exit status
make[3]: *** [gmond] Error 1
make[3]: Leaving directory `/no_backup/system/ganglia-monitor-core-2.5.3/gmond'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/no_backup/system/ganglia-monitor-core-2.5.3/gmond'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/no_backup/system/ganglia-monitor-core-2.5.3'
make: *** [all] Error 2

--
Peter Schmid
Logic Technology Inc.
GE Global Research Center
ITMS Engineering Systems Group
(518) 387-6903
From: Koushik <koushik_iit@re...> - 2003-07-15 22:53:08
|
I'm a new user, and I cannot find out how to compile and run MPI codes using ganglia. If anyone knows the answer, please mail me as soon as possible. I'd prefer an example too.

Thank you.

-koushik
From: Kevin James Flasch <kflasch@uw...> - 2003-07-11 21:28:56
|
Thanks so much for your help, Steven. We fixed the problem - it turned out that there was a configuration option in our switch's software that was munging up multicast traffic. Ganglia is working fine now.

Thanks a lot!

Kevin Flasch

On Wed, 2 Jul 2003, steven wagner wrote:

> Kevin James Flasch wrote:
> >>* Check some of the gmond-only nodes' XML port output. How many nodes
> >>do they see? Do they see 289-295 nodes or just their own output?
> >
> > I believe you're referring to the mcast_port (by default 8649). When I telnet
> > to it, I see what appears to be all/most of them.
> > (`telnet localhost 8649 | grep "<HOST " | wc -l` gives me 300).
>
> wc -l's a good start but you should actually check each host's timestamp
> value. If the timestamps are fairly close to one another and close to
> NOW(), then you know that the monitoring core you're polling is
> receiving packets from all 300 hosts often enough for them to be
> considered "up" - the REPORTED attribute is updated every time any
> metric is received from a given host.
>
> > They are not all in the same subnet. There are two subnets that they reside
> > in.
> >
> > They are all physically connected to the same switch.
> >
> > There is no firewalling of the sort that blocks ports, drops packets on the
> > master. The idea that there is something wrong with the network connection
> > seems reasonable. I can't see anything outstanding about it, however, and
> > there have been no network problems with the connection otherwise.
>
> So far so good...
>
> >>* Consider polling a different set of monitoring cores as your gmetad
> >>cluster data source.
> >
> > I'm not sure I follow. Can you explain or give an example, please?
>
> Sure. gmetad has a configuration file, /etc/gmetad.conf by default,
> that specifies data sources. gmetad considers each of these data
> sources to be a different cluster. You can specify a polling frequency
> and a list of IP(:port) combos for each cluster. These will be checked
> from left to right.
>
> Example:
>
> data_source mycluster 15 10.0.0.2 10.0.0.3:2463 10.0.0.4 10.0.0.5
> data_source anothercluster 60 192.168.7.15
>
> In order to debug gmetad, it helps to "see what the killer sees" by
> telnetting to each of these sources in the same order from the node
> running the metadaemon. This should at least point you at the
> misbehaving monitoring core.
>
> It may well be that the local monitoring core on the front-end is the
> one that's misconfigured somehow.
>
> >>* Run a monitoring core in debug mode. You will see what metrics it's
> >>sending and what metrics it's hearing on the multicast channel.
> >
> > Hmm.. I'm not sure what the output of that should look like on a node in a
> > functioning ganglia environment. It seems like it's communicating somewhat
> > with the other nodes, but most of the entries seem to be about itself. One of
> > the entries mentioning another machine looks like this:
> >
> > Is that less data than typical?
>
> On a 300-node Ganglia cluster you should be seeing at least load average
> metrics being multicast from every node every 15-60 seconds, plus the
> various other metrics according to their thresholds. Regardless, you
> should see more than a packet every few seconds.
>
> In fact if you didn't find it necessary to redirect the debug output to
> a file, you're probably not getting all the packets. :)
>
> >>* tcpdump. Limit it to just the multicast IP or port and you should be
> >>able to get all Ganglia-related traffic that the running host can hear.
> >
> > That's what I did before to check the frequency of ganglia traffic. Most of
> > the traffic is the machine itself broadcasting 8 byte (occasionally 12 byte)
> > udp packets on the multicast channel. Once in a while an 8 byte udp packet
> > from another node will come on the multicast channel (after every 5-15
> > originating packets on the multicast channel).
>
> See above, you should be getting them more than once in a while. It
> would be interesting to check two monitoring cores to see if they're
> receiving one another's packets, what the ratio is of dropped packets to
> total packets sent, and if any of the packets that make it through have
> anything in common with one another. Might give you some clues if
> nothing else does.
>
> >>I know, it's not much, but it's something.
> >
> > Thanks so much for your help. I suppose this only makes me think that there is
> > some networking issue, hardware or software, but I have no idea what it is at
> > this point.
>
> Well, the only thing harder than troubleshooting your own hardware is
> troubleshooting someone else's. :)
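The advice quoted above (count the hosts, then check how recent each host's REPORTED value is) can be sketched in a few lines. The assumptions are loud ones: REPORTED is treated as a Unix timestamp in seconds, the host name is assumed to live in a NAME attribute, the 60-second threshold is arbitrary, and `stale_hosts` is a hypothetical helper, not part of Ganglia.

```python
import time
import xml.etree.ElementTree as ET


def stale_hosts(xml_text, max_age_seconds=60, now=None):
    """Return [(host name, age in seconds)] for hosts not heard from recently.

    Assumes each <HOST> element carries NAME and REPORTED attributes and that
    REPORTED is seconds since the epoch.
    """
    now = time.time() if now is None else now
    root = ET.fromstring(xml_text)
    stale = []
    for host in root.iter("HOST"):
        age = now - int(host.get("REPORTED", "0"))
        if age > max_age_seconds:
            stale.append((host.get("NAME", "?"), age))
    return stale


sample = (
    '<GANGLIA_XML VERSION="2.5.3" SOURCE="gmond"><CLUSTER NAME="c">'
    '<HOST NAME="n1" REPORTED="990"/><HOST NAME="n2" REPORTED="100"/>'
    "</CLUSTER></GANGLIA_XML>"
)
late = stale_hosts(sample, max_age_seconds=60, now=1000)
```

In practice `xml_text` would be whatever a telnet to the gmond port returns; a long list of stale hosts on one node but not another would point at the kind of multicast loss that turned out to be the switch setting in this thread.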
From: Demetri Mouratis <dmourati@cm...> - 2003-07-11 16:02:24
|
First off, thanks to the Ganglia developers for a great monitoring tool. I've learned so much about my network using Ganglia.

Now to my problem. I recently physically moved my network from one data center to another. During the course of the move, I brought up one of my monitored machines with an incorrect IP address (off by one in the fourth octet). Gmond was running on that machine during the brief time it had the wrong address, perhaps 15 minutes. Once the IP address was corrected, the box was being correctly monitored by Ganglia.

The problem is that the old IP appears as a dead host in the "Snapshot" with the skull and crossbones icon. I'd like to remove it. Since there is no host running on that IP any longer, there is of course no gmond running there. I've waited a few days, restarted gmetad on the web front end many times, restarted gmond on the host with the new IP address, deleted the corresponding rrds, all to no avail.

Specifics:

netmonitor1 is the web front end
netmonitor2 is the box with the IP mixup

netmonitor1# uname -a
Linux netmonitor1 2.4.20-18.7smp #1 SMP Thu May 29 07:49:23 EDT 2003 i686 unknown
netmonitor1# gmetad -V
ganglia-monitor-core 2.5.3

netmonitor2# uname -a
Linux netmonitor2 2.4.20-18.7smp #1 SMP Thu May 29 07:49:23 EDT 2003 i686 unknown
netmonitor2# gmond -V
ganglia-monitor-core 2.5.3

TIA

---------------------------------------------------------------------
Demetri Mouratis
dmourati@...
From: Andrei E. Chevel <Andrei.Chevel@pn...> - 2003-07-10 16:47:38
|
The last time (several days ago) I tried to deploy ganglia on our Alpha 4000

ram0:/Users/shevel<12:18:39> uname -a
OSF1 ram0.i2net.sunysb.edu V4.0 1229 alpha

I found that several headers from 'glibc' that the build includes (something like inttypes.h) are absent. Does anybody know how to work around this?

Thanks,

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Andrei E. Chevel : e-mail: Andrei.Chevel@...
Computing Systems Department http://www.pnpi.spb.ru/CSD
Petersburg Nuclear Physics Institute http://www.pnpi.spb.ru/
188300, Gatchina, Leningrad district, Russia
Fax: +7(81271)46256,46350
From: Simon Dodsley <simond@qu...> - 2003-07-10 10:36:13
|
Hector,

It was difficult to compile on Tru64, but here is a list of the things we did that eventually worked for 2.5.3:

------------
Unpack rrd tarball in /tmp, then:

# cd /tmp/rrdtool-1.0.42
# sh configure --prefix=/usr/local/rrdtool --enable-shared
# make
# make install

Unpack ganglia-monitor-core in /tmp, then:

Edit configure and comment out line CFLAGS="$CFLAGS -Wall"

Then:

# cd /tmp/ganglia-monitor-core-2.5.3
# ./configure --with-gmetad CFLAGS="-pthread -I/usr/local/rrdtool/include" LDFLAGS=-L/usr/local/rrdtool/lib

vi ./gmond/machine.c
Line 167 - CTRL-J

# make
# make install
# mkdir -p /var/lib/ganglia/rrds
# chown -R nobody /var/lib/ganglia/rrds
-------------------------------------

Hope this helps...

Regards,
Simon Dodsley
Quadrics Ltd.
From: Daniel Kidger <Daniel.Kidger@qu...> - 2003-07-09 16:15:01
|
Hector,

We have gmond running here on a large set of platforms including Alphaserver SC clusters at V5.1A and V5.1B.

The only glitch that I have that is unresolved is that whatever node in a cluster I point gmetad at, it reports all other nodes *except* itself?!

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@...
One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
----------------------- http://www.quadrics.com --------------------

-----Original Message-----
From: Hector M. Jacas [mailto:jacas@...]
Sent: 08 July 2003 21:04
To: ganglia-general@...
Subject: [Ganglia-general] Tru64 v5.1A version of gmond

Hello to all!

I am looking for a way to build and install a version of GMOND for Tru64 v5.1A.

Has somebody already done this? Can somebody help me?

Thanks a lot,

Hector M. Jacas
From: Michael Robokoff <mrobo@ah...> - 2003-07-09 15:32:59
|
I get the following on my web page. Can someone lead me to the cause?

Warning: Wrong datatype in ksort() call in /var/www/html/ganglia-webfrontend/header.php on line 90
Warning: Invalid argument supplied for foreach() in /var/www/html/ganglia-webfrontend/header.php on line 91
Warning: Invalid argument supplied for foreach() in /var/www/html/ganglia-webfrontend/header.php on line 159

Thanks
--Mike |
From: Russell Nordquist <rdn@uc...> - 2003-07-09 04:30:30
|
matt,

thanks for the directions. the one additional thing I needed to do was set "no_setuid on", since the ganglia uid couldn't core dump. i'll post the backtrace to the developers list.

thanks for the help. no hassle, i'm interested in what's at the bottom of this.

russell

On Tue, 8 Jul 2003 at 17:43, matt massie wrote:
> sorry for the hassle. i'm sure we'll get this problem fixed.

- - - - - - - - - - - -
Russell Nordquist
UNIX Systems Administrator
Geophysical Sciences Computing
http://geosci.uchicago.edu/computing
NSIT, University of Chicago
- - - - - - - - - - - |
From: steven wagner <swagner@il...> - 2003-07-09 00:55:58
|
Hector M. Jacas wrote:
> Hello to all!
>
> I am looking for the way to build and to install a version of GMOND for
> Tru64 v5.1A.

Last year, when we had a meaningful number of Alphas on the premises running Tru64, I ported the monitoring core to that platform. It is an experience I don't recall fondly.

The last time I remember it working with certainty was back around 2.4.3 - I didn't test newer versions thoroughly on Tru64 after that point, although others may have contributed to keep that platform up-to-date and building.

But yes, at one point the monitoring core did run on Tru64. I'm not really in a position to help you now since I don't have an Alpha test box anymore. If I recall correctly, it was developed using GCC and the various GNU autotools. I only used the vendor's compiler when I was working on the IRIX port.

That's all I can remember about it now, though.

Good luck! |
From: matt massie <massie@cs...> - 2003-07-09 00:43:23
|
russell-

another trick for debugging the problem is to examine the core dump. when you run...

% ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) 2047
virtual memory        (kbytes, -v) unlimited

you'll get your current user resource limitations. if you have the permissions (e.g. you are root or your admin lets users change their limits), you can change the core file size to unlimited.

% ulimit -c unlimited

this will allow programs to dump a core file when they segfault. that core file is very helpful. you can use a debugger (gdb) to find exactly what the program was doing when it crashed.

for example, say you run the following...

% ./gmond
...
<SEGFAULT...CRASH...BOOM...BANG>

% ls core*
core.12345

% gdb --core=./core.12345 ./gmond
GNU gdb Red Hat Linux (5.2.1-4)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
(gdb)

you are now inside the debugger... and have a (gdb) prompt.

(gdb) bt <enter>

will show a backtrace of exactly what the program was doing when it croaked.

(gdb) quit <enter>

will exit the debugger.

if you cut and paste a backtrace to the developers list, we will certainly know what is going on.

sorry for the hassle. i'm sure we'll get this problem fixed.

-matt |
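The interactive gdb session described above can also be run non-interactively, which makes it easier to capture the backtrace for a mail to the developers list. This is a rough sketch, assuming a gdb that accepts `-batch` and `-x`; `print_backtrace` is a made-up helper name and the paths in the comments are only examples.

```shell
# Print a backtrace from a core file without entering the (gdb) prompt.
# print_backtrace is a hypothetical helper; binary and core paths are examples.
print_backtrace() {
    binary=$1
    core=$2
    cmds=$(mktemp) || return 1
    # The same command one would type at the (gdb) prompt.
    printf 'bt\n' > "$cmds"
    # -batch exits after the command file has been processed.
    gdb -batch -x "$cmds" "$binary" "$core"
    status=$?
    rm -f "$cmds"
    return $status
}

# Typical sequence in the shell that starts the daemon:
#   ulimit -c unlimited
#   ./gmond
#   print_backtrace ./gmond ./core.12345 > backtrace.txt
```

The name of the core file varies between systems (plain `core` or `core.<pid>`), so the second argument has to be adjusted to whatever `ls core*` shows.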
From: Russell Nordquist <rdn@uc...> - 2003-07-08 22:21:10
|
On Tue, 8 Jul 2003 at 14:50, steven wagner wrote:

> I have no specific solutions for you but here are some potentially
> helpful tidbits which may permit you to shoot your own trouble:
>
> Does the monitoring core die right away?
> Does it dump core?
> Does it die when you run it in debug mode?
> Does debug mode tell you anything more about the error?
> Do other versions of the monitoring core exhibit this behavior?

I turned the debugging up and:

host:~# gmond
/etc/gmond.conf configuration
name is Octopod
owner is unspecified
latlong is unspecified
Cluster URL is unspecified
Host location is (x,y,z): unspecified
mcast_channel is 239.2.11.71
mcast_port is 8649
mcast_if is eth1
mcast_ttl is 1
mcast_threads is 2
xml_port is 8649
xml_threads is 2
trusted hosts are: 128.135.28.150

num_nodes is 4
num_custom_metrics is 16
mute is 0
deaf is 0
debug_level is 10
no_setuid is 0
setuid is ganglia
no_gexec is 0
all_trusted is 0
pthread_attr_init
creating cluster hash for 4 nodes
hash_create size = 4
hash->size is 5
gmond initialized cluster hash
Using multicast-enabled interface eth1
mcast listening on 239.2.11.71 8649
Segmentation fault

running strace really wasn't very enlightening either. I am using this version on another multihomed host w/o any problems......

> You may also want to go through the changelog, a few versions ago I seem
> to recall some dnet trouble concerning multiple interfaces. My memory
> could well be faulty in this instance as I've been focused on other
> projects for the last few months...

I didn't see anything.

russell

- - - - - - - - - - - -
Russell Nordquist
UNIX Systems Administrator
Geophysical Sciences Computing
http://geosci.uchicago.edu/computing
NSIT, University of Chicago
- - - - - - - - - - - |
From: steven wagner <swagner@il...> - 2003-07-08 21:50:17
|
I have no specific solutions for you but here are some potentially helpful tidbits which may permit you to shoot your own trouble:

Does the monitoring core die right away?
Does it dump core?
Does it die when you run it in debug mode?
Does debug mode tell you anything more about the error?
Do other versions of the monitoring core exhibit this behavior?

You may also want to go through the changelog, a few versions ago I seem to recall some dnet trouble concerning multiple interfaces. My memory could well be faulty in this instance as I've been focused on other projects for the last few months... |
From: Russell Nordquist <rdn@uc...> - 2003-07-08 21:09:34
|
I have a strange issue with gmond dying immediately. It's a multihomed host. It starts fine when mcast_if is not set, but binds to the external NIC. When I add mcast_if eth1 it won't start. I added the appropriate route as described in the docs, but still nothing.

Here's my setup:

ifconfig:
eth0      Link encap:Ethernet  HWaddr 00:04:75:EB:75:15
          inet addr:a.b.c.46  Bcast:a.b.c.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16538 errors:0 dropped:0 overruns:1 frame:0
          TX packets:3175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1615582 (1.5 MiB)  TX bytes:689513 (673.3 KiB)
          Interrupt:5 Base address:0x1000

eth1      Link encap:Ethernet  HWaddr 00:E0:81:25:AD:E0
          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3806 errors:0 dropped:0 overruns:1 frame:0
          TX packets:2836 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:678293 (662.3 KiB)  TX bytes:279688 (273.1 KiB)
          Interrupt:10 Base address:0x3000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:19397 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19397 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2540312 (2.4 MiB)  TX bytes:2540312 (2.4 MiB)

route:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
239.2.11.71     0.0.0.0         255.255.255.255 UH    0      0        0 eth1
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1
a.b.c.0         0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         a.b.c.1         0.0.0.0         UG    0      0        0 eth0

I am using the ganglia-monitor 2.5.0-3 .deb (debian testing).

It was working once, but stopped sometime during the building of this system.

thanks

russell

- - - - - - - - - - - -
Russell Nordquist
UNIX Systems Administrator
Geophysical Sciences Computing
http://geosci.uchicago.edu/computing
NSIT, University of Chicago
- - - - - - - - - - - |
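The routing table in the post already shows a host route for the multicast channel on eth1, so the route itself does not look like the cause of the crash. For anyone repeating the setup, a small check like the one below can confirm the route is in place before starting gmond. It is only a sketch: `has_mcast_route` is a made-up helper, it assumes the column layout printed by the net-tools `route -n` command (destination first, interface last), and it does nothing about the segfault itself.

```shell
# Check whether a routing table dump (as printed by `route -n`) contains a
# route for the multicast channel on the expected interface.
# has_mcast_route is a hypothetical helper; it reads the table from stdin.
has_mcast_route() {
    channel=$1
    iface=$2
    # Destination is the first column, the interface name is the last one.
    awk -v dst="$channel" -v ifc="$iface" '
        $1 == dst && $NF == ifc { found = 1 }
        END { exit !found }
    '
}

# Typical use as root on the multihomed host:
#   route -n | has_mcast_route 239.2.11.71 eth1 ||
#       route add -host 239.2.11.71 dev eth1
```

The channel and interface are the values from the post (mcast_channel 239.2.11.71, mcast_if eth1); other clusters would substitute their own.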
From: Robert Walsh <rjwalsh@du...> - 2003-07-08 20:37:58
|
> ganglia seems to use the gcc predefined strings, so you need to add an
> entry for __x86_64__

I sent out a patch for Opteron support to ganglia-developers some time back, but it doesn't appear to have made it into the CVS head. Here it is once again. Let me know if you see any weirdness with it. It also fixes up an RPM build problem I'd noticed.

Regards,
Robert. |
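To see which of the "gcc predefined strings" a given compiler actually defines, the preprocessor can be asked to dump its built-in macros. A minimal sketch, assuming a gcc-compatible compiler on the PATH; `has_macro` is a made-up helper that just filters the dump and is not part of the patch.

```shell
# Check a dump of preprocessor macros for a definition with the given name.
# has_macro is a hypothetical helper; it reads `#define NAME VALUE` lines
# from stdin, which is the format printed by `gcc -dM -E`.
has_macro() {
    awk -v name="$1" '
        $1 == "#define" && $2 == name { found = 1 }
        END { exit !found }
    '
}

# On the build host:
#   gcc -dM -E - < /dev/null | has_macro __x86_64__ && echo "x86-64 target"
```

On an Opteron box with a 64-bit toolchain the check would be expected to succeed for `__x86_64__`; on a 32-bit x86 toolchain it normally would not.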