From: Eli S. <es...@il...> - 2006-04-14 21:48:18
FWIW, I find gmetad utterly unusable in my configuration unless I run it out of tmpfs. I don't believe it is specifically rrdtool, since I have a cacti instance polling several hundred devices for hundreds of thousands of OIDs every five minutes (near-continual RRD write/update period)... and that has very little impact on IOwait or disk contention, such that I still run that directly off a local array. gmetad, on the other hand, will completely crush any system I try to run it on, even against the _fastest_ local disk I throw at it. This is with a client base of (average) 1000 nodes/5-8 clusters.

When run out of tmpfs, I maintain a load of .01 and a network BW utilization of ~250KB/sec... when run against a fast local disk with an external write journal, it will flatten a quad-proc Opteron to the point of SSH response being _really_ slow, without showing more than a few MB/sec R/W against the disk... it just seems like very inefficient IO patterns.

I haven't even put as much effort into looking at the problem as Gilad has, as the tmpfs fix works fine... but IMO it probably has something to do with the way rrdtool is being called, not just its use against that many RRDs.

/eli

Jason A. Smith wrote:
> Hi Gilad,
>
> I thought I remember a sort of mini HOWTO or FAQ that existed on the old
> ganglia web page which gave suggestions on how to set up ganglia, but I
> can't find it now.
>
> Anyway, I think ganglia's heavy IO requirements (mostly from rrdtool)
> are fairly well known to long time users, and each has probably come up
> with their own way around it. Here, we are using a diskless database
> directory for ganglia's rrd area, by using Linux's tmpfs:
>
> /etc/fstab contains this line:
> none /var/lib/ganglia/rrds tmpfs size=1024M,mode=755,uid=nobody,gid=nobody 0 0
>
> The uid & gid options should match your gmetad.conf's setuid setting.
>
> Then we back up the database directory using tar every night just to
> prevent complete data loss in case our ganglia server crashes.
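For anyone who wants to script the nightly tar backup described above, here is a minimal sketch in Python. It is only an illustration, not what Jason actually runs: the function name `backup_rrds`, the `keep` and `stamp` parameters, and the archive naming scheme are all assumptions made up for this example. The one important point is that the backup directory has to live on real disk, not on the tmpfs mount.

```python
import tarfile
import time
from pathlib import Path


def backup_rrds(rrd_dir, backup_dir, keep=7, stamp=None):
    """Archive rrd_dir into backup_dir as a dated .tar.gz, keeping the newest `keep`.

    All names here are hypothetical; backup_dir must be on persistent storage.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    rrd_dir = Path(rrd_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Timestamped names sort chronologically, which the pruning step relies on.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    final = backup_dir / f"rrds-{stamp}.tar.gz"
    partial = backup_dir / f"rrds-{stamp}.tar.gz.part"

    # Write under a temporary name first, so a crash mid-backup never leaves
    # a truncated file that looks like a complete archive.
    with tarfile.open(partial, "w:gz") as tar:
        tar.add(rrd_dir, arcname=rrd_dir.name)
    partial.replace(final)

    # Prune everything except the newest `keep` archives.
    archives = sorted(backup_dir.glob("rrds-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return final
```

From a nightly cron job this would be called as something like `backup_rrds("/var/lib/ganglia/rrds", "/var/backups/ganglia")`, where the second path is just a placeholder for wherever your persistent backups go. After a reboot the tmpfs mount comes up empty, so the newest archive has to be unpacked back into place before gmetad starts.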
>
> ~Jason
>
>
> On Fri, 2006-04-14 at 13:06 -0700, Gilad Raphaelli wrote:
>> I'm actually seeing 100% disk busy under both rhel4 and freebsd 4.11
>> with just 98 nodes in 13 clusters. My goal is to get gmetad running
>> on freebsd, rhel4 was just for comparison's sake. A ktrace reveals
>> 100s of failed mkdirs during every writing period - traceable to
>> rrd_helpers.c. There don't seem to be any other significant events.
>> When the disk hits 100% iowait the system is unusable.
>>
>> I was under the impression that a relatively low powered system could
>> handle something like this configuration - perhaps that is the issue?
>> The box is a PIII 800 with 1.5 GB mem - the rrds are stored on a
>> dedicated 70 GB ide disk.
>>
>> Any insight would be appreciated. I'm hanging out in #ganglia on
>> freenode if anyone wants to chat.
>>
>> Thank you,
>>
>> Gil
>>
>> ----- Original Message ----
>> From: Bernard Li <bl...@bc...>
>> To: kn...@kn...; kn...@kn...; ganglia-
>> dev...@li...
>> Sent: Thursday, April 13, 2006 11:19:50 PM
>> Subject: [Ganglia-developers] RE: [Ganglia-general] New (final?)
>> tarball for ganglia-3.0.3
>>
>> Hi Martin:
>>
>> Finally had the time to test it, here's the text in the webpage now:
>>
>> Gmetad Web Frontend version 3.0.3.200604132304 Check for Updates.
>> Gmetad Web Backend (gmetad) version 3.0.3.200604102000 Check for
>> Updates.
>>
>> Looks like it's fixed.
>>
>> BTW, I tested Ganglia on Fedora Core 5 x86 and it is working fine.
>>
>> Did anybody else test 3.0.3? Somebody on IRC mentioned that he was
>> having issues with gmetad using up 99% CPU with a large number of
>> clients (50+).
>>
>> Cheers,
>>
>> Bernard
>>
>>
>> ______________________________________________________________________
>> From: Martin Knoblauch [mailto:kn...@kn...]
>> Sent: Tue 11/04/2006 11:38
>> To: Bernard Li; kn...@kn...; ganglia-
>> dev...@li...
>> Subject: RE: [Ganglia-general] New (final?)
>> tarball for ganglia-3.0.3
>>
>>
>>
>> Bernard,
>>
>> could you please test the following patch in "web" to solve this
>> really really big problem :-) You need to run "./configure" to recreate
>> "web/version.php".
>>
>> $diff -u -r1.9 ganglia.php
>> --- ganglia.php 25 Mar 2006 01:53:57 -0000 1.9
>> +++ ganglia.php 11 Apr 2006 18:34:31 -0000
>> @@ -33,7 +33,8 @@
>> $version = array();
>>
>> # The web frontend version, from conf.php.
>> -$version["webfrontend"] = "$majorversion.$minorversion.$microversion";
>> +#$version["webfrontend"] = "$majorversion.$minorversion.$microversion";
>> +$version["webfrontend"] = "$ganglia_version";
>>
>> # The name of our local grid.
>> $self = " ";
>>
>>
>> $diff -u -r1.1 version.php.in
>> --- version.php.in 10 Dec 2004 21:34:04 -0000 1.1
>> +++ version.php.in 11 Apr 2006 18:34:50 -0000
>> @@ -5,7 +5,7 @@
>> $minorversion = @GANGLIA_MINOR_VERSION@;
>> $microversion = @GANGLIA_MICRO_VERSION@;
>>
>> -$ganglia_version = "@GANGLIA_MAJOR_VERSION@.@GANGLIA_MINOR_VERSION@.@GANGLIA_MICRO_VERSION@";
>> +$ganglia_version = "@GANGLIA_VERSION@";
>> $ganglia_release_name = "@GANGLIA_RELEASE_NAME@";
>>
>> ?>
>>
>>
>> --- Bernard Li <bl...@bc...> wrote:
>>
>>> Just tested building and running on Fedora Core 4 x86, everything
>>> checks out (minimal installation test) - did notice this minor issue
>>> though:
>>>
>>> Gmetad Web Frontend version 3.0.3 Check for Updates.
>>> Gmetad Web Backend (gmetad) version 3.0.3.200604102000 Check for
>>> Updates.
>>>
>>> Notice the versions are different between webfrontend and gmetad - I
>>> guess they use different sources for the version string?
>>>
>>> Chris, are you still planning to help us test with your hardware?
>>>
>>> Thanks,
>>>
>>> Bernard
>>>
>>> P.S. If anybody wants the RPMs, please ping me.
>>>
>>> ________________________________
>>>
>>> From: gan...@li... on behalf of Martin Knoblauch
>>> Sent: Sat 08/04/2006 00:31
>>> To: ganglia general; gan...@li...
>>> Subject: [Ganglia-general] New (final?) tarball for ganglia-3.0.3
>>>
>>>
>>>
>>> Hi,
>>>
>>> as promised, I have created a new pre-3.0.3 tarball. It can be
>>> downloaded from:
>>>
>>> http://www.knobisoft.de/ganglia/ganglia-3.0.3.200604080900.tar.gz
>>>
>>> Due to the release plans for OSCAR5, this could be the last snapshot
>>> before a release next week.
>>>
>>> Especially the following problems are supposed to be solved:
>>>
>>> - truncated XML
>>> - bogus "old protocol" messages in dead-host detection
>>> - gmetad will not stop updating RRDs after a previous failure
>>> - apr-0.9.7 is now officially in CVS
>>> - minor fixes to the webfrontend
>>> - more minor stuff -> See the ChangeLog
>>>
>>> Cheers
>>> Martin
>>>
>>> ------------------------------------------------------
>>> Martin Knoblauch
>>> email: k n o b i AT knobisoft DOT de
>>> www: http://www.knobisoft.de
>>>
>>>
>>> -------------------------------------------------------
>>> This SF.Net email is sponsored by xPML, a groundbreaking scripting
>>> language that extends applications into web and mobile media. Attend
>>> the live webcast and join the prime developer group breaking into
>>> this new coding territory!
>>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
>>> _______________________________________________
>>> Ganglia-general mailing list
>>> Gan...@li...
>>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>>>
>>>
>>>
>>
>> ------------------------------------------------------
>> Martin Knoblauch
>> email: k n o b i AT knobisoft DOT de
>> www: http://www.knobisoft.de
>>
>>
>>