Thanks for your help on this. That's disappointing. I guess I'll
either move to unicast or set up multicast routing. I like the ease and
redundancy of multicast, but I don't really like having all my nodes
broadcast continually across the entire cluster.
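
A minimal sketch of the unicast option, assuming r01n40-ge and r11n40-ge
act as collectors (an arbitrary choice for illustration, not something
settled in this thread) and that unicast UDP is routed between the three
VLANs. In gmond.conf on every datanode:

    cluster {
      name = "Datanodes"
    }
    /* one send channel per collector, so either collector can fail */
    udp_send_channel {
      host = r01n40-ge
      port = 8649
    }
    udp_send_channel {
      host = r11n40-ge
      port = 8649
    }

On the two assumed collectors only, in addition:

    udp_recv_channel {
      port = 8649
    }
    tcp_accept_channel {
      port = 8649
    }

And a single line in gmetad.conf, so that gmetad sees one datasource and
therefore one cluster:

    data_source "Datanodes" r01n40-ge:8649 r11n40-ge:8649

With unicast it is probably also worth setting send_metadata_interval in
the globals section to a nonzero value, so that a restarted collector
relearns metric metadata; the right value depends on the site.
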
Have a great weekend!
Rick Cobb wrote:
> One more time trying to get this thread back on the list instead of
> just between David & me.
> And I'll disclaim expertise on 3.1 here. On the other hand, I've been
> in this code more times than I wanted to in 3.0, and I don't think the
> fundamental design of gmetad was affected by 3.0 -> 3.1.
> I'm not claiming the behavior is consistent, of course -- but really,
> the fact that this looks like it kind-of-works is the core bug.
> Gmetad thinks 1 datasource == 1 cluster. In particular, it runs one
> thread per datasource, and that thread maintains the cluster summary
> metrics. You can construct things so the front-end & cluster
> directories pretend that 3 datasources == 1 cluster, but it doesn't
> work, and that's what you're running into. I've done it
> intentionally, actually, and simply ignored the fact that summaries
> were wrong for that cluster, but I don't recommend it. In particular,
> you're probably getting a ton of 'rrd_update' errors for summary_info
> RRDfiles in your syslog.
> To get consistent behavior, your options are:
> * Treat this as one *grid* of 3 clusters. Grid summaries will work,
> but the meta view isn't nearly as functional as the cluster view, so
> it's not really optimal for what you want.
> * Get these to come in as one datasource. Since you have separate
> multicast domains, you may have to resort to unicast to do this.
> Sorry to be the bearer of bad news, but it would take a fairly nasty
> bit of gmetad hacking to fix this -- and it would deeply affect
> scalability, since the single-thread-per-datasource solution removes a
> lot of opportunities for lock contention.
> -- ReC
> On Fri, Oct 29, 2010 at 9:52 AM, David B Ritch <david.ritch@...> wrote:
> Thanks, Rick. Unfortunately, that doesn't seem to be the problem
> I'm running into. I do have the cluster name set to Datanodes in
> all the clients. Otherwise, I wouldn't expect it to show all of
> them when I click Show Hosts.
> Rick Cobb wrote:
> This is such a common misconception that the development team
> should consider removing the name field from the data_source
> configuration line entirely.
> Fundamentally, cluster names come from the gmond.conf files.
> The names of datasources exist only to confuse the hell out
> of you and create bugs. You need to change those gmond.conf's
> to match the cluster names you want.
> IIRC, it's a good idea for the datasources lines to match
> those because they actually are used in a few places and
> having them *not* match just confuses the next guy who
> maintains your system.
> -- ReC
> On Fri, Oct 29, 2010 at 6:15 AM, David B. Ritch <david.ritch@...> wrote:
> I'm running Ganglia-3.1.7 under RHEL-5.5 on a cluster. My nodes are
> divided into different classes for monitoring. My largest class of
> nodes, datanodes, spans 3 VLANs, and I don't route multicast between
> those domains. I have the following in gmetad.conf on my master node:
> data_source "Datanodes" r01n40-ge:8649 r03n40-ge:8649
> data_source "Datanodes2" r11n40-ge:8649 r13n40-ge:8649
> data_source "Datanodes3" r21n40-ge:8649 r23n40-ge:8649
> Each datanode has "Datanodes" specified as its cluster name.
> When I look at the web interface, at the grid level, the summary of my
> Datanodes only shows 1/3 of my datanodes. When I select the Datanodes
> cluster (Grid > Datanodes), and select Show Hosts: no, I see the same
> graph and the same number of nodes. However, when I select yes, the
> Hosts up: and CPUs Total both jump up to the proper totals.
> Apparently, gmetad sees all the nodes and puts them in the
> cluster, but doesn't calculate the summaries properly.
> Am I doing something wrong, or is there a problem in Ganglia?
> Ganglia-general mailing list