From: Mahendra K. <mah...@gm...> - 2009-07-30 20:30:40
On Thu, Jul 30, 2009 at 12:25 PM, Brad Nicholes <BNI...@no...> wrote:

> >>> On 7/30/2009 at 9:08 AM, in message <669...@ma...>,
> Mahendra Kutare <mah...@gm...> wrote:
> > On Thu, Jul 30, 2009 at 10:31 AM, Brad Nicholes <BNI...@no...> wrote:
> >
> >> >>> On 7/29/2009 at 11:23 PM, in message <669...@ma...>,
> >> Mahendra Kutare <mah...@gm...> wrote:
> >> > Hi All,
> >> >
> >> > If I have configured gmond.conf with a udp_recv_channel with just a
> >> > port number will that configure ganglia gmond to listen on that
> >> > particular port any incoming data and thus making it essentially
> >> > unicast communication channel ?
> >> >
> >>
> >> Yes, specifying just a port will configure gmond's recv channel in
> >> unicast mode
> >>
> >> > What happens if the sending side sends data every 1 sec will that be
> >> > transferred immediately to gmond or it waits to collects some packets
> >> > of data and then delivers to gmond listening side ?
> >> >
> >> > I started sending some data from outside of gmond interface to gmond
> >> > which is configured as mentioned above to a udp_recv_channel on port
> >> > 8108.
> >> >
> >> > Now even though the sending side is pushing data in every 1sec. I do
> >> > not see gmond showing in debug mode on the console that its
> >> > processing Ganglia message from sender side every 1 sec.
> >> >
> >> > Is it just the display part of the problem or ganglia does some
> >> > sophisticated processing of incoming data i.e waiting for a message
> >> > size before delivering it ?
> >> >
> >>
> >> How did you configure gmond to send data every 1 sec.? Gmond sends its
> >> data in collection groups and each collection group is configured with
> >> a send time threshold. At the very worst, the collection group will
> >> send all of the metric values within that group once the group's
> >> collection threshold has been exceeded.
> >> In addition, each metric is assigned a value threshold, which is a
> >> percent of change differential. If the differential change of any
> >> metric within the collection group exceeds the value threshold, the
> >> entire group of metrics is immediately sent. So even though a
> >> collection group is set to collect every 1 second, that doesn't mean
> >> that the metrics are sent every 1 second. Also, by default the rrd
> >> files are configured by gmetad to store metrics at an interval of
> >> every 15 seconds. So even if the metrics were sent every 1 second, you
> >> will still only be seeing 15 second averages in the front end.
> >>
> >
> > Thanks Brad. I am trying to do it to understand the ganglia protocol
> > and this helps.
> > Right now it's fine with me even if gmetad sees only 15 second averages
> > in the frontend as you described.
> >
> > So as I see there are other configuration options in collection groups
> > such as -
> >
> > 1. collect_once and collect_every
> >
> > I understand that collect_once will make some collection be collected
> > only once and just send it to other gmonds every time_threshold.
> > Also, if I am not wrong, if I configured collect_every = 20 and
> > time_threshold = 90, gmond will collect every 20 sec and send every
> > 90 sec to other gmonds.
> >
>
> Under normal circumstances it will send every 90 seconds, but if one of
> the metric value_thresholds has been exceeded, the entire collection
> group will be sent immediately. The purpose of this is to make sure that
> abnormalities or spikes are caught and reported.
>
> > Now the part I am not clear on is, if I am collecting more frequently
> > than I am sending, does that mean we are keeping more in memory ? I
> > mean, say after the first occurrence of collect in 20 sec, if I am not
> > sending it across to gmonds, am I just keeping it in the memory hash ?
> > If not, what's the behaviour ?
> >
>
> No, if you are collecting every 20 seconds but the collection group is
> only sending every 90 seconds, the only metric that is sent or reported
> is the last metric collected within the 90 second interval. This is the
> purpose of the metric value_threshold. If, for example, you collected a
> metric 4 times within a 90 second period and the delta between each
> collected metric value only varied by 5 percent, storing and reporting
> each of the metrics would just end up being noise on the wire because
> the percent of change between the values is insignificant. So just
> sending the last metric collected in this case is good enough. However,
> if the metric saw a spike within the 90 second period but then
> immediately dropped back to normal, you want to make sure that the
> metric spike is sent and recorded, so gmond sends it immediately.
>
> > 2. What does this configuration cleanup_threshold = 300 /*secs */ do ?
> >
> > Is it cleaning stuff in the memory hash ? If yes, is it happening
> > concurrently while gmond is trying to send data to other gmonds ? What
> > happens if the cleanup threshold is reached and the gmond collection
> > metric has also reached time_threshold, or say if it's synchronized,
> > first cleanup_threshold and next time_threshold ?
> >
> > Will it just send all NULL ?
>
> No, the cleanup_threshold is an interval at which gmond will analyze all
> of the hosts that have been reported and determine if they are still
> reporting. If, for example, host1 was reporting metrics and then was
> removed from the grid, gmond will no longer receive data from host1 and
> that host should no longer be included in the XML data that is collected
> by gmetad.

Brad, thanks for your responses. They really helped me understand the
protocol.

I have a few more questions about the memory hash -

I understand gmond has threads which listen to the multicast channel and
write the data collected to a fast, in-memory hash table.
What I am trying to understand is -

a) Is this memory hash table a bounded memory buffer ?

b) If a gmond receives a packet on the multicast channel, does it just go
ahead and update the memory hash under some hash identifier, say one based
on node location or collection group ?

What happens when we get the same collection group's metrics again on the
multicast channel, say after a 20 sec time_threshold or a 10 sec
value_threshold which causes the collection group to multicast ?

Does that update all the values for the collection group in the memory
hash, presumably with multiple threads updating the metrics of multiple
collection groups stored in the memory hash ?

Thanks
Mahendra
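
For reference, the send behaviour Brad describes above (collect every
collect_every seconds, send at least every time_threshold seconds, and send
the whole group at once if any metric moves past its value_threshold) can be
sketched in Python. This is only an illustration under stated assumptions,
not gmond's actual code: the class and function names are made up, and
value_threshold is treated as a percent change because that is how it is
described in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Metric:
    name: str
    value_threshold: float             # percent change (assumption from the thread)
    last_sent: Optional[float] = None  # value that was last sent
    current: Optional[float] = None    # value that was last collected


@dataclass
class CollectionGroup:                 # hypothetical model, not gmond's data structure
    collect_every: int                 # seconds between collections
    time_threshold: int                # maximum seconds between sends
    metrics: List[Metric] = field(default_factory=list)
    last_send_time: Optional[int] = None

    def value_threshold_exceeded(self) -> bool:
        for m in self.metrics:
            if m.last_sent is None or m.current is None:
                continue
            base = abs(m.last_sent) or 1.0  # avoid dividing by zero
            change = abs(m.current - m.last_sent) / base * 100.0
            if change > m.value_threshold:
                return True
        return False

    def collect(self, now: int, samples: Dict[str, float]) -> bool:
        """Record new samples; return True if the whole group is sent now."""
        for m in self.metrics:
            m.current = samples[m.name]
        due = (self.last_send_time is None
               or now - self.last_send_time >= self.time_threshold)
        if due or self.value_threshold_exceeded():
            # Only the latest value of each metric goes out; earlier
            # samples collected since the last send are simply dropped.
            for m in self.metrics:
                m.last_sent = m.current
            self.last_send_time = now
            return True
        return False


def simulate(values: List[float]) -> List[int]:
    """Feed one sample per collection interval; return the send times."""
    group = CollectionGroup(collect_every=20, time_threshold=90,
                            metrics=[Metric("load_one", value_threshold=5.0)])
    sent_at = []
    for i, v in enumerate(values):
        now = i * group.collect_every
        if group.collect(now, {"load_one": v}):
            sent_at.append(now)
    return sent_at


steady = simulate([1.0] * 10)                           # -> [0, 100]
spiky = simulate([1.00, 1.02, 1.01, 3.00, 1.00, 1.01])  # -> [0, 60, 80]
```

With a steady value the group only goes out when time_threshold has passed
(at the first collection on or after 90 seconds). With a spike, the group is
sent as soon as the spike is collected and again when the value drops back,
since both changes exceed the 5 percent threshold.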