From: Sergio B. <ser...@gm...> - 2014-07-01 10:42:56
Hi Cristovao,
unless I missed some feature implemented between 3.0 and 3.6, there is no way, in the gmond collector nor in gmetad, to select the metrics. Everything depends on the sender. Someone from the developers may actually prove me wrong, of course :-)
Cheers,
Sergio

On 1 Jul 2014, at 10:30, Cristovao Jose Domingues Cordeiro <cri...@ce...> wrote:

> Hi Sergio,
>
> thanks for the reply. But I thought the collector gmond would be the one defining which metrics are reported... isn't there a way of collecting only the metrics that are configured in the collecting gmond?
>
> We have no control over what happens in the clusters, so if it works the way you describe, chances are we'll run into problems like this often, with no quick fix available.
>
> Cumprimentos / Best regards,
> Cristóvão José Domingues Cordeiro
>
> From: Sergio Ballestrero [ser...@gm...]
> Sent: 30 June 2014 18:50
> To: Cristovao Jose Domingues Cordeiro
> Cc: Ganglia List
> Subject: Re: [Ganglia-general] Huge metrics' size being reported to gmetad
>
> Hi Cristovao,
> 250 * 12 kB = ~3 MB, so it fits. The 12 kB size is the smaller "old default" for rrd, so that's not the issue. It's just that ~250 metrics are a lot, and that depends on what has been configured on the gmond agent running on the client; there is not much you can do from the side of the collector. Maybe see if they can cut the metrics down, starting with the icmp and tcpext ones, which are probably not much needed?
>
> Cheers,
> Sergio
>
> On 30 Jun 2014, at 18:30, Cristovao Jose Domingues Cordeiro <cri...@ce...> wrote:
>
>> oh ok :p
>>
>> well, it has many more files than usual (~250 ... sorry for the long list):
>> ....
>> but my configuration for this cluster collector is exactly the same as for the other ones.
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>> IT Department - 28/1-010
>> CERN
>>
>> From: Sergio Ballestrero [ser...@gm...]
>> Sent: 30 June 2014 18:24
>> To: Cristovao Jose Domingues Cordeiro
>> Subject: Re: [Ganglia-general] Huge metrics' size being reported to gmetad
>>
>> Hi Cristovao,
>> I meant inside the X dir ;-) to see the number of files and their sizes.
>> Ciao, S
>>
>> PS: I find the old rrd default too low-res and the new one too large; I'm still searching for my right compromise.
>>
>> On 30 Jun 2014 18:18, "Cristovao Jose Domingues Cordeiro" <cri...@ce...> wrote:
>>
>> Hi all,
>>
>> in fact Chris is right, and I noticed that as well a few months ago. Now I also have:
>>
>> # Old Default RRA: Keep 1 hour of metrics at 15 second resolution. 1 day at 6 minute
>> RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
>>      "RRA:AVERAGE:0.5:5760:374"
>> # New Default RRA
>> # Keep 5856 data points at 15 second resolution assuming 15 second (default) polling. That's 1 day.
>> # Two weeks of data points at 1 minute resolution (average)
>> #RRAs "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"
>>
>> and the problem continues.
>>
>> Sergio, for host X:
>>
>> $ du -h X
>> 2.9M X
>>
>> $ ls -lha
>> drwxr-xr-x. 2 nobody nobody 12K Jun 17 08:05 X
>>
>> Does it make sense to you?
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>>
>> From: Christophe HAEN [chr...@ce...]
>> Sent: 30 June 2014 17:51
>> To: Grigory Shamov
>> Cc: Ganglia
>> Subject: Re: [Ganglia-general] Huge metrics' size being reported to gmetad
>>
>> Hi Grigory,
>>
>> I was hit by a similar problem before realizing that the default changed from one version to another. This is what I have now:
>>
>> # Old Default RRA: Keep 1 hour of metrics at 15 second resolution. 1 day at 6 minute
>> RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
>>      "RRA:AVERAGE:0.5:5760:374"
>> # New Default RRA
>> # Keep 5856 data points at 15 second resolution assuming 15 second (default) polling. That's 1 day.
>> # Two weeks of data points at 1 minute resolution (average)
>> #RRAs "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"
>>
>> Cheers,
>> Chris
>>
>> 2014-06-30 17:30 GMT+02:00 Grigory Shamov <Gri...@um...>:
>>
>> Dear Sergio,
>>
>> Somehow, on my 300-node cluster, Ganglia, with more or less default metrics and RRD configs, collects over 40 GB! If there are smarter RRD settings to reduce the size, it would be very interesting to learn them.
>>
>> --
>> Grigory Shamov
>> HPC Analyst, Tech. Site Lead,
>> Westgrid/Compute Canada
>> E2-588 EITC Building,
>> University of Manitoba
>> (204) 474-9625
>>
>> From: Sergio Ballestrero <ser...@gm...>
>> Date: Monday, 30 June, 2014 10:21 AM
>> To: Cristovao Jose Domingues Cordeiro <cri...@ce...>
>> Cc: Ganglia <gan...@li...>
>> Subject: Re: [Ganglia-general] Huge metrics' size being reported to gmetad
>>
>> Hi Cristovao,
>> that depends on how many metrics there are and on the rrd creation settings. Sure, 150 MB looks like a lot. An ls -la may give more hints...
>> Ciao,
>> Sergio
>>
>> On 30 Jun 2014 16:35, "Cristovao Jose Domingues Cordeiro" <cri...@ce...> wrote:
>>
>> Someone?
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>>
>> From: Cristovao Jose Domingues Cordeiro [cri...@ce...]
>> Sent: 24 June 2014 13:58
>> To: gan...@li...
>> Subject: [Ganglia-general] Huge metrics' size being reported to gmetad
>>
>> Hi,
>>
>> I have a grid configuration with several clusters. I am also using a RAM disk (4 GB) for I/O optimization.
>>
>> I've been noticing that sometimes gmetad breaks, complaining about lack of space in this tmpfs partition.
>>
>> I checked and saw that, for some reason, some clusters have hosts which occupy 3 MB, 6 MB and sometimes even 150 MB!!! Altogether this makes the cluster occupy 2 GB, and consequently half of the ramdisk space.
>>
>> We would normally expect these host metrics to take +/- 336 kB, right?
>>
>> Has anyone experienced this?
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>> IT Department - 28/1-010
>> CERN
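
[Editor's note: the sizes quoted in the thread hang together arithmetically. A back-of-the-envelope sketch follows; the `rrd_size_bytes` helper and the flat 1500-byte header allowance are my own assumptions, not rrdtool's exact accounting, but they reproduce the 12 kB "old default" file size, the 2.9M `du` output, and the ~150 MB problem hosts.]

```python
# Rough on-disk size of a Ganglia metric rrd from its RRA row counts.
# Each RRA row stores one 8-byte double per data source (Ganglia keeps
# one data source per metric file); add a fixed header allowance.

HEADER_BYTES = 1500  # rough allowance for the rrd header; varies by rrdtool version

def rrd_size_bytes(rra_rows, data_sources=1, header=HEADER_BYTES):
    """Approximate size in bytes of an rrd file with the given RRA row counts."""
    return header + 8 * data_sources * sum(rra_rows)

old_default = [244, 244, 244, 244, 374]  # "old default" RRAs from the thread
new_default = [5856, 20160, 52704]       # "new default" RRAs from the thread

old_size = rrd_size_bytes(old_default)   # ~12 kB per metric, as Sergio notes
new_size = rrd_size_bytes(new_default)   # ~615 kB per metric

print(250 * old_size / 2**20)  # ~2.9 -> MB per host, matching the du -h output
print(250 * new_size / 2**20)  # ~150 -> MB per host, matching the problem hosts
```

The same arithmetic explains the "+/- 336 kB" expectation: roughly 28 default metrics at the old ~12 kB file size come to about 336 kB per host. So the blow-up is the combination of ~250 metrics per host with the larger new-default RRAs, not a gmetad bug.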