Re: [Ganglia-developers] sFlow counters in Ganglia

Brad,

Thanks for the information on spoofing in modules.

> The way that I would envision an sflow module working would be similar to the spoofing example module that is currently checked into the Ganglia SVN repository.  The spoofing module can be found at http://ganglia.svn.sourceforge.net/viewvc/ganglia/trunk/monitor-core/gmond/python_modules/example/spfexample.py?revision=1895&view=markup  .  Unfortunately it is a python module example rather than a C module but hopefully you can get the idea of what I am talking about from the code.  One application of this kind of spoofing module would be to load it under a gmond instance running on a VM host.  It would then query each VM running on the box and register a set of spoofed metrics for each VM.  From that point on, the module just reports the metrics for each spoofed VM and returns them as if gmond were running on each of the VMs.  I actually have another python module that does exactly that, but I haven't been able to release the source code for it yet.  You can also look at the modpython.c module to get an idea of how to do the spoofing in C code.  But then you guys have already worked with the spoofing code as part of the patch that you already did so you probably already know how that works.
> 
> Basically an sflow module would be loaded like any other module and a collection interval would be set in the configuration file.  In the sflow module itself, register a spoofed metric for each managed sflow monitored node.  How you get the list of nodes to register is up to you.  It could be from the gmond.conf file, some other configuration file or by listening to the sflow data packets themselves.  
> 
> The module would then start a thread that would read the XDR packets in exactly the same way that you are doing now in the gmond code. The thread would then store the current metric data temporarily for each monitored node that it knows about until the next collection cycle happens.  Then when gmond requests specific metric data for a specific spoofed node, you just return the last metric read from your temporary storage.
> 
> Hopefully this helps.  The whole intent of the modular interface is to move all of the metric gathering out of gmond itself in order to make gmond more flexible and upgradable for the user without having to depend on all of us as Ganglia developers to get around to releasing a new version every time a bug is found or someone wants to collect a metric in a different way.  By implementing sflow as a module rather than within gmond itself, it would help to make sure gmond remains flexible and upgradable for the user.
> 

I think there is still a problem with this approach: resampling the counters from an sFlow module would create ugly artifacts. The sFlow sampling rates are set in the sFlow agents, and there is no guarantee that they would mesh well with a polling interval set in the gmond.conf file. There is also no guarantee that the sFlow agents will all be using the same polling intervals. To get accurate results, you need to asynchronously post sFlow data into the gmond repository as it arrives.
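
To make the resampling artifact concrete, here is a minimal Python sketch. It is not Ganglia or sFlow code; the function names and the numbers (a counter growing at a steady 1000 bytes/s, an agent exporting every 20 seconds, a module polled every 15 seconds) are assumptions chosen only for illustration.

```python
# Hypothetical illustration: a counter that grows at a constant rate is
# exported by an agent every `export_interval` seconds, while a module
# reads the last cached value every `poll_interval` seconds.

def cached_value(t, export_interval, rate):
    """Counter value most recently exported at or before time t."""
    last_export = (t // export_interval) * export_interval
    return rate * last_export

def polled_rates(poll_interval, export_interval, rate, n_polls):
    """Rates computed by differencing the cached value at each poll."""
    rates = []
    prev = cached_value(0, export_interval, rate)
    for i in range(1, n_polls + 1):
        cur = cached_value(i * poll_interval, export_interval, rate)
        rates.append((cur - prev) / poll_interval)
        prev = cur
    return rates

def arrival_rates(export_interval, rate, n_exports):
    """Rates computed from consecutive exports, as each one arrives."""
    rates = []
    for i in range(1, n_exports + 1):
        prev = rate * (i - 1) * export_interval
        cur = rate * i * export_interval
        rates.append((cur - prev) / export_interval)
    return rates

if __name__ == "__main__":
    print(polled_rates(15, 20, 1000, 6))
    print(arrival_rates(20, 1000, 4))
```

With these assumed intervals, the polled series alternates between zero and roughly 1333 bytes/s even though the underlying rate never changes, while the series computed on arrival stays flat at 1000 bytes/s.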

Another advantage of the current approach is that it involves zero configuration. sFlow agents are automatically discovered and added to the database as soon as they start sending sFlow to gmond. The sFlow gateway function automatically detects and adapts to changes in sFlow agent polling intervals etc.
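
As a rough sketch of what zero-configuration discovery can look like, the Python below keeps a table keyed by agent address, adds an entry the first time an agent is heard from, and re-estimates the polling interval from arrival times. The class name `AgentTable`, the method `on_counters`, and the dictionary layout are hypothetical and are not taken from the gmond sFlow code.

```python
# Hypothetical sketch: agents are discovered from the data they send,
# so no list of agents has to be configured in advance.

class AgentTable:
    def __init__(self):
        # agent address -> state learned from that agent's datagrams
        self.agents = {}

    def on_counters(self, agent_addr, counters, now):
        """Record a counter export; return True if the agent is new."""
        entry = self.agents.get(agent_addr)
        if entry is None:
            self.agents[agent_addr] = {
                "last_seen": now,
                "interval": None,  # unknown until a second export arrives
                "counters": dict(counters),
            }
            return True
        # Track the agent's own polling interval from the observed gap.
        entry["interval"] = now - entry["last_seen"]
        entry["last_seen"] = now
        entry["counters"] = dict(counters)
        return False
```

Because the interval is re-estimated on every export, a change in an agent's polling interval is picked up at the next datagram without touching any configuration.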

Peter
