From: Carlo M. A. B. <ca...@sa...> - 2009-12-25 15:40:55
On Thu, Dec 24, 2009 at 12:10:51PM +0000, Daniel Pocock wrote:

> Vladimir Vuksan wrote:
>>
>> The issue is the value of this data. If these were financial
>> transactions then no loss would be acceptable, however these are not.
>> They are performance, trending data which get "averaged" down as time
>> goes by, so the loss of a couple of hours or even days of data is not
>> tragic.
>
> I agree - it doesn't have to be perfect.

Still, the current implementation has a ways to go and should most likely be extended for better data reliability, as far as it doesn't cost too much.

> To come back to my own requirement though, it is about horizontal
> scalability. Let's say you have a hypothetical big enterprise that has
> just decided to adopt Ganglia as a universal solution on every node in
> every data center globally, including subsidiary companies, etc.
>
> No one really wants to manually map individual servers to clusters and
> gmetad servers. They want plug-and-play.

The current federated model of gmetad helps slightly in that respect: you would expect each of the independent offices/units/datacenters to have one gmetad locally (as long as it is big enough to handle the load) to collect and aggregate data, and one central gmetad that connects to all the leaves for the centralized view. Of course you can also have more than one gmetad (even one per cluster per location) and make the gmetad hierarchy tree a little larger.

> They just want to allocate some storage and gmetad hardware in each main
> data center, plug them in, and watch the graphs appear. If the CPU or
> IO load gets too high on some of the gmetad servers in a particular
> location, they want to re-distribute the load over the others in that
> location. When the IO load gets too high on all of the gmetads, they
> want to be able to scale horizontally - add an extra 1 or 2 gmetad
> servers and see the load distributed between them.
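For what it's worth, the federated layout above is expressed with ordinary data_source directives in gmetad.conf; a minimal sketch (all hostnames are hypothetical; 8649 is gmond's default tcp_accept port and 8651 is gmetad's default xml_port):

```
# Leaf gmetad in a datacenter polls its local gmond clusters:
data_source "dc1 web cluster" 15 web01.dc1.example.com:8649
data_source "dc1 db cluster"  15 db01.dc1.example.com:8649

# Central gmetad aggregates the leaves by polling their xml_port:
data_source "datacenter 1" 60 gmetad.dc1.example.com:8651
data_source "datacenter 2" 60 gmetad.dc2.example.com:8651
```

Adding another level to the tree is just another layer of data_source lines pointing at the gmetads below.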
Horizontal scalability like this would be ideal, but again, the added complexity might be difficult to assimilate.

> Maybe this sounds a little bit like a Christmas wish-list, but does
> anyone else feel that this is a valid requirement? Imagine something
> even bigger - if a state or national government decided to deploy the
> gmond agent throughout all their departments in an effort to gather
> utilization data - would it scale? Would it be easy enough for a
> diverse range of IT departments to just plug it in?

With enough planning, and assuming the cluster tree is somewhat balanced, it should work fine IMHO. But for very large clusters, or ones that span multiple locations and can't be split logically (clouds), you would soon run into scalability issues, including memory pressure in the gmond collectors.

> Carlo also made some comments about RDBMS instead of RRD. This raises a
> few discussion points:

I meant an RDBMS alongside RRDs: RRDs were specifically designed for efficient storage and summarization of metrics, which is what is needed most of the time. For the special cases where you need to keep all data, without any distortion, for a long time, an ETL process into an RDBMS and some data warehouse is better suited. The ETL could be as simple as scanning the RRDs periodically and importing the records into a database, but it would be nicer if this could be done directly from gmetad, by allowing for hooks at "write RRD" time. This was indeed one of the reasons why the python gmetad in trunk has a modular design, so that a module for doing this could be written if someone were interested in doing so.

Carlo
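To make the "write RRD" hook idea concrete, here is a rough sketch of what such a module might do, using only the Python standard library. Everything here is illustrative: gmetad does not expose an `on_write_rrd` callback today, and the table layout is just one obvious choice for keeping full-resolution samples that RRA consolidation would otherwise average away.

```python
import sqlite3
import time

def make_store(path=":memory:"):
    """Create a tiny warehouse table for raw metric samples."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS metrics (
                      ts     INTEGER NOT NULL,  -- unix timestamp of the sample
                      host   TEXT    NOT NULL,
                      metric TEXT    NOT NULL,
                      value  REAL    NOT NULL)""")
    return db

def on_write_rrd(db, host, metric, value, ts=None):
    """Hypothetical hook, called alongside the RRD update: mirror the
    sample into the warehouse instead of only into the RRD."""
    db.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)",
               (ts or int(time.time()), host, metric, float(value)))
    db.commit()

# Example: mirror a couple of samples into the store.
db = make_store()
on_write_rrd(db, "node01", "cpu_user", 12.5, ts=1261753200)
on_write_rrd(db, "node01", "cpu_user", 14.0, ts=1261753215)
rows = db.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
```

A periodic batch job scanning the RRDs with `rrdtool fetch` could populate the same table; the hook approach simply avoids the second pass over the files.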