From: Brad N. <BNI...@no...> - 2008-09-03 16:08:00
>>> On 9/3/2008 at 9:00 AM, in message
<B9AEC8265E202D4BA6540EA32F495790058B8B5A@LDNPCMEU301VEUA.INTRANET.BARCAPINT.COM>,
<Dan...@ba...> wrote:
>
> Currently, gmetad will create RRD files for any new metrics, until disk
> space is completely exhausted
>
> It is not safe to assume that the gmetad administrator is always
> communicating with the guy who is rolling out gmond in different
> clusters. At a minimum, it might be useful to have a disk space
> threshold - if the threshold is exceeded, no new RRDs permitted.
>
> One thing I notice, with gmetad 3.1.0 and RRD 1.2.15, is that when the
> partition is full, RRD files with size = 0 are being created. In other
> words, the file is in the directory, but no blocks allocated.
>
> Subsequent access to that file generates errors, and we have
> occasionally seen crashes, although I haven't verified that the crashes
> are directly related to this issue.
>

I'm not sure whether this kind of issue is the responsibility of gmetad.
One of the purposes of Ganglia is to allow an administrator to watch things
like available disk space. If the available disk space is critical, then the
administrator should be resolving the problem long before gmetad ever
approaches an out-of-disk-space situation. The same could be said for CPU
and memory utilization as well.

Should gmetad detect these conditions and shut down gracefully before they
happen? That would be nice, but I don't think that it is the responsibility
of gmetad. Gmetad erroring out in these situations is probably just a
symptom of a much larger problem.

Just my $.02

Brad
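
For reference, the two checks discussed in this thread (a free-space threshold before new RRDs are created, and spotting the zero-length RRD files left behind by a full partition) could be sketched roughly as below. This is only an illustration in Python, not gmetad code and not an existing Ganglia feature: the function names, the 5% default, and the idea of running it as an external check are all assumptions made for the example.

```python
import os
import shutil


def has_free_space(path, min_free_fraction=0.05):
    """Return True if the filesystem holding `path` has at least
    `min_free_fraction` of its capacity free.

    The 5% default is an arbitrary illustrative value.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero capacity; treat as "no space".
        return False
    return usage.free / usage.total >= min_free_fraction


def find_empty_rrds(rrd_root):
    """Yield the paths of zero-length .rrd files under `rrd_root`.

    These are the files described above: present in the directory,
    but with no data written because the partition was full.
    """
    for dirpath, _dirnames, filenames in os.walk(rrd_root):
        for name in filenames:
            if name.endswith(".rrd"):
                full_path = os.path.join(dirpath, name)
                if os.path.getsize(full_path) == 0:
                    yield full_path
```

An administrator might run something like `has_free_space(rrd_dir)` from a cron job or monitoring check, where `rrd_dir` is whatever directory the local gmetad writes its RRDs to, and use `find_empty_rrds(rrd_dir)` to list files worth removing once space has been recovered. Whether such a check belongs inside gmetad or outside it is exactly the question raised in this thread.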