From: matt m. <ma...@cs...> - 2003-05-29 19:09:14
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

guys-

here's another update on g3 with a new demo (if you're interested).

http://matt-massie.com/g3/
  ganglia-3.0.0.tar.gz (latest snapshot of the source)
  example.html (a markup of the summary test xml output)

today i wrote the on-the-fly summary portion and the g2 compatibility portion of the tree code. the tree code now allows you to import g2 xml streams (although the data will only be exported in g3 format). building in that compatibility was relatively easy, and i needed it to truly test the summary code (i didn't feel like hand-typing a huge xml file :).

i tested this code with valgrind (no leaks) and it compiles and runs like a charm on linux and cygwin. the library was able to parse about 60ish metrics on 172ish machines in 4ish clusters with 3ish depth in less than 1ish second. that's acceptable, and i know of many ways to make it faster (like using a stack for allocing/freeing summary tree nodes instead of malloc/free)... but we'll put that in 3.1.0. right now i want to focus on getting a stable 3.0.0 release.

as things stand now, g3 has three library dependencies, and i want to keep the number as low as possible. they are expat, zlib and gnu mp. all very portable, all well-written and all necessary.

the gnu mp library is an arbitrary precision math library. since g3 will be used to summarize a huge amount of data, i needed a math library to handle overflows. g3 will have three basic data types (string, number and float). the number and float values can be as large as your memory will allow (no more uint8, uint16, int32 etc). i want to keep it simple.

if you take a look at the end of http://matt-massie.com/g3/example.html you'll see how g3 summarizes the data on the fly while maintaining hierarchical metric space. it just puts mu/metric tags directly under ou tags without any hosts between.

oh yeh, before i forget.. the latest code completely ignores attribute order per federico's comments.
the attribute order doesn't matter.. i have a perfect hash with index names that is used instead. people will likely slice and dice ganglia xml and i don't want to assume they will preserve any attribute order.

currently string summaries are not handled correctly.. i was more interested in getting the numbers and floats right first.

you'll see there is a "samples" attribute on the summary metrics (if samples is not specified it is assumed to be 1). samples is the number of data points used to get the value.. if we want an average we just take value/samples (just like the old sum num attributes.. but i think value samples makes more sense.. although.. it doesn't rhyme).

- -matt

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE+1lrJVmIXr0CKtmERAgKhAJ4raepHFl2RK8dtRJRWwL+emmylyQCfTK7w
w3ErxYmWmIERIvd4rPTyokM=
=1dJw
-----END PGP SIGNATURE-----
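
a rough illustration of the value/samples scheme described in the message: the sketch below folds per-host metrics into one summary by adding values and sample counts, then derives the average as value/samples. it is not g3 code. the names `Summary`, `merge`, `average` and `summarize` are hypothetical, and python's built-in arbitrary precision integers stand in for the role the message gives to gnu mp.

```python
from dataclasses import dataclass
from fractions import Fraction
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Hypothetical summary metric: a running total plus its sample count."""
    value: int        # Python ints are arbitrary precision, so the sum cannot overflow
    samples: int = 1  # per the message, a missing samples attribute means 1

    def merge(self, other: "Summary") -> "Summary":
        # Combining two summaries adds both the totals and the sample counts.
        return Summary(self.value + other.value, self.samples + other.samples)

    def average(self) -> Fraction:
        # average = value / samples, kept exact as a rational number
        return Fraction(self.value, self.samples)


def summarize(metrics):
    """Fold a non-empty iterable of Summary objects into a single Summary."""
    return reduce(Summary.merge, metrics)
```

because each summary carries its own sample count, summaries of summaries (for example, clusters rolled up into a grid) can be merged the same way without revisiting the per-host data.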