From: Ben C. <Be...@cl...> - 2004-10-19 14:21:12
Paulo,

Thanks for the quality feedback on the new release. I am sorry you have
experienced these problems and feel the need to roll back a version.
Hopefully we can ensure that future versions are back on track.

On the problem with the range: do you use a single value for the WARN and
CRIT, or do you use the range format? Can you please provide an example
log line so that we can ensure there is no misunderstanding of the range
and how it should be interpreted? However, this looks like a simple bug;
thanks for finding it.

The State calculation also looks like a simple bug, although I have not
yet looked at the code. Again, if you can supply a line or two of raw
log, this would be extremely useful.

The storage of the range will require a big change to the database. We
will be dropping the current binary storage table and replacing it with
two new tables: one with just values, and one with peripheral information
like the Max/Min and Warn/Crit ranges. This will give faster data
access and smaller table space, although at the penalty of a heavy
conversion process at that time. In the meantime, I hope an
acceptable way can be found, as in the past, to use the available fields
to store the required data.

Your problem with speed is surprising. I test on an 800MHz low-memory
Linux box, which ensures any speed problems slap me in the face fast.
The difference between the old and new was about 10% during my testing.
Can you please give us more information about your setup? Is your
MySQL local or on a remote server? What method have you chosen to
import the log information? What commands do you run?

Lastly, my apologies for the documentation. This is beta code,
deliberately not yet on version 1.0. We obviously have some work to do
here.

Kind regards,

Ben Clewett.

Paulo Afonso Graner Fessel wrote:
> Ben,
>
> I've found that perfparse now has the infrastructure to use ranges of
> warning/critical data, and the problem is that the database hasn't
> changed in order to use these values (as far as I can see, at least).
>
> Part of the problem is at save_bin_data in storage_mysql.c:
>
>     g_string_append_printf(s_SQL, ", %s",
>         getSafeD(perf->d[PERF_VALUE_WARN_START]));
>     g_string_append_printf(s_SQL, ", %s",
>         getSafeD(perf->d[PERF_VALUE_CRIT_START]));
>     g_string_append_printf(s_SQL, ", %s)",
>         getSafeD(perf->metric_state));
>
> I've changed it to
>
>     g_string_append_printf(s_SQL, ", %s",
>         getSafeD(perf->d[PERF_VALUE_WARN_END]));
>     g_string_append_printf(s_SQL, ", %s",
>         getSafeD(perf->d[PERF_VALUE_CRIT_END]));
>     g_string_append_printf(s_SQL, ", %s)",
>         getSafeD(perf->metric_state));
>
> and the data began to show up in perfdata_service_bin. However,
> metric_state values were still wrong, and indeed inconsistent with
> perfdata_service_raw. In the example I sent you, the metrics were 2
> (CRITICAL) in perfdata_service_bin; OTOH they were 0 (NORMAL) in
> perfdata_service_raw. And the graphs still did not show the guides
> for warning/critical thresholds. At this point, I rolled back to
> release 0.100.7.
>
> I've also found performance problems in 0.101.1: it feels much slower
> than 0.100.7 when reading the serviceperf.log file and putting it into
> the database. In 0.100.7 I got thousands of lines per second; in
> 0.101.1 this number is never greater than 70 lines/sec. And I'm running
> on a machine with 512 MB RAM, 2xIntel Xeon 3.06 GHz! I've tried to
> optimize MySQL settings to no avail, and this was another reason that
> led me to roll back to 0.100.7.
>
> Also, the documentation is a little confusing in this release. It took
> me some time until I understood that, with --default-perfdata, I
> wouldn't have to use crontab entries anymore to update the database.
> There's no mention of this in the documentation, and if it sounds
> obvious to the development team, it may not be so clear to the users.
>
> Don't get me wrong: I find that perfparse is THE solution for gathering
> performance data for Nagios. However, I feel that this wasn't the right
> time to release 0.101.1, because it clearly lacks polish and has rough
> edges in the database architecture and in the parsing of plugin output.
>
> Why don't you add a compilation switch to disable these bleeding-edge
> features? It would make life easier for people who rely on perfparse
> for data gathering on production systems.
>
> Regards, and keep up the great work,
>
> Paulo Afonso Graner Fessel
> UNIX Environment and Systems Administrator
> pau...@pr...
> OWT
> Phone: +55 (11) 3038-6554
> Fax: +55 (11) 3038-6508
> http://www.primesys.com.br
>
>>-----Original message-----
>>From: Ben Clewett [mailto:Be...@cl...]
>>Sent: Tuesday, October 19, 2004 04:25
>>To: Paulo Afonso Graner Fessel
>>Cc: per...@li...
>>Subject: Re: [Perfparse-users] Warning/critical values not
>>going into the database... Again
>>
>>Paulo,
>>
>>Can you please send me a sample of your data? I want to
>>replicate and fix this problem.
>>
>>Regards,
>>
>>Ben Clewett.
>>
>>Paulo Afonso Graner Fessel wrote:
>>
>>>Hello, folks.
>>>
>>>I've just upgraded to 0.101.1 and I'm noticing that warning/critical
>>>values from plugins are not getting into the database again.
>>>But unlike last time (and worse), it seems that the state field of
>>>perfdata_service_bin is also incorrect:
>>>
>>>+-----------+---------------------+--------+---------------------+-------+------+----------+-------+
>>>| host_name | service_description | metric | ctime               | value | warn | critical | state |
>>>+-----------+---------------------+--------+---------------------+-------+------+----------+-------+
>>>| D01       | Cache Hits          | lib    | 2004-10-18 17:35:58 | 99.99 |    0 |        0 |     2 |
>>>| D01       | Cache Hits          | buffer | 2004-10-18 17:35:58 | 99.95 |    0 |        0 |     2 |
>>>+-----------+---------------------+--------+---------------------+-------+------+----------+-------+
>>>
>>>In my definition of this particular service, I want these two metrics
>>>to be as high as possible, with a maximum value of 100%. However, as
>>>you can see, I don't have warning and critical values for this metric,
>>>and its state as determined by perfparse is 2 (CRITICAL); it
>>>should actually be 0 (NORMAL).
>>>
>>>Regards,
>>>
>>>Paulo Afonso Graner Fessel
>>>UNIX Environment and Systems Administrator
>>>pau...@pr...
>>>OWT
>>>Phone: +55 (11) 3038-6554
>>>Fax: +55 (11) 3038-6508
>>>http://www.primesys.com.br
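
As a reference point for the range question raised in this thread, here is a minimal sketch in C of how a threshold could be interpreted under the range format of the Nagios plugin development guidelines ([@]start:end). It is not perfparse code: threshold_range, parse_range and metric_state are hypothetical names invented for this illustration, and whether perfparse applies the same rules is an assumption.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for this sketch; not perfparse's own structure. */
typedef struct {
    double start;  /* lower bound; -INFINITY when written as "~" */
    double end;    /* upper bound; +INFINITY when omitted */
    bool inside;   /* true for "@start:end": alert when value is inside */
    bool set;      /* false when no threshold was supplied at all */
} threshold_range;

enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Parse "[@][start:][end]". Returns 0 on success, -1 on malformed input.
 * An empty or NULL string means "no threshold" and is not an error. */
int parse_range(const char *text, threshold_range *out)
{
    char *stop;

    assert(out != NULL);
    out->start = 0.0;
    out->end = INFINITY;
    out->inside = false;
    out->set = false;

    if (text == NULL || *text == '\0')
        return 0;
    if (*text == '@') {
        out->inside = true;
        text++;
    }

    const char *colon = strchr(text, ':');
    if (colon == NULL) {
        /* A single value such as "10" means the range 0..10. */
        out->end = strtod(text, &stop);
        if (stop == text || *stop != '\0')
            return -1;
    } else {
        if (text[0] == '~' && colon == text + 1) {
            out->start = -INFINITY;          /* "~:end" */
        } else if (colon != text) {
            out->start = strtod(text, &stop);
            if (stop != colon)
                return -1;
        }
        if (colon[1] != '\0') {              /* "start:" leaves end open */
            out->end = strtod(colon + 1, &stop);
            if (stop == colon + 1 || *stop != '\0')
                return -1;
        }
    }
    if (out->start > out->end)
        return -1;
    out->set = true;
    return 0;
}

/* True when the value should raise an alert for this range. */
static bool range_alerts(const threshold_range *r, double value)
{
    if (!r->set)
        return false;
    bool in = (value >= r->start && value <= r->end);
    return r->inside ? in : !in;
}

/* Critical takes precedence over warning; no thresholds means OK. */
int metric_state(double value, const threshold_range *warn,
                 const threshold_range *crit)
{
    if (range_alerts(crit, value))
        return STATE_CRITICAL;
    if (range_alerts(warn, value))
        return STATE_WARNING;
    return STATE_OK;
}
```

Under these rules a bare threshold such as 10 parses to start 0 and end 10, so a column filled from the start of the range would hold 0 for every single-value threshold. That would be consistent with the zeros in the warn and critical columns of the table above, though only the perfparse source can confirm it. A metric with no thresholds at all evaluates to 0 (OK) here, and a "higher is better" metric such as a cache hit rate could use thresholds like 95: (warning) and 90: (critical).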