|
From: Paulo A. G. F. <pau...@pr...> - 2004-10-19 14:34:29
|
Ben,

Please see my answers below.

Paulo Afonso Graner Fessel
UNIX Environment and Systems Administrator
pau...@pr...
OWT
Phone: +55 (11) 3038-6554
Fax: +55 (11) 3038-6508
http://www.primesys.com.br

> The problem with the range. Do you use a single value for
> the WARN and CRIT, or do you use the range format? Can you
> please provide an example log line so that we can ensure
> there is no misunderstanding of the range and how it should
> be interpreted?

No, I use single values for WARN and CRIT:

1098196046 psdes01bill /usr DISK OK - free space: /usr 775 MB (31%): OK /usr=775MB;2446;2471;0;2496

> However, this looks like a simple bug, thanks for finding this.
>
> The State calculation also looks like a simple bug although I
> have not yet looked at the code. Again, if you can supply a
> line or two of raw log this would be extremely useful.

Unfortunately this is not possible at the moment, as I'm not running 0.101.1 now. I will see if I can set up a test server to reproduce this here.

> Your problem with speed is surprising. I test on an 800MHz
> low-memory Linux box, which ensures any speed problems slap
> me in the face fast.
> The difference between the old and new was about 10% during
> my testing.
> Can you please give us more information about your setup.
> Is your MySQL local or on a remote server?

It's a local server.

> What method have you chosen to import the log information?

I've just told perfparse-log2db to read the service log, as I did other times with earlier versions of perfparse. It took about 4 hours to parse a 67 MB file.

> What commands do you run?

The equivalent of

perfparse-log2db -r -s -l <pathtoservice.log>

> Lastly my apologies for the documentation. This is beta
> code, deliberately not yet on version 1.0. We obviously have
> some work to do here.

Ok. Stand by me for continuing support of PP.
[]'s
Paulo

> Paulo Afonso Graner Fessel wrote:
>
> > Ben,
> >
> > I've found that perfparse now has the infrastructure to use ranges of warning/critical data, and the problem is that the database hasn't changed in order to use these values (as far as I can see, at least).
> >
> > Part of the problem is in save_bin_data in storage_mysql.c:
> >
> > g_string_append_printf(s_SQL, ", %s",
> >     getSafeD(perf->d[PERF_VALUE_WARN_START]));
> > g_string_append_printf(s_SQL, ", %s",
> >     getSafeD(perf->d[PERF_VALUE_CRIT_START]));
> > g_string_append_printf(s_SQL, ", %s)",
> >     getSafeD(perf->metric_state));
> >
> > I've changed it to
> >
> > g_string_append_printf(s_SQL, ", %s",
> >     getSafeD(perf->d[PERF_VALUE_WARN_END]));
> > g_string_append_printf(s_SQL, ", %s",
> >     getSafeD(perf->d[PERF_VALUE_CRIT_END]));
> > g_string_append_printf(s_SQL, ", %s)",
> >     getSafeD(perf->metric_state));
> >
> > and the data began to show up in perfdata_service_bin. However, metric_state values were still wrong and indeed inconsistent with perfdata_service_raw. In the example I sent you, the metrics were 2 (CRITICAL) in perfdata_service_bin; OTOH they were 0 (NORMAL) in perfdata_service_raw. And the graphs still did not show the guides for warning/critical thresholds. At this point, I rolled back to release 0.100.7.
> >
> > I've also found performance problems in 0.101.1, as I feel that it is much slower than 0.100.7 when reading the serviceperf.log file and putting it into the database. In 0.100.7 I got thousands of lines per second; in 0.101.1 this number is never greater than 70 lines/sec. And I'm running on a machine with 512 MB RAM, 2xIntel Xeon 3.06 GHz! I've tried to optimize MySQL settings to no avail, and this was another reason that led me to roll back to 0.100.7.
> >
> > Also, the documentation is a little confusing in this release. It took me some time until I understood that, with --default-perfdata, I wouldn't have to use crontab entries anymore to update the database. There's no mention of this in the documentation, and if it sounds obvious to the development team, it may not be so clear to the users.
> >
> > Don't get me wrong: I find that perfparse is THE solution for gathering performance data for Nagios. However, I feel that this wasn't the right time to release 0.101.1 because it clearly lacks polish and has rough edges in the database architecture and in the parsing of plugin output.
> >
> > Why don't you add a compilation switch to disable these bleeding-edge features? It would make life easier for people who rely on perfparse for data gathering on production systems.
> >
> > []'s and keep up the great work,
> >
> > Paulo Afonso Graner Fessel
> >
> >>-----Original Message-----
> >>From: Ben Clewett [mailto:Be...@cl...]
> >>Sent: Tuesday, October 19, 2004 04:25
> >>To: Paulo Afonso Graner Fessel
> >>Cc: per...@li...
> >>Subject: Re: [Perfparse-users] Warning/critical values not going into the database... Again
> >>
> >>Paulo,
> >>
> >>Can you please send me a sample of your data, I want to replicate and fix this problem.
> >>
> >>Regards,
> >>
> >>Ben Clewett.
> >>
> >>Paulo Afonso Graner Fessel wrote:
> >>
> >>>Hello, folks.
> >>>
> >>>I've just upgraded to 0.101.1 and I'm noticing that warning/critical values from plugins are not getting into the database again. But differently (and worse) than last time, it seems that the state field of perfdata_service_bin is also incorrect:
> >>>
> >>>+-----------+---------------------+--------+---------------------+-------+------+----------+-------+
> >>>| host_name | service_description | metric | ctime               | value | warn | critical | state |
> >>>+-----------+---------------------+--------+---------------------+-------+------+----------+-------+
> >>>| D01       | Cache Hits          | lib    | 2004-10-18 17:35:58 | 99.99 |    0 |        0 |     2 |
> >>>| D01       | Cache Hits          | buffer | 2004-10-18 17:35:58 | 99.95 |    0 |        0 |     2 |
> >>>+-----------+---------------------+--------+---------------------+-------+------+----------+-------+
> >>>
> >>>In my definition of this particular service, I want these two metrics to be as high as possible, with a maximum value of 100%. However, as you can see, I don't have warning and critical values for this metric, and also its state as determined by perfparse is 2 (CRITICAL); it should actually be 0 (NORMAL).
> >>>
> >>>[]'s
> >>>
> >>>Paulo Afonso Graner Fessel
|