From: Yves M. <yme...@li...> - 2004-11-02 11:24:08
|
Comments inline... Proposal for conversion of PerfParse binary data. 1. Brief. =3D=3D=3D=3D=3D=3D=3D=3D=3D Binary data extracted from the performance data from Nagios is stored >>>>>>>> Say that binary data is in fact the performance data, eg the lis= t of "key=3DvalueUOM;ranges..." binary usually means "$=A3*%F|=E9=E8@*%" data :) 2. Requirements. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D The functional requirements of the new structure can be listed: 2.1 To hold the metric value as a DOUBLE. Allow NULL to indicate a missing value which should be shown as a space on a graph. >>>>>>>>>>> Not a Number values can be considered as missing values and a= re coded NULL too. So there is no problem with NaN values :) 2.3 Store the time in a format easy to program and portable over different DBMS. >>>>>>>> Use some integer to store a timestamp. This should be enough. However, this is not very simple on sybase (I don't know for other dbms).= So don't try to be too portable ! Keep It Simple and Stupid :) 2.4 Store the maximum and minimum range of the metric with every value. This may be NULL of not recorded. >>>>>>>>>>>> Thinking... Infinite value means that there is no mini or no maxi. Nothing recorded a= lso means no mini or no maxi. So I agree with using NULL to store not recorded value. = However, there will be a choice to do when you read NULL in the database : will you pres= ent it as not recorded or as infinite ? Well, this is presentation and has nothing to d= o here. 2.5 Store the range of warning and critical, as either an inside or outside range. A value of NULL will indicate infinity. A range type of NULL will indicate no value. >>>>>>>>>>>>> Not precise enough. I suggest NULL for infinite (-inf for start range and +inf for end range,= because you cannot have +inf for start range, and you cannot have -inf for end range = :) I suggest the default values when not recorded (0 for start range and +in= f for end range, as specified on the nagiosplug plugins specification page). You also need something to say that the range is inverted (the @ characte= r in the range) 2.6 Offer the correct keys to enable likely queries. Including: 6.1 Extraction of an ordered series of data between two times. 6.2 Extraction of the last entered value for each metric. 6.3 Extraction of data relative to a specific host, service or metric. 2.7 Store the data is an efficient format being small and fast to extract. 2.8 Use ANSI SQL where ever possible. >>>>>>>>>>>>>> ...where ever the performance of insertion and queries are= not reduced too much. You will never be able to write SQL that can be ported on mysql, postgres= ql, oracle, sybase, odbc, sqlite and other ones, even writing 100% ANSI SQL. Or you w= ill write some very poor SQL and program yourself things that SQL dataservers can do for= you. On that reflexion (that I made and that is the origin of storage modules = in perfparse), you will have to create an abstraction layer (or use an existing one) lik= e storage modules for perfparse-db-tools (I will do that one) and CGIs (maybe me to= o, maybe not...). However, when the SQL code is close to ANSI SQL, porting some module to t= alk to another database server is easier. So using ANSI SQL is really recommended. Just = don't drop performances, and don't reinvent the wheel that SQL servers have already = invented :) 2.9 Referential integrity will not be important against the data tables. 2.10 Duplicate data should be impossible to add. >>>>>>>>>>>>> For 2.9 and 2.10, I agree. However, some tools are here to = purge the databases : they can also do some additionnal integrity checking and duplicates removing. 3. Table Schema =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D The two new tables will be defined as: >>>>>>>>>>>> I'm too bad in SQL... No comments here. 4. Proposed Use Snapshots =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D 4.1 The extraction of data for a graph between two dates can be completed as: SELECT * FROM perfdata_bin JOIN perfdata_bin_extra ON perfdata_bin.bin_extra_id =3D perfdata_bin_extra.id WHERE host =3D 1 AND service =3D 2 AND metric =3D 3 AND ctime BETWEEN '2004-01-01' AND '2004-12-31' ORDER BY ctime; >>>>>>>>>>>>> For information, this is also possible with perfparsed sinc= e 0.103.1 if you enable file_output storage : telnet the perfparsed server and run thi= s : history tm_start tm_end '1' '2' '3' However, the SQL language makes it more flexible. When both are implement= ed, choose the one that best fit your needs ! :) 4.2 Addition of data will be completed as follows: 2.1 Calculation of the hash of the extra data. Eg, MD5(";1;2;3;4") This value will be supplied by the parser. 2.2 A check against the extra data will be completed to find out if this extra data has been entered, against the key 'extra_id0'. If this data has not been entered before, this should take place. In either case, the 'id' of the extra will be registered. >>>>>>>>>>>>> I suggest some option to do that check or not. You can also= remove duplicates every night with perfparse-db-tool. Depends on how much performance you need when inserting data in the database. >>>>>>>>>>>>>>>>> No other comments :) Yves --=20 - Homepage - http://ymettier.free.fr - http://www.logicacmg.com - - GPG key - http://ymettier.free.fr/gpg.txt - - Maitretarot - http://www.nongnu.org/maitretarot/ - - Perfparse - http://perfparse.sf.net/ - |