Re: [Perfparse-devel] Using indexes with mysql

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Flo,

I wrote your response first.  Unusually my Thunderbird lost the email, 
so here goes again :)

I like your idea of replacing the VARCHAR(75) host, service metric with 
an INTEGER for each.  This will save space on the existing method, 
without loosing the informational context.  The INTEGER UID for these 
can be added directly to their existing tables.  So we can forget my MID 
idea :)

I believe MySQL, in common with most DBMS's, store DATETIME as a number 
internally.  I think MySQL uses a 'unsigned long', MS SQL uses a 
'double'.  There is no real advantage in using out own date format, and 
we loose the ability to specify BETWEEN '2003-10-10' AND '2003-10-11' 
type syntax.  My feeling is that this is ok as it is.

I forgot to mention on this group that the existing warn/crit:

	warn FLOAT,
	crit FLOAT,

Will become:

	warn_lower DOUBLE,
	warn_upper DOUBLE,
	warn_range ENUM ('inner','outer') NOT NULL,
	crit_lower DOUBLE,
	crit_upper DOUBLE,
	crit_range ENUM ('inner','outer') NOT NULL,

An increases in size from 6 to 22 bytes.  (DOUBLE = 4, ALLOW NULL = 1, 
EMUM = 1)

This lead me to observer:
- This will drastically increase the size of the binary data table.
- This will take a long time to edit this table schema.
- This leads to mass duplication.

An example may be a graph of a weeks data, 10080 points.  During this 
time the warn, crit, max and min are unlikely to change.  Therefore can 
be stored in one tuple.  A JOIN for this data:

SELECT * FROM _bin JOIN _bin_extra

Will not degrade performance.  Rather the speed advantages of reading a 
smaller table will result in faster data access.

So using the 'extra' table will lead to smaller data and faster access.

I am really glad to see some DBA's on this group.  What do you think?

Regards, Ben.

Flo Gleixner wrote:

> 
> Hi,
> 
> i did some thoughts about the schema too:
> the mid field is a nice concept, but it makes queries that sort by 
> hostname or service description impossible. I think, if indexes in mysql 
> (or other db's) are efficient, it's the same if you use a "mid" field or 
> a combined index. The extra data table is probably useless, if you 
> assume that most performance is gained if the index search is good. So 
> my proposal would be:
> 
> TABLE perfdata_bin (
>    host_name VARCHAR(75) NOT NULL,
>    service_description VARCHAR(75) NOT NULL,
>    metric VARCHAR(75) NOT NULL,
>    ctime INTEGER NOT NULL,
>    PRIMARY KEY (host_name,service_description,metric,ctime),
>    value DOUBLE,
>    state TINY INT,
>    warn DOUBLE,
>    critical DOUBLE,
>    max DOUBLE,
>    min DOUBLE
> )
> 
> What i would do to save storage:
> 
> create table hosts_names (
>    host_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
>    host_name VARCHAR(75) NOT NULL
> )
> 
> and replace host_name column in perfdata_bin with the host_id. Do joins 
> to get the textual host_name in queries. The same could be done for 
> service_description and metric, since there is a limited number of 
> different values.
> 
> Another proposal would be to use a integer to store timestamps. I'd use 
> unix timestamps (seconds since 1.1.1970) to store dates, since - in my 
> experience - the handling of dates in different databases (mysql, 
> oracle, ...) is not very portable.
> 
> Last i'd propose to store the raw data in the bin table, since i really 
> don't know why this is stored in a extra table. But probably there is a 
> reason i didn't realize.
> 
> Finally i get:
> 
> TABLE perfdata_bin (
>    host_id INTEGER NOT NULL,
>    service_id INTEGER NOT NULL,
>    metric_id INTEGER NOT NULL,
>    ctime INTEGER NOT NULL,
>    PRIMARY KEY (host_id,service_id,metric_id,ctime),
>    value DOUBLE,
>    state TINY INT,
>    warn DOUBLE,
>    critical DOUBLE,
>    max DOUBLE,
>    min DOUBLE,
>    txt_data VARCHAR(255)
> )
> 
> And tables for host_name, service_description and metric.
> 
> Note that this table has almost only fixed sized small columns.
> 
> What do you think?
> 
> Flo Gleixner
> 
> 
> On Thu, 28 Oct 2004, Ben Clewett wrote:
> ...
> 
>>
>> There will be a new version of the schema out soon.  This only has 
>> *one* key. The one key in the new table will be used for the same 
>> reason as Key 1 and Key 4 above.
>>
>> RI.  The new schema will not have RI from the main schema to the data 
>> schema. The reason for this is speed and size.  The advantages of RI 
>> are far out-weighted by the need for speed and small size.
>>
>> The new schema will look *something* like:
>>
>> TABLE perfdata_bin (
>>    mid UNSIGNED INT NOT NULL,
>>    ctime DATETIME NOT NULL,
>>    PRIMARY KEY (mid, ctime),
>>    value DOUBLE,
>>    state TINY INT,
>>    extra_data UNSIGNED INT NOT NULL
>> ) TYPE=innodb;
>>
>> (Field 'extra_data' is a link to a table containing warning, critical, 
>> max and min data.  Field 'mid' = Metric ID = (host, service, metric) 
>> as a unique INT)
>>
>> What do you think?  Will this be better?  I note this has no concept 
>> of a HOST as-is, will this bother you?
>>
>> Regards,
>>
>> Ben.
>>
>>
> 
> 
> -------------------------------------------------------
> This Newsletter Sponsored by: Macrovision For reliable Linux application 
> installations, use the industry's leading
> setup authoring tool, InstallShield X. Learn more and evaluate today. 
> http://clk.atdmt.com/MSI/go/ins0030000001msi/direct/01/
> _______________________________________________
> Perfparse-devel mailing list
> Per...@li...
> https://lists.sourceforge.net/lists/listinfo/perfparse-devel
>