I need advice from someone familiar with the AWStats code.
I'm having the logs written from remote servers directly
into a Postgres DB, where I want to use triggers to parse
the lines and put the fields into a table.
This should allow me to skip the parsing phase of AWStats,
giving it records of data instead of lines, which should
be much faster.
How would one go about implementing such a thing?
Sorry for bothering you, this was not a good idea.
After some testing, I saw that the storage requirements
of the database are even greater than those of the plain text file (tested with Postgres).
Since space is my main concern, I'm no longer interested in this.
You'd basically comment out/delete/ignore everything in awstats.pl that has to do with analysing, and then rewrite everything regarding data reading. I almost reckon it would be easier to develop something new from scratch...
Or you could somehow put an interface in between, so that AWStats thinks it is reading the file but the results are actually fetched from the DB by something in between. I'm not sure off the top of my head how to accomplish that, though.
Or actually - just struck me - the easiest way might be to schedule a task which reads the database and creates data files in a format AWStats can read. You avoid (or rather, recreate) the analysis part, but wouldn't have to touch awstats code itself.
Then again, you would end up with the same data in both the database and in the data files.
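To make the "something in between" idea a bit more concrete, here is a minimal sketch in Python that turns records back into combined-format log lines and writes them to standard output. As far as I know, the AWStats LogFile setting accepts a command ending in a pipe character, so a script like this could in principle stand in for the log file. The record layout (a dict with keys such as "host", "time" and "request") is purely an assumption for illustration, and note that AWStats would still parse the lines it receives.

```python
import sys
from datetime import datetime, timezone


def to_combined_log_line(rec):
    """Format one record as an NCSA combined log line.

    `rec` is a dict whose keys (host, user, time, request, status, size,
    referer, agent) are hypothetical column names, not an AWStats schema.
    """
    # %b is locale-dependent; the default C locale gives English month names.
    ts = rec["time"].strftime("%d/%b/%Y:%H:%M:%S %z")
    size = "-" if rec.get("size") is None else str(rec["size"])
    return '{host} - {user} [{ts}] "{request}" {status} {size} "{referer}" "{agent}"'.format(
        host=rec["host"],
        user=rec.get("user") or "-",
        ts=ts,
        request=rec["request"],
        status=rec["status"],
        size=size,
        referer=rec.get("referer") or "-",
        agent=rec.get("agent") or "-",
    )


def write_log(records, out=sys.stdout):
    """Write every record as one log line to `out`."""
    for rec in records:
        out.write(to_combined_log_line(rec) + "\n")


# One made-up record, standing in for a row fetched from the database.
sample = {
    "host": "192.0.2.1",
    "time": datetime(2009, 3, 14, 15, 9, 26, tzinfo=timezone.utc),
    "request": "GET /index.html HTTP/1.1",
    "status": 200,
    "size": 1234,
    "referer": "http://example.com/",
    "agent": "Mozilla/5.0",
}
```

If the script were saved under a hypothetical name such as db2log.py, the config line would look something like LogFile="python3 /path/to/db2log.py |", and the script would have to call write_log on the rows it fetched.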
You wrote: "this should allow me skip the parsing phase of awstats, giving it records of data instead of lines, which should be much faster."
I doubt this is a true statement. I would be interested to see figures that establish which method is the fastest.
Thank you guys. The whole idea is to avoid parsing strings,
and I'm pretty sure that is the slowest part of the system, or
at least should be.
It's obvious that splitting each line into fields, then
converting each field to its original type (date, integer, ...),
is slower than not doing so
(and getting it already split and in the correct format).
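For reference, the per-line work being talked about here looks roughly like the following Python sketch: one regular expression match to split the line, then a conversion of each field to its type. This is only an illustration of the technique for the combined log format, not how awstats.pl itself does it (that is Perl), and the field names are my own choice.

```python
import re
from datetime import datetime

# Pattern for the NCSA combined log format; group names are illustrative.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into fields and convert them to typed values.

    Returns a dict, or None if the line does not match the pattern.
    """
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Convert text fields to their "original" types.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec
```

Timing parse_line over a real log file against simply iterating over rows fetched from the DB would give the kind of figures asked for earlier in the thread.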
More than that, one of my objectives is to run AWStats each
time with different record filters (by host, by browser, by referrer, etc.)
to achieve interactivity (see http://mawstats.lingnu.com).
It would be much slower to filter this in AWStats, after parsing each line,
whereas I can get it very fast from the DB engine and pass only the
records I want to AWStats.
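A rough sketch of the filtering step, to show what I mean. To keep it runnable anywhere it uses Python's built-in sqlite3 with an in-memory database as a stand-in for Postgres, and the table name access_log, its columns and the demo rows are all made up for the example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log (host TEXT, time TEXT, request TEXT, "
        "status INTEGER, size INTEGER, referer TEXT, agent TEXT)"
    )
    conn.executemany(
        "INSERT INTO access_log VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("192.0.2.1", "2009-03-14T15:09:26", "GET / HTTP/1.1", 200, 512, "-", "Mozilla/5.0"),
            ("192.0.2.2", "2009-03-14T15:10:00", "GET /a HTTP/1.1", 200, 100, "-", "Opera/9.6"),
            ("192.0.2.1", "2009-03-14T15:11:00", "GET /b HTTP/1.1", 404, 0, "-", "Mozilla/5.0"),
        ],
    )
    return conn


def fetch_filtered(conn, host=None, agent_like=None):
    """Return only the rows matching the given filters, ordered by time.

    Filters left as None are not applied; values are passed as bound
    parameters rather than pasted into the SQL string.
    """
    clauses, params = [], []
    if host is not None:
        clauses.append("host = ?")
        params.append(host)
    if agent_like is not None:
        clauses.append("agent LIKE ?")
        params.append(agent_like)
    sql = "SELECT host, time, request, status, size, referer, agent FROM access_log"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY time"
    return conn.execute(sql, params).fetchall()
```

For example, fetch_filtered(conn, host="192.0.2.1") selects one host's records and fetch_filtered(conn, agent_like="Opera%") selects by browser, so only those rows would be handed on to AWStats. With indexes on the filtered columns, the DB engine should be able to do this without scanning every record.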