obsessive-compulsive Mailing List for Obsessive Website Statistics
Status: Beta
Brought to you by:
virtuald
From: Dustin S. <du...@vi...> - 2007-11-21 14:42:47
|
Hey,

OWS 0.8.0.5 was released today, with a handful of bugfixes, some plugin improvements, and security fixes. The most important (and useful) improvement is that you can now manually specify WHERE parameters for all plugins, not just the manual analysis plugin. One breaking change: the interface for limit plugins has changed so that a plugin can implement iLimitPlugin and iFilterPlugin at the same time. All users are highly encouraged to upgrade.

Dustin
--
Innovation is just a problem away.
From: Dustin S. <du...@vi...> - 2007-11-12 05:58:35
|
As an aside, my page ( http://www.virtualroadside.com/blog/index.php/2007/11/11/inspired-by-xkcd-mbr-love-note/ ) just got on a bunch of blogs and the front page of digg for a while (though it was a Sunday night, so not as bad as it could have been)... but on average it's been taking OWS an hour to catch up with the previous hour on my Pentium III. *sighs*... guess I should work on some performance-related stuff after school gets out in December.

Dustin
--
Innovation is just a problem away.
From: Dustin S. <du...@vi...> - 2007-09-27 04:53:39
|
John,

I got it fixed, probably. At least, it appears to work correctly on the logfile lines you gave me. The code in SVN has the fix. I only played around with it enough to make the existing functionality/fields work, and did not add any of the other potentially useful dimensions that are defined in the IIS logfiles. Let me know if that works for ya.

Thanks.

Dustin
From: John R. <JRi...@cc...> - 2007-09-26 19:57:52
|
Dustin,

Thanks for the in-depth summary of the applicable scripts!! Yes, changing the B to b did make a difference, but I still got the following:

Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 465
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 465
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 465
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 465
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 465
......

Which led me to the idea that I needed more fields to handle my IIS formatted data. I will look into creating the "ows_iis_analysis.php" and see where that gets me. I will update my results.

Please let me know if you have any questions or comments.

John A. Ridgway
College Center for Library Automation
Systems Analyst II
jri...@cc...
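The repeated array_key_exists() message above is what older PHP versions report when the key argument is something other than a string or an integer (an array or a boolean, for instance). One plausible way that happens is a field value that never came out of the log parser in the expected shape. Below is a minimal sketch of a defensive lookup; `lookup_dimension_id` and the `$cache` layout are hypothetical and are not taken from analysis.inc.php.

```php
<?php
/**
 * Hypothetical helper: look up a cached dimension id without ever handing
 * array_key_exists() a key that is neither a string nor an integer.
 */
function lookup_dimension_id(array $cache, $value)
{
    // Reject arrays, booleans, null, etc. before touching the cache.
    if (!is_string($value) && !is_int($value)) {
        return null;
    }
    return array_key_exists($value, $cache) ? $cache[$value] : null;
}

// Made-up cache: dimension value => row id.
$cache = array('200' => 1, '404' => 2);

var_dump(lookup_dimension_id($cache, '404'));  // int(2)
var_dump(lookup_dimension_id($cache, false));  // NULL, and no warning
```

Returning null for an unusable key lets the caller count the line as bad instead of passing an empty value on to the database.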
From: Dustin S. <du...@vi...> - 2007-09-26 19:25:57
|
Does this mean that changing the B to b allows you to at least read most of the data (sans those other fields you use) into OWS's tables? If so, then that's a good thing. :)

What you are describing is expected behavior. See, OWS doesn't create fields/dimensions based on the Apache log fields, but creates them as defined in the plugins (which were designed for Apache log fields). For example, the standard / useful Apache fields are defined in ows_functions.php, in the function define_dimensions(). The plugin performs analysis on each field, and defines attributes for some dimensions in useful ways (such as hour for time, internal/external referrer for the referrer field... etc). You'll notice even some Apache fields were ignored, because (in my opinion) they would take up extra space and weren't useful.

To use the additional attributes, we'd have to either add the fields to ows_functions.php (which is a bad idea, because then it defines all of this extra data for everyone, and wastes space -- unless there was an easy way to switch it on or off), or move it to its own plugin, like ows_iis_analysis.php (better idea).

Another possible idea is to dynamically define dimensions based off the log fields (which basically means each separate field in the logfile would be separated out into its own dimension). The downside here is that it wouldn't be intelligent enough to define attributes per field, but in many cases that may not be strictly required anyway.

The bad thing about additional dimensions/attributes is that the existing filter plugins ("Aggregate Analysis", "Heatmap Analysis", etc.) that OWS uses to create user output don't use 'new' dimensions, except for the manual analysis plugin, which can use all dimensions and attributes. But of course, you can create your own plugins and they will work with the newly defined dimensions seamlessly.

I'm interested in expanding the existing plugins so that they can use new dimensions/attributes, but it doesn't make sense for all dimensions/attributes, so there needs to be a way to specify which fields to use for which purposes. The best way to do this is probably to add something to the dimension/attribute field that the analysis plugins can use somehow..

Dustin
--
Innovation is just a problem away
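The "define dimensions dynamically from the log fields" idea could look roughly like the sketch below. It is only an illustration: `dimensions_from_fields` is a made-up name, skipping the 'No-Data' placeholder and using lower-cased, underscore-separated names are assumptions, and none of this reflects how OWS plugins actually register dimensions.

```php
<?php
/**
 * Hypothetical sketch: derive one dimension name per parsed log field.
 * Returns an array of dimension name => original field name.
 */
function dimensions_from_fields(array $fields)
{
    $dimensions = array();
    foreach ($fields as $field) {
        // Assumption: placeholder fields carry no data worth storing.
        if ($field === 'No-Data') {
            continue;
        }
        $name = strtolower(str_replace('-', '_', $field));
        // Keyed by name, so a field listed twice yields one dimension.
        $dimensions[$name] = $field;
    }
    return $dimensions;
}

$fields = array('Date', 'Time', 'Status', 'Status', 'No-Data', 'Bytes-Sent-X');
print_r(array_keys(dimensions_from_fields($fields)));
// keys: date, time, status, bytes_sent_x
```

As the message notes, a scheme like this knows nothing about per-field attributes (hour of day, internal/external referrer, and so on); it only gives every field a place to be stored.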
From: John R. <JRi...@cc...> - 2007-09-26 13:36:48
|
Dustin,

I think the issues are larger than just that replacement...

Looking at analysis.inc.php and the database created with installer.php, the fields I am using just about double the fields created or expected by the current set of scripts.

<analysis.inc.php>

    // Original fields used by OWS
    'Remote-Host' => '',
    'Remote-User' => '',
    'Date' => '',
    'Time' => '',
    'Method' => '',
    'Request' => '',
    'Protocol' => '',
    'Status' => '',
    'Bytes-Sent' => '',
    'Referrer' => '',
    'User-Agent' => '',

    // All entries from here down added by John Ridgway
    'Service' => '',
    'Source_Server' => '',
    'Source-IP' => '',
    'URL_Path' => '',
    'Port' => '',
    'Cookie' => '',
    'Return-Statu' => '',
    'Sub-Statuss' => '',
    'Bytes-Recieved' => '',
    'Time-Taken' => ''

I think my problems now lie in this area....

Let me know if this seems reasonable.

John A. Ridgway
College Center for Library Automation
Systems Analyst II
jri...@cc...
From: Dustin S. <du...@vi...> - 2007-09-25 00:23:45
|
John, I haven't gotten around to fixing this yet, but you can try substituting %b for %B and I think that will fix your problem... too much stuff going on for me. Dustin |
From: John R. <JRi...@cc...> - 2007-09-22 02:58:16
|
Hey guys,

No big deal or anything, but I did manage to replace the regular expression in apache_log_parser.php, between lines 210-220. Here is what I used:

    //$regex_element = '\[([^:]+):(\d+:\d+:\d+ [^\]]+)\]';
    $regex_element = '(\d+-\d+-\d+) (\d+:\d+:\d+)';

This seems to get me to the same errors I emailed last time:

> Lines processed so far: 0 : Fri Sep 21 16:13:16 EDT 2007 (0s, 0s total)
> Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 455
> Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:
Etc...etc...

I was then also able to comment out the additional code I had added to analysis.inc.php to convert my IIS date/time syntax to Apache's:

    //$matches = explode(' ',$line);
    //$date = array_shift($matches);
    //$time = array_shift($matches);
    //$rest_of_line = implode('',$matches);
    //$date = str_replace('-','',$date);
    //$timestamp = strtotime($date);
    //$date = date ( 'd/M/Y' ,$timestamp );
    //$line = "[$date:$time -400] $rest_of_line";

Don't know that this really means anything, but it now seems to understand the "native" IIS logs I am generating.... Any thoughts??

P.S. The small PHP regex testing sample code that Michael emailed me helped A LOT in getting me to this point.... Plus I'm getting to know this code pretty well....

Thanks for the support!!!

John A. Ridgway
Systems Analyst II
College Center for Library Automation
850-922-6044
jri...@cc...
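The replacement date/time pattern above can be tried on its own, outside OWS. The sketch below is a stand-alone check, not OWS code: `parse_iis_datetime` is a hypothetical name, the sample line is made up in the IIS date/time style, and the time zone is pinned to UTC only so the result is reproducible.

```php
<?php
date_default_timezone_set('UTC');

/**
 * Hypothetical helper: pull the leading "YYYY-MM-DD HH:MM:SS" pair off an
 * IIS-style log line and convert it to a Unix timestamp via strtotime().
 * Returns null when the line does not start with a date and a time.
 */
function parse_iis_datetime($line)
{
    if (!preg_match('/^(\d+-\d+-\d+) (\d+:\d+:\d+) /', $line, $m)) {
        return null;
    }
    // strtotime() understands the ISO-style date directly; no rewrite needed.
    $timestamp = strtotime($m[1] . ' ' . $m[2]);
    return $timestamp === false ? null : $timestamp;
}

$line = '2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /index.asp - 80';
$ts   = parse_iis_datetime($line);

// Print the same instant in the Apache-style layout for comparison.
echo gmdate('d/M/Y:H:i:s', $ts), "\n";  // 01/Jul/2007:04:00:33
```

This matches the earlier point in the thread that the date does not need to be rewritten into Apache's layout as long as strtotime() can convert the captured pieces.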
From: Dustin S. <du...@vi...> - 2007-09-21 21:13:57
|
Woo... more bugs. Umm...

John Ridgway wrote:
> Guys,
>
> Thanks again for stepping through this with me. Michael, the logfile you used as an example changes the IIS date/time format to Apache... With this being said, I kept my changes to analysis.inc.php (Michael's suggestion from earlier email...)
>
> $matches = explode(' ',$line);
> $date = array_shift($matches);
> $time = array_shift($matches);
> $rest_of_line = implode(' ',$matches);
> $date = str_replace('-','',$date);
> $timestamp = strtotime($date);
> $date = date ( 'd/M/Y' ,$timestamp );
> //return "[$date:$time -400] $rest_of_line";
> $line = "[$date:$time -400] $rest_of_line";
>
> All this does is change the date/time syntax in the $line variable, the actual log has not changed (is this an issue??)

This is not an issue, but it's redundant and will add to the log processing time. My point previously was that strtotime() will successfully convert the 2007-01-07 style string... so you *shouldn't* have to convert it. If you examine the source code for OWS, you'll notice that it later removes the / from the string and then runs strtotime() on it... as long as the matched parameters can be converted by strtotime(), you're good.

> --->>Now there is a new issue:<<----
>
> D:\Inetpub\wwwroot\ows\scripts>php upload_log.php d:\logs\Jul-Sep\ex070701.log cclaflorida.org debug
>
> >>> Initializing analysis engine................done.
> >>> Initializing analysis plugins.....done.
> >>> Initializing rejection plugins...done.
> >>> Initializing log parser... done.
>>>> > > ==== Debug Info ==== > Log Format: > > %t %P %v %A %m %U %q %p %u %a %H %{User-Agent}i %{Cookie}i %{Referer}i %h %s %>s %N %B %N %T > > Using regular expression: > > /^\[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) > (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*)$/ > > Matching fields (case-sensitive fields present in the log_format string): > > Array > ( > [0] => Date > [1] => Time > [2] => Process-Id > [3] => Server-Name > [4] => Local-IP > [5] => Request-Method > [6] => Request-Path > [7] => Query-String > [8] => Port > [9] => Remote-User > [10] => Remote-IP > [11] => Request-Protocol > [12] => User-Agent > [13] => Cookie > [14] => Referer > [15] => Remote-Host > [16] => Status > [17] => Status > [18] => No-Data > [19] => Bytes-Sent-X > [20] => No-Data > [21] => Time-Taken-S > ) > > ==== > > >>>> Closing analysis engine...............done >>>> > > D:\Inetpub\wwwroot\ows\scripts>php upload_log.php d:\logs\Jul-Sep\ex070701.log cclaflorida.org > > >>>> Initializing analysis engine................done. >>>> Initializing analysis plugins.....done. >>>> Initializing rejection plugins...done. >>>> Initializing log parser... done. >>>> 0 existing lines for cclaflorida_org... >>>> > No previous data found. Uploading from beginning... > > >>>> Now parsing given logfile... >>>> > > >>>> Starting processing stage 1: >>>> Running pre-analysis.....done. 
>>>> > Lines processed so far: 0 : Fri Sep 21 16:13:16 EDT 2007 (0s, 0s total) > Internal Error: array_key_exists(): The first argument should be either a string or an integer at D: > \Inetpub\wwwroot\ows\include\analysis.inc.php at line 455 > Internal Error: array_key_exists(): The first argument should be either a string or an integer at D: > \Inetpub\wwwroot\ows\include\analysis.inc.php at line 455 > Internal Error: array_key_exists(): The first argument should be either a string or an integer at D: > \Inetpub\wwwroot\ows\include\analysis.inc.php at line 455 > Internal Error: array_key_exists(): The first argument should be either a string or an integer at D: > \Inetpub\wwwroot\ows\include\analysis.inc.php at line 455 > Internal Error: array_key_exists(): The first argument should be either a string or an integer at D: > \Inetpub\wwwroot\ows\include\analysis.inc.php at line 455 > Internal Error: array_key_exists(): The first argument should be either a string or an integer at D: > \Inetpub\wwwroot\ows\include\analysis.inc.php at line 455 > Error: SQL Message: Incorrect integer value: '' for column 'bytes' at row 1 > > Last SQL Query: > INSERT INTO cclaflorida_org_bytes (bytes_id,bytes) VALUES (1,'') > > > Error occurred: > at line 320 of db.inc.php in function db_is_valid_result() > called from line 418 of analysis.inc.php in function Analysis->Process() > called from line 917 of analysis.inc.php in function Analysis->analyzeFile() > called from line 295 of upload_log.php > > Error: Error doing batch insert on dimension 'bytes' > Error occurred: > at line 419 of analysis.inc.php in function Analysis->Process() > called from line 917 of analysis.inc.php in function Analysis->analyzeFile() > called from line 295 of upload_log.php > > Error: Error parsing line in file. 
> Error occurred:
> at line 920 of analysis.inc.php in function Analysis->analyzeFile()
> called from line 295 of upload_log.php
>
> >>> Ended at Fri Sep 21 16:13:16 EDT 2007
>
> >>> Closing analysis engine...............done
>
> D:\Inetpub\wwwroot\ows\scripts>
>
> Thanks again!!!!!
>
> John A. Ridgway
> College Center for Library Automation
> Systems Analyst II
> jri...@cc...

Seems like there's a bug in the cache code... d-oh! I'm almost positive that it's the result of something expecting a dimension and it not being there... oops. This is because you used Bytes-Sent-X instead of the one Apache uses... I'll have to debug it tonight, and make it translate between the two or something.

Thanks for the bug report!

Dustin
--
Innovation is just a problem away
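One way the "translate between the two" idea could look is sketched below. This is an assumption-laden illustration rather than the fix that went into OWS: `normalize_fields` and the alias table are hypothetical, 'Bytes-Sent' is assumed as the target name because it appears in the field list quoted elsewhere in this thread, and empty or "-" byte counts are coerced to 0 so that an INSERT into an integer column never receives ''.

```php
<?php
/**
 * Hypothetical sketch: rename parser-specific field names to the ones the
 * analysis code expects, and make sure the byte count is an integer.
 */
function normalize_fields(array $row)
{
    // Assumed alias table: parser field name => expected field name.
    $aliases = array('Bytes-Sent-X' => 'Bytes-Sent');

    $out = array();
    foreach ($row as $name => $value) {
        $name = isset($aliases[$name]) ? $aliases[$name] : $name;
        $out[$name] = $value;
    }

    if (array_key_exists('Bytes-Sent', $out)) {
        $bytes = (string) $out['Bytes-Sent'];
        // '' and '-' (no bytes logged) both become 0.
        $out['Bytes-Sent'] = preg_match('/^\d+$/', $bytes) ? (int) $bytes : 0;
    }
    return $out;
}

print_r(normalize_fields(array('Status' => '200', 'Bytes-Sent-X' => '')));
print_r(normalize_fields(array('Status' => '200', 'Bytes-Sent-X' => '1205')));
```

Doing the renaming in one place keeps the rest of the pipeline unaware of which log format string produced the row.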
From: John R. <JRi...@cc...> - 2007-09-21 20:20:01
|
Guys,

Thanks again for stepping through this with me. Michael, the logfile you used as an example changes the IIS date/time format to Apache... With this being said, I kept my changes to analysis.inc.php (Michael's suggestion from earlier email...)

    $matches = explode(' ',$line);
    $date = array_shift($matches);
    $time = array_shift($matches);
    $rest_of_line = implode(' ',$matches);
    $date = str_replace('-','',$date);
    $timestamp = strtotime($date);
    $date = date ( 'd/M/Y' ,$timestamp );
    //return "[$date:$time -400] $rest_of_line";
    $line = "[$date:$time -400] $rest_of_line";

All this does is change the date/time syntax in the $line variable, the actual log has not changed (is this an issue??)

--->>Now there is a new issue:<<----

D:\Inetpub\wwwroot\ows\scripts>php upload_log.php d:\logs\Jul-Sep\ex070701.log cclaflorida.org debug

>>> Initializing analysis engine................done.
>>> Initializing analysis plugins.....done.
>>> Initializing rejection plugins...done.
>>> Initializing log parser... done.
==== Debug Info ====
Log Format:

%t %P %v %A %m %U %q %p %u %a %H %{User-Agent}i %{Cookie}i %{Referer}i %h %s %>s %N %B %N %T

Using regular expression:

/^\[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*)$/

Matching fields (case-sensitive fields present in the log_format string):

Array
(
    [0] => Date
    [1] => Time
    [2] => Process-Id
    [3] => Server-Name
    [4] => Local-IP
    [5] => Request-Method
    [6] => Request-Path
    [7] => Query-String
    [8] => Port
    [9] => Remote-User
    [10] => Remote-IP
    [11] => Request-Protocol
    [12] => User-Agent
    [13] => Cookie
    [14] => Referer
    [15] => Remote-Host
    [16] => Status
    [17] => Status
    [18] => No-Data
    [19] => Bytes-Sent-X
    [20] => No-Data
    [21] => Time-Taken-S
)

====

>>> Closing analysis engine...............done

D:\Inetpub\wwwroot\ows\scripts>php upload_log.php d:\logs\Jul-Sep\ex070701.log cclaflorida.org

>>> Initializing analysis engine................done.
>>> Initializing analysis plugins.....done.
>>> Initializing rejection plugins...done.
>>> Initializing log parser... done.
>>> 0 existing lines for cclaflorida_org...
No previous data found. Uploading from beginning...

>>> Now parsing given logfile...

>>> Starting processing stage 1:
>>> Running pre-analysis.....done.
Lines processed so far: 0 : Fri Sep 21 16:13:16 EDT 2007 (0s, 0s total)
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 455
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 455
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 455
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 455
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 455
Internal Error: array_key_exists(): The first argument should be either a string or an integer at D:\Inetpub\wwwroot\ows\include\analysis.inc.php at line 455
Error: SQL Message: Incorrect integer value: '' for column 'bytes' at row 1

Last SQL Query:
INSERT INTO cclaflorida_org_bytes (bytes_id,bytes) VALUES (1,'')

Error occurred:
    at line 320 of db.inc.php in function db_is_valid_result()
    called from line 418 of analysis.inc.php in function Analysis->Process()
    called from line 917 of analysis.inc.php in function Analysis->analyzeFile()
    called from line 295 of upload_log.php

Error: Error doing batch insert on dimension 'bytes'
Error occurred:
    at line 419 of analysis.inc.php in function Analysis->Process()
    called from line 917 of analysis.inc.php in function Analysis->analyzeFile()
    called from line 295 of upload_log.php

Error: Error parsing line in file.
Error occurred:
    at line 920 of analysis.inc.php in function Analysis->analyzeFile()
    called from line 295 of upload_log.php

>>> Ended at Fri Sep 21 16:13:16 EDT 2007

>>> Closing analysis engine...............done

D:\Inetpub\wwwroot\ows\scripts>

Thanks again!!!!!

John A. Ridgway
College Center for Library Automation
Systems Analyst II
jri...@cc...

-----Original Message-----
From: Michael Papile [mailto:p...@pa...]
Sent: Friday, September 21, 2007 3:20 PM
To: John Ridgway
Cc: Dustin Spicuzza; obs...@li...
Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics

Dustin, here is what I sent to John about the same time that you said the same thing :)

Hello,

I do not have OWS running here, but I see the problem in the regex. In the user agent, referrer and the cookie, remove the \"'s. Your log does not have these things in quotes, so it should be

%t %P %v %A %m %U %q %p %u %a %H %{User-Agent}i %{Cookie}i %{Referrer}i %h %s %>s %N %B %N %T

Note "referrer" is spelt wrong in the previous string (I used to do that all the time too).

Here is a little test I wrote that works.

    <?php
    // Same pattern OWS prints in its debug output for the unquoted log format.
    $regex = '/^\[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*)$/';
    // One log line, already converted to the Apache date/time syntax.
    $string = "[01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA 192.168.2.105 GET /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78";
    preg_match($regex, $string, $matches);
    // Dump the captured fields; an empty array means the line did not match.
    var_dump($matches);

So make sure your regex in the debug message matches that. If that is so, you will not get bad lines.

Michael P.
|
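The IIS sample quoted in this thread starts with a "#Fields:" directive that names every column, so one way to check which value lands in which field is to pair the two up before touching any regex. The following is only a minimal sketch: parse_w3c_lines() is a hypothetical helper, not an OWS function, and it assumes fields are separated by whitespace and never contain a literal space.

```php
<?php
// Hypothetical helper (not part of OWS): pair each value on a W3C Extended
// data line with the column name announced by the "#Fields:" directive.
function parse_w3c_lines(array $lines): array
{
    $fields = [];
    $rows = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (strncmp($line, '#Fields:', 8) === 0) {
            // Remember the column names for the data lines that follow.
            $fields = preg_split('/\s+/', trim(substr($line, 8)));
            continue;
        }
        if ($line[0] === '#') {
            continue; // other directives: #Software, #Version, #Date
        }
        $values = preg_split('/\s+/', $line);
        // Skip lines whose column count does not match the header.
        if ($fields && count($values) === count($fields)) {
            $rows[] = array_combine($fields, $values);
        }
    }
    return $rows;
}

$sample = [
    '#Software: Microsoft Internet Information Services 6.0',
    '#Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem '
        . 'cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) '
        . 'cs(Cookie) cs(Referer) cs-host sc-status sc-substatus '
        . 'sc-win32-status sc-bytes cs-bytes time-taken',
    '2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET '
        . '/images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 '
        . 'Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) '
        . 'ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH '
        . 'http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78',
];

$rows = parse_w3c_lines($sample);
print_r($rows);
```

Printing the rows this way makes it easy to see whether a format string has the same number of columns, in the same order, as the log itself.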
From: Michael P. <p...@pa...> - 2007-09-21 19:20:20
|
Dustin, here is what I sent to John at about the same time that you said the same thing :)

Hello, I do not have OWS running here, but I see the problem in the regex. In the user agent, referer and cookie fields, remove the \" 's. Your log does not have these things in quotes, so it should be

%t %P %v %A %m %U %q %p %u %a %H %{User-Agent}i %{Cookie}i %{Referer}i %h %s %>s %N %B %N %T

Note: keep the single-r spelling "Referer" in the format string. That is how the HTTP header is actually spelled, even though the English word is "referrer" (I used to mix those up all the time too).

Here is a little test I wrote that works.

<?php
// Same pattern OWS prints in its debug output, with the three quoted groups made unquoted.
$regex = '/^\[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*)$/';
// One converted line from the IIS log.
$string = "[01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA 192.168.2.105 GET /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78";
// preg_match() returns 1 on a match and fills $matches with the captured fields.
preg_match($regex, $string, $matches);
var_dump($matches);

So make sure your regex in the debug message matches that. If it does, you will not get bad lines.

Michael P.

John Ridgway wrote: > > Hey guys, > > I've attempted to make the changes that have been bounced > around and the output of that testing follows below: > > > > D:\Inetpub\wwwroot\ows\scripts>php upload_log.php > d:\logs\Jul-Sep\ex070701.log cclaflorida.org debug > > > >>> Initializing analysis engine................done. > >>> Initializing analysis plugins.....done. > >>> Initializing rejection plugins...done. > >>> Initializing log parser... done.
> > ==== Debug Info ==== > Log Format: > > %t %P %v %A %m %U %q %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" > \"%{Referer}i\" %h %s %>s %N %B %N %T -- %N = added as 'No-Data' in > apache_log_parser.php > > Using regular expression: > > /^\[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) > (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" \"(.*?)\" (\S*) > (\S*) (\S*) (\S*) (\S*) (\S*) (\S*)$/ > > Matching fields (case-sensitive fields present in the log_format string): > > Array > ( > [0] => Date > [1] => Time > [2] => Process-Id > [3] => Server-Name > [4] => Local-IP > [5] => Request-Method > [6] => Request-Path > [7] => Query-String > [8] => Port > [9] => Remote-User > [10] => Remote-IP > [11] => Request-Protocol > [12] => User-Agent > [13] => Cookie > [14] => Referer > [15] => Remote-Host > [16] => Status > [17] => Status > [18] => No-Data > [19] => Bytes-Sent-X > [20] => No-Data > [21] => Time-Taken-S > ) > > ==== > > >>> Closing analysis engine...............done > > D:\Inetpub\wwwroot\ows\scripts>php upload_log.php > d:\logs\Jul-Sep\ex070701.log cclaflorida.org > > >>> Initializing analysis engine................done. > >>> Initializing analysis plugins.....done. > >>> Initializing rejection plugins...done. > >>> Initializing log parser... done. > >>> 0 existing lines for cclaflorida_org... > No previous data found. Uploading from beginning... > > >>> Now parsing given logfile... > > >>> Starting processing stage 1: > >>> Running pre-analysis.....done. > *Bad line found on 1*: *[01/Jul/2007:04:00:33 -400]* W3SVC94731577 > COBRA 192.168.2.105 GET /index.asp - *<--- I got the Date/Time portion > of the log to convert to the Apache syntax using Michael's code > suggestion. > *80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > - h *<--- Still sees logfiles as "Bad Line".... 
> *ttp://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 > ww > w.cclaflorida.org 200 0 0 13102 343 343 > > *Bad line found on 2*: [01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA > 192.168.2.105 GET /images/cclab > ak.gif - 80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1. > 4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > http://www.cclaflorida.org/ www.cclaflorida.org > 200 0 0 296 345 93 > > *Bad line found on 3*: [01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA > 192.168.2.105 GET /images/banne > r_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET > +CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > http://www.cclaflorida.org/ www.cclaflo > rida.org 200 0 0 1205 354 78 > > 0 good lines > 3 bad lines > > Cache stats: 0h 0m (0) > 0 SQL queries/inserts > >>> Running post-analysis.....done. > > > >>> Ended at Fri Sep 21 14:23:15 EDT 2007 > > >>> Closing analysis engine...............done > > D:\Inetpub\wwwroot\ows\scripts> > > > Any suggestions??? Thanks! > > > > > John A. Ridgway > College Center for Library Automation > Systems Analyst II > jri...@cc... > > -----Original Message----- > From: Dustin Spicuzza [mailto:du...@vi...] > Sent: Thursday, September 20, 2007 12:35 AM > To: Michael Papile > Cc: John Ridgway; obs...@li... > Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics > > Oh I missed the second part of your comment. I'm almost positive I don't > use that function anyways. But, I didn't write that class either.... > > Dustin > > Michael Papile wrote: > > It appears the date is going to ows in 01/Jan/2007 format, so it is > > not as simple as making a regex for it. You have to manipulate the > > date to get it in the same format. There are a few options to do this. > > I am not sure where this should go. The options are that you can: > > 1. 
preprocess the line with the below function to make it appear like > > the Apache date. > > 2. have the function that changes the date into a timestamp detect > > YYYY-MM-DD format and use that. > > > > <?php > > function IIS_to_apache_date($line){ > > $matches = explode(' ',$line); > > $date = array_shift($matches); > > $time = array_shift($matches); > > $rest_of_line = implode(' ',$matches); > > $date = str_replace('-','',$date); > > $timestamp = strtotime($date); > > $date = date ( 'd/M/Y' ,$timestamp ); > > return "[$date:$time -400] $rest_of_line"; > > } > > > > > > PS for dustin: > > The whole logtime_to_timestamp function can be way easier :) > > public function logtime_to_timestamp($logtime){ > > preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches); > > $date = str_replace('/',' ',$matches[1]); > > $time = $matches[2]; > > return strtotime("$date $time"); > > } > > > > > > #here is one that will work for the IIS timestamp too > > public function logtime_to_timestamp($logtime){ > > if(preg_match('/\d+-\d+-\d+.*/',$logtime)){ > > $matches = explode(' ',$logtime); > > $date = array_shift($matches); > > $date = str_replace('-','',$date); > > $time = array_shift($matches); > > } > > else{ > > preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches); > > $date = str_replace('/',' ',$matches[1]); > > $time = $matches[2]; > > } > > return strtotime("$date $time"); > > } > > > > John Ridgway wrote: > >> > >> Michael and Dustin, > >> > >> Thank you for the very quick response. I am sorry I did not go into > >> very much detail, but I didn't want to expound on what I had done, if > >> the solution was already at hand. > >> > >> Michael, as I began to investigate how the "Apache" logs are parsed, > >> I did find the apache_log_formats.php file and add what I thought > >> were appropriate entries for both *IIS5* and *IIS6*.
> >> > >> define("CCLA_LOG_FORMAT_V5",'%{%Y-%m-%d %H:%M:%S}t %a %u %P %v %A %p > >> %m %U %q %s \"%{$status}e\" %B - %T %H %h \"%{User-Agent}i\" > >> \"%{Cookie}i\" \"%{Referer}i\"'); > >> > >> define("CCLA_LOG_FORMAT_V6",'%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q > >> %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s > >> %>s \"%{$status}e\" %B \"%{}o\" %T'); > >> > >> When I did this and then ran the upload_log.php against this log > >> content: > >> > >> #Software: Microsoft Internet Information Services 6.0 > >> > >> #Version: 1.0 > >> > >> #Date: 2007-07-01 04:00:33 > >> > >> #Fields: date time s-sitename s-computername s-ip cs-method > >> cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version > >> cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus > >> sc-win32-status sc-bytes cs-bytes time-taken > >> > >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /index.asp > >> - 80 - 76.20.156.11 HTTP/1.1 > >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > >> - > >> > http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 > <http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8> > >> www.cclaflorida.org 200 0 0 13102 343 343 > >> > >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET > >> /images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 > >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > >> ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > >> http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93 > >> > >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET > >> /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 > >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > >> ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > >> http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78 > >> > >> I got this output: > >> > >> 
***************************************************************** > >> > >> *D:\Inetpub\wwwroot\ows\scripts>php upload_log.php > >> d:\logs\Jul-Sep\ex070701.log claflorida.org debug* > >> > >> * * > >> > >> *>>> Initializing analysis engine................done.* > >> > >> *>>> Initializing analysis plugins.....done.* > >> > >> *>>> Initializing rejection plugins...done.* > >> > >> *>>> Initializing log parser... done.* > >> > >> * * > >> > >> *==== Debug Info ====* > >> > >> *Log Format:* > >> > >> * * > >> > >> *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H > >> \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s > >> \"%{$status}e\" %B \"%{}o\" %T* > >> > >> * * > >> > >> *Using regular expression:* > >> > >> * * > >> > >> */^(\S*) \[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) > >> (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" > >> \"(.*?)\" (\S*) (\S*) (\S*) \"(.*?)\" (\S*) \"(.*?)\" (\S*)$/* > >> > >> * * > >> > >> *Matching fields (case-sensitive fields present in the log_format > >> string):* > >> > >> * * > >> > >> *Array* > >> > >> *(* > >> > >> * [0] => Connection-Status <---- Seems to be coming from the > >> %{%Y-%m-%d %H:%M:%S}t Apache Time column formatting… I need a way to > >> read IIS time formatting* > >> > >> * [1] => Date* > >> > >> * [2] => Time* > >> > >> * [3] => Process-Id* > >> > >> * [4] => Server-Name* > >> > >> * [5] => Local-IP* > >> > >> * [6] => Request-Method* > >> > >> * [7] => Request-Path* > >> > >> * [8] => Query-String* > >> > >> * [9] => Port* > >> > >> * [10] => Remote-User* > >> > >> * [11] => Remote-IP* > >> > >> * [12] => Request-Protocol* > >> > >> * [13] => User-Agent* > >> > >> * [14] => Cookie* > >> > >> * [15] => Referer* > >> > >> * [16] => Remote-Host* > >> > >> * [17] => Status* > >> > >> * [18] => Status* > >> > >> * [19] => $status* > >> > >> * [20] => Bytes-Sent-X* > >> > >> * [21] => Reply-Header* > >> > >> * [22] => Time-Taken-S* > >> > >> *)* > >> > >> * * > >> > >> 
*====* > >> > >> * * > >> > >> *>>> Closing analysis engine...............done* > >> > >> * * > >> > >> *D:\Inetpub\wwwroot\ows\scripts>* > >> > >> ***************************************************************** > >> > >> From what I can see, the "Apache" log variable equivalent of > >> 2007-07-01 04:00:33 is %{%Y-%m-%d %H:%M:%S}t, and when I use the > >> upload_log.php I get this for the “Date / Time” columns: > >> > >> *[0] => Connection-Status* > >> > >> * [1] => Date* > >> > >> * [2] => Time* > >> > >> I am using ( *%{%Y-%m-%d %H:%M:%S}t* ) because this is the variable > >> formatting I use in our UNIX environment to produce the ( 2007-07-01 > >> 04:00:33 ) time formatted output. > >> > >> Our httpd.conf files have this exact variable definition for > >> formatting log output: > >> > >> *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H > >> \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s > >> \"%{$status}e\" %B \"%{}o\" %T* > >> > >> * * > >> > >> I hope I did not ramble or get confusing. I really want to be able to > >> use this application with my logs. I see a lot of > >> functionality/expandability and it uses a SQL database for better > >> reporting capabilities. A big PLUS!!! > >> > >> John A. Ridgway > >> > >> Systems Analyst II > >> > >> College Center for Library Automation > >> > >> 850-922-6044 > >> > >> jri...@cc... > >> > >> -----Original Message----- > >> From: Michael Papile [mailto:p...@pa...] > >> Sent: Wednesday, September 19, 2007 8:47 PM > >> To: Dustin Spicuzza > >> Cc: John Ridgway; obs...@li... 
> >> Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics > >> > >> Hi, > >> > >> You can pretty easily make your own formatter with no programming: > >> > >> in apache_log_formats.php create a line like this: > >> > >> define("IIS_W3C",'%h %l %u %t \"%r\" %>s %b \"%{Referrer}i\" > >> \"%{User-Agent}i\"'); > >> > >> Now all of the formatting is not the IIS format (IDK what the IIS > >> formatting is), but you can make your own. > >> > >> Just look at your log output and see what fields are where. > >> > >> I made my own formatting for NGINX logs which were not apache like. > >> So look at your logfield order, and put the %h etc where it appears > >> in your log. > >> > >> Then in your config file put > >> > >> $cfg['websites']['domain.com']['log_format'] = IIS_W3C; > >> > >> Use the following legend (from apache_log_parser.rb) > >> > >> 319 '%' => '', > >> > >> 320 'a' => 'Remote-IP', > >> > >> 321 'A' => 'Local-IP', > >> > >> 322 'B' => 'Bytes-Sent-X', > >> > >> 323 'b' => 'Bytes-Sent', > >> > >> 324 'c' => 'Connection-Status', // <= 1.3 > >> > >> 325 'C' => 'Cookie', // >= 2.0 > >> > >> 326 'D' => 'Time-Taken-MS', > >> > >> 327 'e' => 'Env-Var', > >> > >> 328 'f' => 'Filename', > >> > >> 329 'h' => 'Remote-Host', > >> > >> 330 'H' => 'Request-Protocol', > >> > >> 331 'i' => 'Request-Header', > >> > >> 332 'I' => 'Bytes-Recieved', // >= 2.0 > >> > >> 333 'l' => 'Remote-Logname', > >> > >> 334 'm' => 'Request-Method', > >> > >> 335 'n' => 'Note', > >> > >> 336 'o' => 'Reply-Header', > >> > >> 337 'O' => 'Bytes-Sent', // >= 2.0 > >> > >> 338 'p' => 'Port', > >> > >> 339 'P' => 'Process-Id', // {format} >= 2.0 > >> > >> 340 'q' => 'Query-String', > >> > >> 341 'r' => 'Request', > >> > >> 342 's' => 'Status', > >> > >> 343 't' => 'Time', > >> > >> 344 'T' => 'Time-Taken-S', > >> > >> 345 'u' => 'Remote-User', > >> > >> 346 'U' => 'Request-Path', > >> > >> 347 'v' => 'Server-Name', > >> > >> 348 'V' => 'Server-Name-X', > >> > >> 349 'X' => 'Connection-Status', 
// >= 2.0 > >> > >> 350 ); > >> > >> Dustin Spicuzza wrote: > >> > >> > I'm not familiar with IIS formatted logfiles, so I'm not sure whats > >> > >> > required to make that work. The log parser uses apache CustomLog > >> > >> > formatting to parse the logfile, so its conceivable you could use > >> those > >> > >> > directives to match the format of the logs. > >> > >> > > >> > >> > The apache_log_parser.php does all the work, and is called mostly in > >> > >> > include/analysis.inc.php .. the parse() function returns an array, > >> which > >> > >> > is then fed into the database. I'm not sure what (if any) > modification > >> > >> > would be needed for this to work with the IIS logs. Maybe you could > >> send > >> > >> > a few lines of one? > >> > >> > > >> > >> > Hope that helps! > >> > >> > > >> > >> > Dustin > >> > >> > > >> > >> > John Ridgway wrote: > >> > >> > > >> > >> >> Hello support, > >> > >> >> > >> > >> >> I have recently found your PHP application and decided to give > >> > >> >> it a try. I know this is still a very new beta project, and I am > >> > >> >> looking forward to see it work, but I am currently using W3C > Extended > >> > >> >> (IIS) formatted logfiles and I realized that your app does not know > >> > >> >> how to parse these. > >> > >> >> > >> > >> >> I began to try to figure out what to change within the files > >> > >> >> (apache_log_formats.php and apache_log_parser.php) to get IIS > >> > >> >> formatted files to work. Not an easy task.... > >> > >> >> > >> > >> >> So now I write wondering if there is already something > >> > >> >> prepared that will take care of this log formatting syntax? > >> > >> >> > >> > >> >> Here's hoping!! > >> > >> >> > >> > >> >> Thanks! > >> > >> >> > >> > >> >> John A. Ridgway > >> > >> >> College Center for Library Automation > >> > >> >> Systems Analyst II > >> > >> >> jri...@cc... 
> >> > >> >> > >> > >> >> > >> > ------------------------------------------------------------------------ > >> > >> >> > >> > >> >> > >> > ------------------------------------------------------------------------- > >> > >> > >> >> This SF.net email is sponsored by: Microsoft > >> > >> >> Defy all challenges. Microsoft(R) Visual Studio 2005. > >> > >> >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > >> > >> >> > >> > ------------------------------------------------------------------------ > >> > >> >> > >> > >> >> _______________________________________________ > >> > >> >> Obsessive-compulsive mailing list > >> > >> >> Obs...@li... > >> > >> >> https://lists.sourceforge.net/lists/listinfo/obsessive-compulsive > >> > >> >> > >> > >> >> > >> > >> > > >> > >> > > >> > >> > > >> > > > > > -- > Innovation is just a problem away. > |
From: Dustin S. <du...@vi...> - 2007-09-21 19:13:43
|
Hey, Sorry, apparently Michael and I had a separate conversation that didn't get CC'ed to you. The extra code segment isn't needed, because YYYY-MM-DD is a valid date string and strtotime() will convert it automatically. :) I just noticed that the IIS logfile doesn't put quotes around the referrer/user agent fields, and URL-encodes them instead. So you'll probably need to run urldecode() on all of those fields. We should probably make that a flag somewhere, instead of doing it on all strings. Try removing the \" from the referrer/user agent/etc. fields in the log format and see what happens. As noted above, there apparently are no quotes in the IIS log line. I'll see if I can play with it this weekend. Something I did while originally debugging this stuff was to run preg_match() with the given regular expression and then print_r() the array it returns, to see how much of the string it actually matched. Just a really simple file. Dustin John Ridgway wrote: > > Hey guys, > > I've attempted to make the changes that have been bounced around and > the output of that testing follows below: > > > > D:\Inetpub\wwwroot\ows\scripts>php upload_log.php > d:\logs\Jul-Sep\ex070701.log cclaflorida.org debug > > > >>> Initializing analysis engine................done. > >>> Initializing analysis plugins.....done. > >>> Initializing rejection plugins...done. > >>> Initializing log parser... done.
> > ==== Debug Info ==== > Log Format: > > %t %P %v %A %m %U %q %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" > \"%{Referer}i\" %h %s %>s %N %B %N %T -- %N = added as 'No-Data' in > apache_log_parser.php > > Using regular expression: > > /^\[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) > (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" \"(.*?)\" (\S*) > (\S*) (\S*) (\S*) (\S*) (\S*) (\S*)$/ > > Matching fields (case-sensitive fields present in the log_format string): > > Array > ( > [0] => Date > [1] => Time > [2] => Process-Id > [3] => Server-Name > [4] => Local-IP > [5] => Request-Method > [6] => Request-Path > [7] => Query-String > [8] => Port > [9] => Remote-User > [10] => Remote-IP > [11] => Request-Protocol > [12] => User-Agent > [13] => Cookie > [14] => Referer > [15] => Remote-Host > [16] => Status > [17] => Status > [18] => No-Data > [19] => Bytes-Sent-X > [20] => No-Data > [21] => Time-Taken-S > ) > > ==== > > >>> Closing analysis engine...............done > > D:\Inetpub\wwwroot\ows\scripts>php upload_log.php > d:\logs\Jul-Sep\ex070701.log cclaflorida.org > > >>> Initializing analysis engine................done. > >>> Initializing analysis plugins.....done. > >>> Initializing rejection plugins...done. > >>> Initializing log parser... done. > >>> 0 existing lines for cclaflorida_org... > No previous data found. Uploading from beginning... > > >>> Now parsing given logfile... > > >>> Starting processing stage 1: > >>> Running pre-analysis.....done. > *Bad line found on 1*: *[01/Jul/2007:04:00:33 -400]* W3SVC94731577 > COBRA 192.168.2.105 GET /index.asp - *<--- I got the Date/Time portion > of the log to convert to the Apache syntax using Michael's code > suggestion. > *80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > - h *<--- Still sees logfiles as "Bad Line".... 
> *ttp://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 > ww > w.cclaflorida.org 200 0 0 13102 343 343 > > *Bad line found on 2*: [01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA > 192.168.2.105 GET /images/cclab > ak.gif - 80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1. > 4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > http://www.cclaflorida.org/ www.cclaflorida.org > 200 0 0 296 345 93 > > *Bad line found on 3*: [01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA > 192.168.2.105 GET /images/banne > r_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET > +CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > http://www.cclaflorida.org/ www.cclaflo > rida.org 200 0 0 1205 354 78 > > 0 good lines > 3 bad lines > > Cache stats: 0h 0m (0) > 0 SQL queries/inserts > >>> Running post-analysis.....done. > > > >>> Ended at Fri Sep 21 14:23:15 EDT 2007 > > >>> Closing analysis engine...............done > > D:\Inetpub\wwwroot\ows\scripts> > > Any suggestions??? Thanks! > > > > > John A. Ridgway > College Center for Library Automation > Systems Analyst II > jri...@cc... > > -----Original Message----- > From: Dustin Spicuzza [mailto:du...@vi...] > Sent: Thursday, September 20, 2007 12:35 AM > To: Michael Papile > Cc: John Ridgway; obs...@li... > Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics > > Oh I missed the second part of your comment. I'm almost positive I don't > use that function anyways. But, I didn't write that class either.... > > Dustin > > Michael Papile wrote: > > It appears the date is going to ows in 01/Jan/2007 format, so it is > > not as simple as making a regex for it. You have to manipulate the > > date to get it in the same format. There are a few options to do this. > > I am not sure where this should go. The options are that you can: > > 1. 
preprocess the line with the below function to make it appear like > > the apache date.. > > 2. have the function that changes the date into a timestamp detect > > YYYY-MM-DD format and use that.** > > * > > * > > <?php > > function IIS_to_apache_date($line){ > > $matches = explode(' ',$line); > > $date = array_shift($matches); > > $time = array_shift($matches); > > $rest_of_line = implode(' ',$matches); > > $date = str_replace('-','',$date); > > $timestamp = strtotime($date); > > $date = date ( 'd/M/Y' ,$timestamp ); > > return "[$date:$time -400] $rest_of_line"; > > } > > > > > > PS for dustin: > > The whole logtime _to_timestamp function can be way easier :) > > public function logtime_to_timestamp($logtime){ > > preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches); > > $date = str_replace('/',' ',$matches[1]); > > $time = $matches[2]; > > return strtotime("$date $time"); > > } > > > > > > #here is one that will work for the IIS timestamp too > > public function logtime_to_timestamp($logtime){ > > if(preg_match('/\d+-\d+-\d+.*/',$logtime)){ > > $matches = explode(' ',$logtime); > > $date = array_shift($matches); > > $date = str_replace('-','',$date); > > $time = array_shift($matches); > > } > > else{ > > preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches); > > $date = str_replace('/',' ',$matches[1]); > > $time = $matches[2]; > > } > > return strtotime("$date $time"); > > } > > ~ > > John Ridgway wrote: > >> > >> Michael and Dustin, > >> > >> Thank you for the very quick response. I am sorry I did not go into > >> very much detail, but I didn't want to expound on what I had done, if > >> the solution was already at hand. > >> > >> Michael, as I began to investigate how the "Apache" logs are parsed, > >> I did find the apache_log_formats.php file and add what I thought > >> were appropriate entries for both *IIS5* and *IIS6*. 
> >> > >> define("CCLA_LOG_FORMAT_V5",'%{%Y-%m-%d %H:%M:%S}t %a %u %P %v %A %p > >> %m %U %q %s \"%{$status}e\" %B - %T %H %h \"%{User-Agent}i\" > >> \"%{Cookie}i\" \"%{Referer}i\"'); > >> > >> define("CCLA_LOG_FORMAT_V6",'%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q > >> %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s > >> %>s \"%{$status}e\" %B \"%{}o\" %T'); > >> > >> When I did this and then ran the upload_log.php against this log > >> content: > >> > >> #Software: Microsoft Internet Information Services 6.0 > >> > >> #Version: 1.0 > >> > >> #Date: 2007-07-01 04:00:33 > >> > >> #Fields: date time s-sitename s-computername s-ip cs-method > >> cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version > >> cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus > >> sc-win32-status sc-bytes cs-bytes time-taken > >> > >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /index.asp > >> - 80 - 76.20.156.11 HTTP/1.1 > >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > >> - > >> > http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 > <http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8> > >> www.cclaflorida.org 200 0 0 13102 343 343 > >> > >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET > >> /images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 > >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > >> ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > >> http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93 > >> > >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET > >> /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 > >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > >> ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > >> http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78 > >> > >> I got this output: > >> > >> 
***************************************************************** > >> > >> *D:\Inetpub\wwwroot\ows\scripts>php upload_log.php > >> d:\logs\Jul-Sep\ex070701.log claflorida.org debug* > >> > >> * * > >> > >> *>>> Initializing analysis engine................done.* > >> > >> *>>> Initializing analysis plugins.....done.* > >> > >> *>>> Initializing rejection plugins...done.* > >> > >> *>>> Initializing log parser... done.* > >> > >> * * > >> > >> *==== Debug Info ====* > >> > >> *Log Format:* > >> > >> * * > >> > >> *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H > >> \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s > >> \"%{$status}e\" %B \"%{}o\" %T* > >> > >> * * > >> > >> *Using regular expression:* > >> > >> * * > >> > >> */^(\S*) \[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) > >> (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" > >> \"(.*?)\" (\S*) (\S*) (\S*) \"(.*?)\" (\S*) \"(.*?)\" (\S*)$/* > >> > >> * * > >> > >> *Matching fields (case-sensitive fields present in the log_format > >> string):* > >> > >> * * > >> > >> *Array* > >> > >> *(* > >> > >> * [0] => Connection-Status **ß---- **Seems to be coming from the > >> %{%Y-%m-%d %H:%M:%S}t Apache Time column formatting… I need a way to > >> read IIS time formatting* > >> > >> * [1] => Date* > >> > >> * [2] => Time* > >> > >> * [3] => Process-Id* > >> > >> * [4] => Server-Name* > >> > >> * [5] => Local-IP* > >> > >> * [6] => Request-Method* > >> > >> * [7] => Request-Path* > >> > >> * [8] => Query-String* > >> > >> * [9] => Port* > >> > >> * [10] => Remote-User* > >> > >> * [11] => Remote-IP* > >> > >> * [12] => Request-Protocol* > >> > >> * [13] => User-Agent* > >> > >> * [14] => Cookie* > >> > >> * [15] => Referer* > >> > >> * [16] => Remote-Host* > >> > >> * [17] => Status* > >> > >> * [18] => Status* > >> > >> * [19] => $status* > >> > >> * [20] => Bytes-Sent-X* > >> > >> * [21] => Reply-Header* > >> > >> * [22] => Time-Taken-S* > >> > >> *)* > >> > >> * * > >> > >> 
*====* > >> > >> * * > >> > >> *>>> Closing analysis engine...............done* > >> > >> * * > >> > >> *D:\Inetpub\wwwroot\ows\scripts>* > >> > >> ***************************************************************** > >> > >> From what I can see, the "Apache" log variable equivalent of > >> 2007-07-01 04:00:33 is %{%Y-%m-%d %H:%M:%S}t, and when I use the > >> upload_log.php I get this for the “Date / Time” columns: > >> > >> *[0] => Connection-Status* > >> > >> * [1] => Date* > >> > >> * [2] => Time* > >> > >> I am using ( *%{%Y-%m-%d %H:%M:%S}t* ) because this is the variable > >> formatting I use in our UNIX environment to produce the ( 2007-07-01 > >> 04:00:33 ) time formatted output. > >> > >> Our httpd.conf files have this exact variable definition for > >> formatting log output: > >> > >> *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H > >> \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s > >> \"%{$status}e\" %B \"%{}o\" %T* > >> > >> * * > >> > >> I hope I did not ramble or get confusing. I really want to be able to > >> use this application with my logs. I see a lot of > >> functionality/expandability and it uses a SQL database for better > >> reporting capabilities. A big PLUS!!! > >> > >> John A. Ridgway > >> > >> Systems Analyst II > >> > >> College Center for Library Automation > >> > >> 850-922-6044 > >> > >> jri...@cc... > >> > >> -----Original Message----- > >> From: Michael Papile [mailto:p...@pa...] > >> Sent: Wednesday, September 19, 2007 8:47 PM > >> To: Dustin Spicuzza > >> Cc: John Ridgway; obs...@li... 
> >> Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics > >> > >> Hi, > >> > >> You can pretty easily make your own formatter with no programming: > >> > >> in apache_log_formats.php create a line like this: > >> > >> define("IIS_W3C",'%h %l %u %t \"%r\" %>s %b \"%{Referrer}i\" > >> \"%{User-Agent}i\"'); > >> > >> Now all of the formatting is not the IIS format (IDK what the IIS > >> formatting is), but you can make your own. > >> > >> Just look at your log output and see what fields are where. > >> > >> I made my own formatting for NGINX logs which were not apache like. > >> So look at your logfield order, and put the %h etc where it appears > >> in your log. > >> > >> Then in your config file put > >> > >> $cfg['websites']['domain.com']['log_format'] = IIS_W3C; > >> > >> Use the following legend (from apache_log_parser.rb) > >> > >> 319 '%' => '', > >> > >> 320 'a' => 'Remote-IP', > >> > >> 321 'A' => 'Local-IP', > >> > >> 322 'B' => 'Bytes-Sent-X', > >> > >> 323 'b' => 'Bytes-Sent', > >> > >> 324 'c' => 'Connection-Status', // <= 1.3 > >> > >> 325 'C' => 'Cookie', // >= 2.0 > >> > >> 326 'D' => 'Time-Taken-MS', > >> > >> 327 'e' => 'Env-Var', > >> > >> 328 'f' => 'Filename', > >> > >> 329 'h' => 'Remote-Host', > >> > >> 330 'H' => 'Request-Protocol', > >> > >> 331 'i' => 'Request-Header', > >> > >> 332 'I' => 'Bytes-Recieved', // >= 2.0 > >> > >> 333 'l' => 'Remote-Logname', > >> > >> 334 'm' => 'Request-Method', > >> > >> 335 'n' => 'Note', > >> > >> 336 'o' => 'Reply-Header', > >> > >> 337 'O' => 'Bytes-Sent', // >= 2.0 > >> > >> 338 'p' => 'Port', > >> > >> 339 'P' => 'Process-Id', // {format} >= 2.0 > >> > >> 340 'q' => 'Query-String', > >> > >> 341 'r' => 'Request', > >> > >> 342 's' => 'Status', > >> > >> 343 't' => 'Time', > >> > >> 344 'T' => 'Time-Taken-S', > >> > >> 345 'u' => 'Remote-User', > >> > >> 346 'U' => 'Request-Path', > >> > >> 347 'v' => 'Server-Name', > >> > >> 348 'V' => 'Server-Name-X', > >> > >> 349 'X' => 'Connection-Status', 
// >= 2.0 > >> > >> 350 ); > >> > >> Dustin Spicuzza wrote: > >> > >> > I'm not familiar with IIS formatted logfiles, so I'm not sure whats > >> > >> > required to make that work. The log parser uses apache CustomLog > >> > >> > formatting to parse the logfile, so its conceivable you could use > >> those > >> > >> > directives to match the format of the logs. > >> > >> > > >> > >> > The apache_log_parser.php does all the work, and is called mostly in > >> > >> > include/analysis.inc.php .. the parse() function returns an array, > >> which > >> > >> > is then fed into the database. I'm not sure what (if any) > modification > >> > >> > would be needed for this to work with the IIS logs. Maybe you could > >> send > >> > >> > a few lines of one? > >> > >> > > >> > >> > Hope that helps! > >> > >> > > >> > >> > Dustin > >> > >> > > >> > >> > John Ridgway wrote: > >> > >> > > >> > >> >> Hello support, > >> > >> >> > >> > >> >> I have recently found your PHP application and decided to give > >> > >> >> it a try. I know this is still a very new beta project, and I am > >> > >> >> looking forward to see it work, but I am currently using W3C > Extended > >> > >> >> (IIS) formatted logfiles and I realized that your app does not know > >> > >> >> how to parse these. > >> > >> >> > >> > >> >> I began to try to figure out what to change within the files > >> > >> >> (apache_log_formats.php and apache_log_parser.php) to get IIS > >> > >> >> formatted files to work. Not an easy task.... > >> > >> >> > >> > >> >> So now I write wondering if there is already something > >> > >> >> prepared that will take care of this log formatting syntax? > >> > >> >> > >> > >> >> Here's hoping!! > >> > >> >> > >> > >> >> Thanks! > >> > >> >> > >> > >> >> John A. Ridgway > >> > >> >> College Center for Library Automation > >> > >> >> Systems Analyst II > >> > >> >> jri...@cc... 
> >> > >> >> > >> > >> >> > >> > ------------------------------------------------------------------------ > >> > >> >> > >> > >> >> > >> > ------------------------------------------------------------------------- > >> > >> > >> >> This SF.net email is sponsored by: Microsoft > >> > >> >> Defy all challenges. Microsoft(R) Visual Studio 2005. > >> > >> >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > >> > >> >> > >> > ------------------------------------------------------------------------ > >> > >> >> > >> > >> >> _______________________________________________ > >> > >> >> Obsessive-compulsive mailing list > >> > >> >> Obs...@li... > >> > >> >> https://lists.sourceforge.net/lists/listinfo/obsessive-compulsive > >> > >> >> > >> > >> >> > >> > >> > > >> > >> > > >> > >> > > >> > > > > > -- > Innovation is just a problem away. > -- Innovation is just a problem away |
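Dustin's debugging tip in the message above (run preg_match() on one line, then print_r() the result) can be sketched as a small standalone script. This is only a sketch under stated assumptions, not OWS code: the helper name ows_debug_match() and the pattern are made up for illustration, the pattern is hand-written for the unquoted IIS sample line quoted in this thread, and treating the IIS timestamp as UTC is an assumption.

```php
<?php
// Hypothetical helper (not part of OWS): match one log line against a
// pattern and return only the captured fields, or null if nothing matched.
function ows_debug_match($pattern, $line)
{
    if (preg_match($pattern, $line, $matches) !== 1) {
        return null;
    }
    array_shift($matches); // drop the full-match entry, keep the captures
    return $matches;
}

// Sample line from the thread (IIS 6, W3C Extended format, no quoted fields).
$line = '2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET '
      . '/images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 '
      . 'Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) '
      . 'ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH '
      . 'http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93';

// Assumed pattern: date, time, then 20 space-separated fields, with no
// quotes around the user agent, cookie, or referer.
$pattern = '/^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})'
         . str_repeat(' (\S+)', 20) . '$/';

$fields = ows_debug_match($pattern, $line);
print_r($fields); // shows how much of the line was actually captured

if ($fields !== null) {
    // strtotime() accepts "YYYY-MM-DD HH:MM:SS" directly; the UTC suffix
    // is an assumption about how the server writes its timestamps.
    $timestamp = strtotime($fields[0] . ' ' . $fields[1] . ' UTC');
    echo gmdate('d/M/Y H:i:s', $timestamp), "\n";

    // IIS writes spaces as "+", so decode the user agent (capture 12).
    echo urldecode($fields[12]), "\n";
}
```

If print_r() shows an empty result, shortening the pattern one field at a time should show which field stops matching.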
From: John R. <JRi...@cc...> - 2007-09-21 18:30:32
|
Hey guys, I've attempted to make the changes that have been bounced around and the output of that testing follows below: D:\Inetpub\wwwroot\ows\scripts>php upload_log.php d:\logs\Jul-Sep\ex070701.log cclaflorida.org debug >>> Initializing analysis engine................done. >>> Initializing analysis plugins.....done. >>> Initializing rejection plugins...done. >>> Initializing log parser... done. ==== Debug Info ==== Log Format: %t %P %v %A %m %U %q %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s %N %B %N %T -- %N = added as 'No-Data' in apache_log_parser.php Using regular expression: /^\[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" \"(.*?)\" (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*)$/ Matching fields (case-sensitive fields present in the log_format string): Array ( [0] => Date [1] => Time [2] => Process-Id [3] => Server-Name [4] => Local-IP [5] => Request-Method [6] => Request-Path [7] => Query-String [8] => Port [9] => Remote-User [10] => Remote-IP [11] => Request-Protocol [12] => User-Agent [13] => Cookie [14] => Referer [15] => Remote-Host [16] => Status [17] => Status [18] => No-Data [19] => Bytes-Sent-X [20] => No-Data [21] => Time-Taken-S ) ==== >>> Closing analysis engine...............done D:\Inetpub\wwwroot\ows\scripts>php upload_log.php d:\logs\Jul-Sep\ex070701.log cclaflorida.org >>> Initializing analysis engine................done. >>> Initializing analysis plugins.....done. >>> Initializing rejection plugins...done. >>> Initializing log parser... done. >>> 0 existing lines for cclaflorida_org... No previous data found. Uploading from beginning... >>> Now parsing given logfile... >>> Starting processing stage 1: >>> Running pre-analysis.....done.
Bad line found on 1: [01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA 192.168.2.105 GET /index.asp - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) - http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 www.cclaflorida.org 200 0 0 13102 343 343 <--- I got the Date/Time portion of the log to convert to the Apache syntax using Michael's code suggestion, but it still sees the log lines as "Bad Line".... Bad line found on 2: [01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA 192.168.2.105 GET /images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93 Bad line found on 3: [01/Jul/2007:04:00:33 -400] W3SVC94731577 COBRA 192.168.2.105 GET /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78 0 good lines 3 bad lines Cache stats: 0h 0m (0) 0 SQL queries/inserts >>> Running post-analysis.....done. >>> Ended at Fri Sep 21 14:23:15 EDT 2007 >>> Closing analysis engine...............done D:\Inetpub\wwwroot\ows\scripts> Any suggestions??? Thanks! John A. Ridgway College Center for Library Automation Systems Analyst II jri...@cc... -----Original Message----- From: Dustin Spicuzza [mailto:du...@vi...] Sent: Thursday, September 20, 2007 12:35 AM To: Michael Papile Cc: John Ridgway; obs...@li... Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics Oh I missed the second part of your comment. I'm almost positive I don't use that function anyways. But, I didn't write that class either....
Dustin Michael Papile wrote: > It appears the date is going to ows in 01/Jan/2007 format, so it is > not as simple as making a regex for it. You have to manipulate the > date to get it in the same format. There are a few options to do this. > I am not sure where this should go. The options are that you can: > 1. preprocess the line with the below function to make it appear like > the apache date.. > 2. have the function that changes the date into a timestamp detect > YYYY-MM-DD format and use that.** > * > * > <?php > function IIS_to_apache_date($line){ > $matches = explode(' ',$line); > $date = array_shift($matches); > $time = array_shift($matches); > $rest_of_line = implode(' ',$matches); > $date = str_replace('-','',$date); > $timestamp = strtotime($date); > $date = date ( 'd/M/Y' ,$timestamp ); > return "[$date:$time -400] $rest_of_line"; > } > > > PS for dustin: > The whole logtime _to_timestamp function can be way easier :) > public function logtime_to_timestamp($logtime){ > preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches); > $date = str_replace('/',' ',$matches[1]); > $time = $matches[2]; > return strtotime("$date $time"); > } > > > #here is one that will work for the IIS timestamp too > public function logtime_to_timestamp($logtime){ > if(preg_match('/\d+-\d+-\d+.*/',$logtime)){ > $matches = explode(' ',$logtime); > $date = array_shift($matches); > $date = str_replace('-','',$date); > $time = array_shift($matches); > } > else{ > preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches); > $date = str_replace('/',' ',$matches[1]); > $time = $matches[2]; > } > return strtotime("$date $time"); > } > ~ > John Ridgway wrote: >> >> Michael and Dustin, >> >> Thank you for the very quick response. I am sorry I did not go into >> very much detail, but I didn't want to expound on what I had done, if >> the solution was already at hand.
>> >> Michael, as I began to investigate how the "Apache" logs are parsed, >> I did find the apache_log_formats.php file and add what I thought >> were appropriate entries for both *IIS5* and *IIS6*. >> >> define("CCLA_LOG_FORMAT_V5",'%{%Y-%m-%d %H:%M:%S}t %a %u %P %v %A %p >> %m %U %q %s \"%{$status}e\" %B - %T %H %h \"%{User-Agent}i\" >> \"%{Cookie}i\" \"%{Referer}i\"'); >> >> define("CCLA_LOG_FORMAT_V6",'%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q >> %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s >> %>s \"%{$status}e\" %B \"%{}o\" %T'); >> >> When I did this and then ran the upload_log.php against this log >> content: >> >> #Software: Microsoft Internet Information Services 6.0 >> >> #Version: 1.0 >> >> #Date: 2007-07-01 04:00:33 >> >> #Fields: date time s-sitename s-computername s-ip cs-method >> cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version >> cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus >> sc-win32-status sc-bytes cs-bytes time-taken >> >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /index.asp >> - 80 - 76.20.156.11 HTTP/1.1 >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) >> - >> http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 >> www.cclaflorida.org 200 0 0 13102 343 343 >> >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET >> /images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) >> ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH >> http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93 >> >> 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET >> /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 >> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) >> ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH >> http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78 >> >> I got this
output: >> >> ***************************************************************** >> >> *D:\Inetpub\wwwroot\ows\scripts>php upload_log.php >> d:\logs\Jul-Sep\ex070701.log claflorida.org debug* >> >> * * >> >> *>>> Initializing analysis engine................done.* >> >> *>>> Initializing analysis plugins.....done.* >> >> *>>> Initializing rejection plugins...done.* >> >> *>>> Initializing log parser... done.* >> >> * * >> >> *==== Debug Info ====* >> >> *Log Format:* >> >> * * >> >> *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H >> \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s >> \"%{$status}e\" %B \"%{}o\" %T* >> >> * * >> >> *Using regular expression:* >> >> * * >> >> */^(\S*) \[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) >> (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" >> \"(.*?)\" (\S*) (\S*) (\S*) \"(.*?)\" (\S*) \"(.*?)\" (\S*)$/* >> >> * * >> >> *Matching fields (case-sensitive fields present in the log_format >> string):* >> >> * * >> >> *Array* >> >> *(* >> >> * [0] => Connection-Status **<---- **Seems to be coming from the >> %{%Y-%m-%d %H:%M:%S}t Apache Time column formatting...
I need a way to >> read IIS time formatting* >> >> * [1] => Date* >> >> * [2] => Time* >> >> * [3] => Process-Id* >> >> * [4] => Server-Name* >> >> * [5] => Local-IP* >> >> * [6] => Request-Method* >> >> * [7] => Request-Path* >> >> * [8] => Query-String* >> >> * [9] => Port* >> >> * [10] => Remote-User* >> >> * [11] => Remote-IP* >> >> * [12] => Request-Protocol* >> >> * [13] => User-Agent* >> >> * [14] => Cookie* >> >> * [15] => Referer* >> >> * [16] => Remote-Host* >> >> * [17] => Status* >> >> * [18] => Status* >> >> * [19] => $status* >> >> * [20] => Bytes-Sent-X* >> >> * [21] => Reply-Header* >> >> * [22] => Time-Taken-S* >> >> *)* >> >> * * >> >> *====* >> >> * * >> >> *>>> Closing analysis engine...............done* >> >> * * >> >> *D:\Inetpub\wwwroot\ows\scripts>* >> >> ***************************************************************** >> >> From what I can see, the "Apache" log variable equivalent of >> 2007-07-01 04:00:33 is %{%Y-%m-%d %H:%M:%S}t, and when I use the >> upload_log.php I get this for the "Date / Time" columns: >> >> *[0] => Connection-Status* >> >> * [1] => Date* >> >> * [2] => Time* >> >> I am using ( *%{%Y-%m-%d %H:%M:%S}t* ) because this is the variable >> formatting I use in our UNIX environment to produce the ( 2007-07-01 >> 04:00:33 ) time formatted output. >> >> Our httpd.conf files have this exact variable definition for >> formatting log output: >> >> *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H >> \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s >> \"%{$status}e\" %B \"%{}o\" %T* >> >> * * >> >> I hope I did not ramble or get confusing. I really want to be able to >> use this application with my logs. I see a lot of >> functionality/expandability and it uses a SQL database for better >> reporting capabilities. A big PLUS!!! >> >> John A. Ridgway >> >> Systems Analyst II >> >> College Center for Library Automation >> >> 850-922-6044 >> >> jri...@cc...
>> >> -----Original Message----- >> From: Michael Papile [mailto:p...@pa...] >> Sent: Wednesday, September 19, 2007 8:47 PM >> To: Dustin Spicuzza >> Cc: John Ridgway; obs...@li... >> Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics >> >> Hi, >> >> You can pretty easily make your own formatter with no programming: >> >> in apache_log_formats.php create a line like this: >> >> define("IIS_W3C",'%h %l %u %t \"%r\" %>s %b \"%{Referrer}i\" >> \"%{User-Agent}i\"'); >> >> Now all of the formatting is not the IIS format (IDK what the IIS >> formatting is), but you can make your own. >> >> Just look at your log output and see what fields are where. >> >> I made my own formatting for NGINX logs which were not apache like. >> So look at your logfield order, and put the %h etc where it appears >> in your log. >> >> Then in your config file put >> >> $cfg['websites']['domain.com']['log_format'] = IIS_W3C; >> >> Use the following legend (from apache_log_parser.rb) >> >> 319 '%' => '', >> >> 320 'a' => 'Remote-IP', >> >> 321 'A' => 'Local-IP', >> >> 322 'B' => 'Bytes-Sent-X', >> >> 323 'b' => 'Bytes-Sent', >> >> 324 'c' => 'Connection-Status', // <= 1.3 >> >> 325 'C' => 'Cookie', // >= 2.0 >> >> 326 'D' => 'Time-Taken-MS', >> >> 327 'e' => 'Env-Var', >> >> 328 'f' => 'Filename', >> >> 329 'h' => 'Remote-Host', >> >> 330 'H' => 'Request-Protocol', >> >> 331 'i' => 'Request-Header', >> >> 332 'I' => 'Bytes-Recieved', // >= 2.0 >> >> 333 'l' => 'Remote-Logname', >> >> 334 'm' => 'Request-Method', >> >> 335 'n' => 'Note', >> >> 336 'o' => 'Reply-Header', >> >> 337 'O' => 'Bytes-Sent', // >= 2.0 >> >> 338 'p' => 'Port', >> >> 339 'P' => 'Process-Id', // {format} >= 2.0 >> >> 340 'q' => 'Query-String', >> >> 341 'r' => 'Request', >> >> 342 's' => 'Status', >> >> 343 't' => 'Time', >> >> 344 'T' => 'Time-Taken-S', >> >> 345 'u' => 'Remote-User', >> >> 346 'U' => 'Request-Path', >> >> 347 'v' =>
'Server-Name', >> >> 348 'V' => 'Server-Name-X', >> >> 349 'X' => 'Connection-Status', // >= 2.0 >> >> 350 ); >> >> Dustin Spicuzza wrote: >> >> > I'm not familiar with IIS formatted logfiles, so I'm not sure whats >> >> > required to make that work. The log parser uses apache CustomLog >> >> > formatting to parse the logfile, so its conceivable you could use >> those >> >> > directives to match the format of the logs. >> >> > >> >> > The apache_log_parser.php does all the work, and is called mostly in >> >> > include/analysis.inc.php .. the parse() function returns an array, >> which >> >> > is then fed into the database. I'm not sure what (if any) modification >> >> > would be needed for this to work with the IIS logs. Maybe you could >> send >> >> > a few lines of one? >> >> > >> >> > Hope that helps! >> >> > >> >> > Dustin >> >> > >> >> > John Ridgway wrote: >> >> > >> >> >> Hello support, >> >> >> >> >> >> I have recently found your PHP application and decided to give >> >> >> it a try. I know this is still a very new beta project, and I am >> >> >> looking forward to see it work, but I am currently using W3C Extended >> >> >> (IIS) formatted logfiles and I realized that your app does not know >> >> >> how to parse these. >> >> >> >> >> >> I began to try to figure out what to change within the files >> >> >> (apache_log_formats.php and apache_log_parser.php) to get IIS >> >> >> formatted files to work. Not an easy task.... >> >> >> >> >> >> So now I write wondering if there is already something >> >> >> prepared that will take care of this log formatting syntax? >> >> >> >> >> >> Here's hoping!! >> >> >> >> >> >> Thanks! >> >> >> >> >> >> John A. Ridgway >> >> >> College Center for Library Automation >> >> >> Systems Analyst II >> >> >> jri...@cc...
>> >> >> >> >> >> >> ---------------------------------------------------------------------= --- >> >> >> >> >> >> >> ---------------------------------------------------------------------= ---- >> >> >> >> This SF.net email is sponsored by: Microsoft >> >> >> Defy all challenges. Microsoft(R) Visual Studio 2005. >> >> >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ >> >> >> >> ---------------------------------------------------------------------= --- >> >> >> >> >> >> _______________________________________________ >> >> >> Obsessive-compulsive mailing list >> >> >> Obs...@li... >> >> >> https://lists.sourceforge.net/lists/listinfo/obsessive-compulsive >> >> >> >> >> >> >> >> > >> >> > >> >> > >> > -- Innovation is just a problem away. |
From: Dustin S. <du...@vi...> - 2007-09-20 04:33:57
|
Oh I missed the second part of your comment. I'm almost positive I don't use that function anyways. But, I didn't write that class either....

Dustin

Michael Papile wrote:
> PS for dustin:
> The whole logtime_to_timestamp function can be way easier :)

-- Innovation is just a problem away. |
From: Dustin S. <du...@vi...> - 2007-09-20 04:31:48
|
True, but OWS uses strtotime to interpret it, so it removes the "/" and the time really goes to it as 01 Jan 2007 instead. So, anything that strtotime can interpret should be good.

Dustin

Michael Papile wrote:
> It appears the date is going to ows in 01/Jan/2007 format, so it is
> not as simple as making a regex for it. You have to manipulate the
> date to get it in the same format.

-- Innovation is just a problem away. |
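A minimal sketch of the point made in the reply above, assuming only that the date string eventually reaches PHP's strtotime(): once the slashes in the Apache-style date are replaced by spaces, both the Apache form and the IIS W3C form are strings that strtotime() can read. The function name to_timestamp_sketch() is made up for illustration and is not OWS code, and the timezone is pinned to UTC purely so that both calls are interpreted the same way.

```php
<?php
// Pin the timezone so both strings are interpreted identically (demo assumption).
date_default_timezone_set('UTC');

// Hypothetical helper, not part of OWS: accepts either an Apache-style
// "01/Jul/2007 04:00:33" or an IIS W3C-style "2007-07-01 04:00:33" string.
function to_timestamp_sketch($datetime)
{
    // Replace the slashes with spaces, giving "01 Jul 2007 04:00:33"
    // for the Apache form; the IIS form has no slashes and is left alone.
    $normalized = str_replace('/', ' ', $datetime);

    // strtotime() returns an integer timestamp, or false if it cannot parse.
    return strtotime($normalized);
}

$apache = to_timestamp_sketch('01/Jul/2007 04:00:33');
$iis    = to_timestamp_sketch('2007-07-01 04:00:33');

// True when both forms were parsed and refer to the same moment.
var_dump($apache !== false && $apache === $iis);
```

If both forms are understood, the two timestamps are identical, which would suggest the date itself is not the obstacle; the bracketed regular expression that the %t directive generates is a separate matter.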
From: Michael P. <p...@pa...> - 2007-09-20 04:30:10
|
It appears the date is going to ows in 01/Jan/2007 format, so it is not as simple as making a regex for it. You have to manipulate the date to get it in the same format. There are a few options to do this. I am not sure where this should go. The options are that you can:

1. preprocess the line with the below function to make it appear like the apache date.
2. have the function that changes the date into a timestamp detect YYYY-MM-DD format and use that.

<?php
function IIS_to_apache_date($line){
    $matches = explode(' ',$line);
    $date = array_shift($matches);
    $time = array_shift($matches);
    $rest_of_line = implode(' ',$matches);
    $date = str_replace('-','',$date);
    $timestamp = strtotime($date);
    $date = date ( 'd/M/Y' ,$timestamp );
    return "[$date:$time -400] $rest_of_line";
}

PS for dustin:
The whole logtime_to_timestamp function can be way easier :)

public function logtime_to_timestamp($logtime){
    preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches);
    $date = str_replace('/',' ',$matches[1]);
    $time = $matches[2];
    return strtotime("$date $time");
}

#here is one that will work for the IIS timestamp too
public function logtime_to_timestamp($logtime){
    if(preg_match('/\d+-\d+-\d+.*/',$logtime)){
        $matches = explode(' ',$logtime);
        $date = array_shift($matches);
        $date = str_replace('-','',$date);
        $time = array_shift($matches);
    }
    else{
        preg_match('/\[([^:]+):(\d+:\d+:\d+)[^\]]*\]/',$logtime,$matches);
        $date = str_replace('/',' ',$matches[1]);
        $time = $matches[2];
    }
    return strtotime("$date $time");
}

John Ridgway wrote: > > Michael and Dustin, > > Thank you for the very quick response. I am sorry I did not go into > very much detail, but I didn't want to expound on what I had done, if > the solution was already at hand. > > Michael, as I began to investigate how the "Apache" logs are parsed, I > did find the apache_log_formats.php file and add what I thought were > appropriate entries for both *IIS5* and *IIS6*.
> > define("CCLA_LOG_FORMAT_V5",'%{%Y-%m-%d %H:%M:%S}t %a %u %P %v %A %p > %m %U %q %s \"%{$status}e\" %B - %T %H %h \"%{User-Agent}i\" > \"%{Cookie}i\" \"%{Referer}i\"'); > > define("CCLA_LOG_FORMAT_V6",'%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q > %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s > %>s \"%{$status}e\" %B \"%{}o\" %T'); > > When I did this and then ran the upload_log.php against this log content: > > #Software: Microsoft Internet Information Services 6.0 > > #Version: 1.0 > > #Date: 2007-07-01 04:00:33 > > #Fields: date time s-sitename s-computername s-ip cs-method > cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version > cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus > sc-win32-status sc-bytes cs-bytes time-taken > > 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /index.asp - > 80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > - > http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 > www.cclaflorida.org 200 0 0 13102 343 343 > > 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET > /images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93 > > 2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET > /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 > Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) > ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH > http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78 > > I got this output: > > ***************************************************************** > > *D:\Inetpub\wwwroot\ows\scripts>php upload_log.php > d:\logs\Jul-Sep\ex070701.log claflorida.org debug* > > * * > > *>>> Initializing analysis engine................done.* > > 
*>>> Initializing analysis plugins.....done.* > > *>>> Initializing rejection plugins...done.* > > *>>> Initializing log parser... done.* > > * * > > *==== Debug Info ====* > > *Log Format:* > > * * > > *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H > \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s > \"%{$status}e\" %B \"%{}o\" %T* > > * * > > *Using regular expression:* > > * * > > */^(\S*) \[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) > (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" > \"(.*?)\" (\S*) (\S*) (\S*) \"(.*?)\" (\S*) \"(.*?)\" (\S*)$/* > > * * > > *Matching fields (case-sensitive fields present in the log_format > string):* > > * * > > *Array* > > *(* > > * [0] => Connection-Status **ß---- **Seems to be coming from the > %{%Y-%m-%d %H:%M:%S}t Apache Time column formatting… I need a way to > read IIS time formatting* > > * [1] => Date* > > * [2] => Time* > > * [3] => Process-Id* > > * [4] => Server-Name* > > * [5] => Local-IP* > > * [6] => Request-Method* > > * [7] => Request-Path* > > * [8] => Query-String* > > * [9] => Port* > > * [10] => Remote-User* > > * [11] => Remote-IP* > > * [12] => Request-Protocol* > > * [13] => User-Agent* > > * [14] => Cookie* > > * [15] => Referer* > > * [16] => Remote-Host* > > * [17] => Status* > > * [18] => Status* > > * [19] => $status* > > * [20] => Bytes-Sent-X* > > * [21] => Reply-Header* > > * [22] => Time-Taken-S* > > *)* > > * * > > *====* > > * * > > *>>> Closing analysis engine...............done* > > * * > > *D:\Inetpub\wwwroot\ows\scripts>* > > ***************************************************************** > > From what I can see, the "Apache" log variable equivalent of > 2007-07-01 04:00:33 is %{%Y-%m-%d %H:%M:%S}t, and when I use the > upload_log.php I get this for the “Date / Time” columns: > > *[0] => Connection-Status* > > * [1] => Date* > > * [2] => Time* > > I am using ( *%{%Y-%m-%d %H:%M:%S}t* ) because this is the variable > formatting I use in our UNIX 
environment to produce the ( 2007-07-01 > 04:00:33 ) time formatted output. > > Our httpd.conf files have this exact variable definition for > formatting log output: > > *%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H > \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s > \"%{$status}e\" %B \"%{}o\" %T* > > * * > > I hope I did not ramble or get confusing. I really want to be able to > use this application with my logs. I see a lot of > functionality/expandability and it uses a SQL database for better > reporting capabilities. A big PLUS!!! > > John A. Ridgway > > Systems Analyst II > > College Center for Library Automation > > 850-922-6044 > > jri...@cc... > > -----Original Message----- > From: Michael Papile [mailto:p...@pa...] > Sent: Wednesday, September 19, 2007 8:47 PM > To: Dustin Spicuzza > Cc: John Ridgway; obs...@li... > Subject: Re: [Obsessive-compulsive] Obsessive Web Statistics > > Hi, > > You can pretty easily make your own formatter with no programming: > > in apache_log_formats.php create a line like this: > > define("IIS_W3C",'%h %l %u %t \"%r\" %>s %b \"%{Referrer}i\" > \"%{User-Agent}i\"'); > > Now all of the formatting is not the IIS format (IDK what the IIS > formatting is), but you can make your own. > > Just look at your log output and see what fields are where. > > I made my own formatting for NGINX logs which were not apache like. So > look at your logfield order, and put the %h etc where it appears in > your log. 
> > Then in your config file put > > $cfg['websites']['domain.com']['log_format'] = IIS_W3C; > > Use the following legend (from apache_log_parser.rb) > > 319 '%' => '', > > 320 'a' => 'Remote-IP', > > 321 'A' => 'Local-IP', > > 322 'B' => 'Bytes-Sent-X', > > 323 'b' => 'Bytes-Sent', > > 324 'c' => 'Connection-Status', // <= 1.3 > > 325 'C' => 'Cookie', // >= 2.0 > > 326 'D' => 'Time-Taken-MS', > > 327 'e' => 'Env-Var', > > 328 'f' => 'Filename', > > 329 'h' => 'Remote-Host', > > 330 'H' => 'Request-Protocol', > > 331 'i' => 'Request-Header', > > 332 'I' => 'Bytes-Recieved', // >= 2.0 > > 333 'l' => 'Remote-Logname', > > 334 'm' => 'Request-Method', > > 335 'n' => 'Note', > > 336 'o' => 'Reply-Header', > > 337 'O' => 'Bytes-Sent', // >= 2.0 > > 338 'p' => 'Port', > > 339 'P' => 'Process-Id', // {format} >= 2.0 > > 340 'q' => 'Query-String', > > 341 'r' => 'Request', > > 342 's' => 'Status', > > 343 't' => 'Time', > > 344 'T' => 'Time-Taken-S', > > 345 'u' => 'Remote-User', > > 346 'U' => 'Request-Path', > > 347 'v' => 'Server-Name', > > 348 'V' => 'Server-Name-X', > > 349 'X' => 'Connection-Status', // >= 2.0 > > 350 ); > > Dustin Spicuzza wrote: > > > I'm not familiar with IIS formatted logfiles, so I'm not sure whats > > > required to make that work. The log parser uses apache CustomLog > > > formatting to parse the logfile, so its conceivable you could use those > > > directives to match the format of the logs. > > > > > > The apache_log_parser.php does all the work, and is called mostly in > > > include/analysis.inc.php .. the parse() function returns an array, which > > > is then fed into the database. I'm not sure what (if any) modification > > > would be needed for this to work with the IIS logs. Maybe you could send > > > a few lines of one? > > > > > > Hope that helps! > > > > > > Dustin > > > > > > John Ridgway wrote: > > > > > >> Hello support, > > >> > > >> I have recently found your PHP application and decided to give > > >> it a try. 
I know this is still a very new beta project, and I am > > >> looking forward to see it work, but I am currently using W3C Extended > > >> (IIS) formatted logfiles and I realized that your app does not know > > >> how to parse these. > > >> > > >> I began to try to figure out what to change within the files > > >> (apache_log_formats.php and apache_log_parser.php) to get IIS > > >> formatted files to work. Not an easy task.... > > >> > > >> So now I write wondering if there is already something > > >> prepared that will take care of this log formatting syntax? > > >> > > >> Here's hoping!! > > >> > > >> Thanks! > > >> > > >> John A. Ridgway > > >> College Center for Library Automation > > >> Systems Analyst II > > >> jri...@cc... > > >> > > >> ------------------------------------------------------------------------ > > >> > > >> > ------------------------------------------------------------------------- > > >> This SF.net email is sponsored by: Microsoft > > >> Defy all challenges. Microsoft(R) Visual Studio 2005. > > >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > >> ------------------------------------------------------------------------ > > >> > > >> _______________________________________________ > > >> Obsessive-compulsive mailing list > > >> Obs...@li... > > >> https://lists.sourceforge.net/lists/listinfo/obsessive-compulsive > > >> > > >> > > > > > > > > > > |
From: Dustin S. <du...@vi...> - 2007-09-20 04:03:06
|
Michael,

You said exactly what I was trying to say, thanks. Also, you should share the NGINX logfile format string with us so we can add it to the default distribution. :)

John,

You need to add a new time specifier, or modify the existing one so that it is not Apache-centric. The %t that is already there strictly interprets Apache timestamps. Check out lines 209-220... the regular expression it generates is this:

    \[([^:]+):(\d+:\d+:\d+ [^\]]+)\]

Notice the brackets on each end. The format string parser is rather simple and can only handle a few special cases, so you'll need to add that in. So, maybe just add a new identifier, IIS-Time or something like that, that will emit the correct regular expression.

I'm rather busy at the moment, but I might be able to get to it this weekend if you can't get it working by then. If you do get it working, I'll add it to the default distribution of OWS. :)

Dustin

-- Innovation is just a problem away. |
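To illustrate the point about the brackets, here is a minimal Python sketch (not OWS code, which is PHP). APACHE_TIME is the regular expression quoted in the message; IIS_TIME, the split_timestamp helper, and the Apache sample timestamp are hypothetical and only show the kind of pattern an "IIS-Time" identifier might emit for unbracketed W3C date and time fields.

```python
import re

# Regular expression quoted in the message for Apache's bracketed %t timestamps.
APACHE_TIME = re.compile(r"\[([^:]+):(\d+:\d+:\d+ [^\]]+)\]")

# Hypothetical pattern for an "IIS-Time" identifier: W3C logs write the date
# and the time as two plain fields, with no surrounding brackets.
IIS_TIME = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})")

# Made-up sample in Apache's default timestamp layout.
apache_sample = "[01/Jul/2007:04:00:33 -0400]"
# Start of a line from the IIS log posted in this thread.
iis_sample = "2007-07-01 04:00:33 W3SVC94731577 COBRA"


def split_timestamp(pattern, text):
    """Return the (date, time) groups matched at the start of text, or None."""
    match = pattern.match(text)
    return match.groups() if match else None


print(split_timestamp(APACHE_TIME, apache_sample))
print(split_timestamp(APACHE_TIME, iis_sample))
print(split_timestamp(IIS_TIME, iis_sample))
```

The Apache pattern fails on the IIS line because it requires a leading "[", which would explain why the date and time columns do not line up when the stock %t handling is used on IIS logs.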
From: John R. <JRi...@cc...> - 2007-09-20 01:34:23
|
Michael and Dustin,

Thank you for the very quick response. I am sorry I did not go into very much detail, but I didn't want to expound on what I had done if the solution was already at hand.

Michael, as I began to investigate how the "Apache" logs are parsed, I did find the apache_log_formats.php file and added what I thought were appropriate entries for both IIS5 and IIS6:

    define("CCLA_LOG_FORMAT_V5",'%{%Y-%m-%d %H:%M:%S}t %a %u %P %v %A %p %m %U %q %s \"%{$status}e\" %B - %T %H %h \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\"');

    define("CCLA_LOG_FORMAT_V6",'%{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s \"%{$status}e\" %B \"%{}o\" %T');

I then ran upload_log.php against this log content:

    #Software: Microsoft Internet Information Services 6.0
    #Version: 1.0
    #Date: 2007-07-01 04:00:33
    #Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
    2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /index.asp - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) - http://search.yahoo.com/search?p=flordia+community+colleges&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8 www.cclaflorida.org 200 0 0 13102 343 343
    2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93
    2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET /images/banner_master_01.gif - 80 - 76.20.156.11 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 1205 354 78

I got this output:

    *****************************************************************
    D:\Inetpub\wwwroot\ows\scripts>php upload_log.php d:\logs\Jul-Sep\ex070701.log claflorida.org debug

    >>> Initializing analysis engine................done.
    >>> Initializing analysis plugins.....done.
    >>> Initializing rejection plugins...done.
    >>> Initializing log parser... done.

    ==== Debug Info ====
    Log Format:

    %{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s \"%{$status}e\" %B \"%{}o\" %T

    Using regular expression:

    /^(\S*) \[([^:]+):(\d+:\d+:\d+ [^\]]+)\] (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\S*) (\s*\S*\s*) (\S*) (\S*) \"(.*?)\" \"(.*?)\" \"(.*?)\" (\S*) (\S*) (\S*) \"(.*?)\" (\S*) \"(.*?)\" (\S*)$/

    Matching fields (case-sensitive fields present in the log_format string):

    Array
    (
        [0] => Connection-Status   <------ Seems to be coming from the %{%Y-%m-%d %H:%M:%S}t Apache Time column formatting... I need a way to read IIS time formatting
        [1] => Date
        [2] => Time
        [3] => Process-Id
        [4] => Server-Name
        [5] => Local-IP
        [6] => Request-Method
        [7] => Request-Path
        [8] => Query-String
        [9] => Port
        [10] => Remote-User
        [11] => Remote-IP
        [12] => Request-Protocol
        [13] => User-Agent
        [14] => Cookie
        [15] => Referer
        [16] => Remote-Host
        [17] => Status
        [18] => Status
        [19] => $status
        [20] => Bytes-Sent-X
        [21] => Reply-Header
        [22] => Time-Taken-S
    )

    ====

    >>> Closing analysis engine...............done

    D:\Inetpub\wwwroot\ows\scripts>
    *****************************************************************

From what I can see, the "Apache" log variable equivalent of 2007-07-01 04:00:33 is %{%Y-%m-%d %H:%M:%S}t, and when I use upload_log.php I get this for the "Date / Time" columns:

    [0] => Connection-Status
    [1] => Date
    [2] => Time

I am using ( %{%Y-%m-%d %H:%M:%S}t ) because this is the variable formatting I use in our UNIX environment to produce the ( 2007-07-01 04:00:33 ) time formatted output.

Our httpd.conf files have this exact variable definition for formatting log output:

    %{%Y-%m-%d %H:%M:%S}t %P %v %A %m %U %q %p %u %a %H \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Referer}i\" %h %s %>s \"%{$status}e\" %B \"%{}o\" %T

I hope I did not ramble or get confusing. I really want to be able to use this application with my logs. I see a lot of functionality/expandability, and it uses a SQL database for better reporting capabilities. A big PLUS!!!

John A. Ridgway
Systems Analyst II
College Center for Library Automation
850-922-6044
jri...@cc... |
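As an aside on the log sample posted above: the #Fields directive already names every column, so one way to read such a file is to pair those names with the whitespace-separated values on each line. The following is a minimal Python sketch under that assumption, not part of OWS; parse_fields_header and parse_w3c_line are hypothetical helper names, and the header and log line are copied from the message.

```python
# "#Fields" directive and one log line, copied from the sample in the message.
FIELDS_HEADER = (
    "#Fields: date time s-sitename s-computername s-ip cs-method "
    "cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version "
    "cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus "
    "sc-win32-status sc-bytes cs-bytes time-taken"
)
LOG_LINE = (
    "2007-07-01 04:00:33 W3SVC94731577 COBRA 192.168.2.105 GET "
    "/images/cclabak.gif - 80 - 76.20.156.11 HTTP/1.1 "
    "Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) "
    "ASPSESSIONIDAAQATSBD=AEIGNNADIJNMJDCFEEHDLNLH "
    "http://www.cclaflorida.org/ www.cclaflorida.org 200 0 0 296 345 93"
)


def parse_fields_header(header):
    """Return the column names listed after the '#Fields:' token."""
    return header.split()[1:]


def parse_w3c_line(field_names, line):
    """Pair each value with its column name; None if the counts differ."""
    values = line.split()
    if len(values) != len(field_names):
        return None
    return dict(zip(field_names, values))


fields = parse_fields_header(FIELDS_HEADER)
record = parse_w3c_line(fields, LOG_LINE)
print(record["date"], record["time"], record["cs-uri-stem"], record["sc-status"])
```

Splitting on whitespace works for this sample because spaces inside values such as the User-Agent appear as "+" in the log; a "-" marks an empty field.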
From: Michael P. <p...@pa...> - 2007-09-20 00:46:58
|
Hi,

You can pretty easily make your own formatter with no programming. In apache_log_formats.php, create a line like this:

    define("IIS_W3C",'%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"');

Note that this format string is not the actual IIS format (I don't know what the IIS format is), but you can make your own. Just look at your log output and see which fields are where.

I made my own format for NGINX logs, which were not Apache-like. So look at the field order in your log, and put the %h etc. where each one appears in your log.

Then in your config file put:

    $cfg['websites']['domain.com']['log_format'] = IIS_W3C;

Use the following legend (from apache_log_parser.rb):

    '%' => '',
    'a' => 'Remote-IP',
    'A' => 'Local-IP',
    'B' => 'Bytes-Sent-X',
    'b' => 'Bytes-Sent',
    'c' => 'Connection-Status', // <= 1.3
    'C' => 'Cookie', // >= 2.0
    'D' => 'Time-Taken-MS',
    'e' => 'Env-Var',
    'f' => 'Filename',
    'h' => 'Remote-Host',
    'H' => 'Request-Protocol',
    'i' => 'Request-Header',
    'I' => 'Bytes-Recieved', // >= 2.0
    'l' => 'Remote-Logname',
    'm' => 'Request-Method',
    'n' => 'Note',
    'o' => 'Reply-Header',
    'O' => 'Bytes-Sent', // >= 2.0
    'p' => 'Port',
    'P' => 'Process-Id', // {format} >= 2.0
    'q' => 'Query-String',
    'r' => 'Request',
    's' => 'Status',
    't' => 'Time',
    'T' => 'Time-Taken-S',
    'u' => 'Remote-User',
    'U' => 'Request-Path',
    'v' => 'Server-Name',
    'V' => 'Server-Name-X',
    'X' => 'Connection-Status', // >= 2.0
    ); |
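To make the legend concrete, here is a rough Python sketch of how a format string could be turned into an ordered list of field names. It is an illustration only, not the OWS parser: LEGEND is a subset of the table above, while DIRECTIVE and field_names are hypothetical. Naming a request header after the text in its braces is an assumption based on the debug output posted in this thread, where %{User-Agent}i is listed as User-Agent.

```python
import re

# Subset of the legend quoted above (directive letter -> field name).
LEGEND = {
    "a": "Remote-IP", "A": "Local-IP", "b": "Bytes-Sent", "B": "Bytes-Sent-X",
    "h": "Remote-Host", "H": "Request-Protocol", "i": "Request-Header",
    "l": "Remote-Logname", "m": "Request-Method", "p": "Port",
    "q": "Query-String", "r": "Request", "s": "Status", "t": "Time",
    "T": "Time-Taken-S", "u": "Remote-User", "U": "Request-Path",
    "v": "Server-Name",
}

# A '%', an optional '<' or '>' modifier, an optional {argument}, then a letter.
DIRECTIVE = re.compile(r"%[<>]?(?:\{([^}]*)\})?([A-Za-z])")


def field_names(log_format):
    """List the field name for each directive, in order of appearance."""
    names = []
    for argument, letter in DIRECTIVE.findall(log_format):
        if letter == "i" and argument:
            # Assumption: a request header is named after its {argument}.
            names.append(argument)
        else:
            names.append(LEGEND.get(letter, "unknown-" + letter))
    return names


COMBINED_LIKE = r'%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
print(field_names(COMBINED_LIKE))
```

Reading the names off in order like this is a quick way to check that a custom format string lists its fields in the same order as the columns in the log file.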
From: Dustin S. <du...@vi...> - 2007-09-20 00:12:49
|
I'm not familiar with IIS-formatted logfiles, so I'm not sure what's required to make that work. The log parser uses Apache CustomLog formatting to parse the logfile, so it's conceivable you could use those directives to match the format of your logs.

apache_log_parser.php does all the work, and is called mostly in include/analysis.inc.php; the parse() function returns an array, which is then fed into the database. I'm not sure what (if any) modification would be needed for this to work with the IIS logs. Maybe you could send a few lines of one?

Hope that helps!

Dustin

-- Innovation is just a problem away. |
From: John R. <JRi...@cc...> - 2007-09-19 19:23:03
|
Hello support,

I have recently found your PHP application and decided to give it a try. I know this is still a very new beta project, and I am looking forward to seeing it work, but I am currently using W3C Extended (IIS) formatted logfiles, and I realized that your app does not know how to parse these.

I began trying to figure out what to change within the files (apache_log_formats.php and apache_log_parser.php) to get IIS-formatted files to work. Not an easy task....

So now I write wondering whether there is already something prepared that will take care of this log formatting syntax?

Here's hoping!!

Thanks!

John A. Ridgway
College Center for Library Automation
Systems Analyst II
jri...@cc... |
From: Dustin S. <du...@vi...> - 2007-09-01 06:08:29
|
Hey, I just wrote an analysis plugin tutorial, which shows how you can add to a dimension and extend it -- in this case, resolving the hostname of the visitor. Hope this helps you in writing your own plugins, check it out at my blog. http://www.virtualroadside.com/blog/index.php/2007/09/01/obsessive-web-statistics-ows-analysis-plugin-tutorial/ Dustin -- Innovation is just a problem away. |
From: Dustin S. <du...@vi...> - 2007-08-26 17:36:22
|
Hey, If you do not have any iRejectPlugin plugins installed, then 0.8.0.2 will reject all lines. This has been fixed in 0.8.0.3. Sorry for the inconvenience. Dustin -- Check out my car computer! http://www.virtualroadside.com/carputer/ |
From: Dustin S. <du...@vi...> - 2007-08-25 07:11:46
|
Hey All, This is a minor release of OWS. It is mostly a bugfix release, with the exception of the addition of the 'reject' plugin type, which can be used to determine whether or not a logfile line is added to the database. This is used to (optionally) purge the database of old data, past a specified limit. Unfortunately, it does NOT archive the old data yet, but that will eventually be added. I've just been rather busy lately. :) Dustin -- Check out my car computer! http://www.virtualroadside.com/carputer/ |
From: Dustin S. <du...@vi...> - 2007-08-15 01:21:06
|
Hey, There was a huge issue with the ows_aggregate plugin in v0.8: sorting just did not work at all. v0.8.0.1 has been released to resolve this issue. Thanks to Jon for pointing this out. Dustin -- Check out my car computer! http://www.virtualroadside.com/carputer/ |