Logged In: YES
user_id=169865
If AWStats could use cookies to track visitors, making the
system highly accurate for people coming from business
environments and AOL-like platforms, this would be the
program for us.
As it stands, we must pay for an inferior product at my
work so that we can have cookie tracking.
Logged In: YES
user_id=466127
You are all missing the point of this project: it is a log
analyzer, which means it reads the server log files. It
doesn't collect the information itself. The programs that do
this (like SuperStats) also affect your site's performance
while users are visiting because of the extra bandwidth
needed to collect the required information.
Logged In: NO
Actually, Apache will happily provide the cookie information
in the log file, allowing AWStats to look at the cookies
without any performance issues.
This is simple to set up (uncomment a module in Apache and
add a field to the log format), and wouldn't be much
different from identifying users by IP address.
The real advantage of such a feature is that it allows
large groups sharing IP addresses (AOL users, some dial-up
users, etc.) to be counted as the individuals they are,
rather than as one massive user.
Best wishes, and thanks for the great software.
</kato>
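To illustrate the point above, here is a minimal sketch of how counting by cookie can separate users who share one IP address. The sample data, the cookie name `visitor` and the helper `cookie_value` are assumptions made up for this illustration; none of this is AWStats code.

```python
# Hypothetical sample: (client_ip, Cookie header) pairs as they might be
# read from an access log that records the Cookie request header.
requests = [
    ("198.51.100.7", "visitor=a1"),
    ("198.51.100.7", "visitor=b2"),
    ("198.51.100.7", "visitor=a1"),
    ("203.0.113.9", "visitor=c3"),
]


def cookie_value(header, name):
    """Return the value of cookie `name` from a Cookie header, or None."""
    for part in header.split(";"):
        key, _, value = part.strip().partition("=")
        if key == name:
            return value
    return None


# Counting by IP lumps everyone behind the shared proxy address together.
visitors_by_ip = {ip for ip, _ in requests}

# Counting by cookie separates them again.
visitors_by_cookie = {cookie_value(cookies, "visitor") for _, cookies in requests}
visitors_by_cookie.discard(None)

print(len(visitors_by_ip), len(visitors_by_cookie))
```

With this sample, the shared address 198.51.100.7 counts as one visitor by IP but as two distinct visitors by cookie.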
Logged In: NO
This option should be provided by AWStats itself. I am
submitting to awstats.pl the logs retrieved from our IIS web
server with all the fields checked (Fields: date time c-ip
cs-username s-sitename s-computername s-ip s-port cs-method
cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes
cs-bytes time-taken cs-version cs-host cs(User-Agent)
cs(Cookie) cs(Referer)).
Since this is a personalized format, the fields s-sitename,
s-port, cs-bytes, time-taken, cs-host, cs(User-Agent) and
cs(Cookie) are not considered in the section "# Personalized
log format" of "# GENERATING PerlParsingFormat".
So how can AWStats generate a report that counts visitors
based on the cookie field?
Thanks,
Osvaldo
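As a rough illustration of the question above, the cookie column of a W3C-style log can be located from its `#Fields:` directive. This is only a sketch under assumptions: the helper `field_index`, the shortened field list and the sample row are invented for the example, and it relies on IIS writing spaces inside a field as `+` so that splitting on whitespace is safe.

```python
def field_index(fields_directive, wanted):
    """Return the column index of `wanted` in a '#Fields:' line, or -1."""
    prefix = "#Fields:"
    if not fields_directive.startswith(prefix):
        raise ValueError("not a #Fields directive")
    names = fields_directive[len(prefix):].split()
    return names.index(wanted) if wanted in names else -1


# Shortened, hypothetical field list and one matching log row.
directive = (
    "#Fields: date time c-ip cs-username cs-method cs-uri-stem "
    "sc-status cs(User-Agent) cs(Cookie) cs(Referer)"
)
row = (
    "2006-12-01 13:06:45 192.0.2.10 - GET /index.html 200 "
    "Mozilla/4.0+(compatible) idvisiteur=42;+PHPSESSID=abc http://example.com/"
)

cookie_col = field_index(directive, "cs(Cookie)")
cookie_field = row.split()[cookie_col]
print(cookie_col, cookie_field)
```

Once the column is known, the cookie string can be handed to whatever logic derives the visitor key.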
Logged In: YES
user_id=326580
How about counting and following visitors based on the
SESSIONID in the URL, or any other user-defined string in
the URL, or more generally with just a regexp over the log
line? That way users could easily define for themselves how
to identify unique visitors.
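A minimal sketch of that idea, assuming the user supplies a regular expression whose first capture group is the visitor key. The pattern, the function name `visitor_key` and the sample lines are hypothetical and only show the shape of such a feature.

```python
import re

# Hypothetical user-supplied pattern: capture group 1 is the visitor key.
visitor_pattern = re.compile(r"[?&;]SESSIONID=([^&;\s\"]+)", re.IGNORECASE)


def visitor_key(log_line, fallback):
    """Return the key captured by the user's pattern, else the fallback."""
    match = visitor_pattern.search(log_line)
    return match.group(1) if match else fallback


with_id = (
    '192.0.2.10 - - [01/Dec/2006:13:06:45 +0100] '
    '"GET /shop?item=3&SESSIONID=abc123 HTTP/1.1" 200 512'
)
without_id = (
    '192.0.2.10 - - [01/Dec/2006:13:07:02 +0100] '
    '"GET /about.html HTTP/1.1" 200 2048'
)

print(visitor_key(with_id, "192.0.2.10"))
print(visitor_key(without_id, "192.0.2.10"))
```

Falling back to the host keeps lines without a session id countable, at the cost of mixing two kinds of key in one report.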
Logged In: YES
user_id=1553726
Originator: NO
Hi,
Since my previous patch, which used IP+UserAgent as the visitor key, I've worked on a new one that uses cookies to track visitors and sessions.
To use this patch, add the following lines to the .config file:
LogFormat = "%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot %extraquot1 %extraquot2"
VisitorCookie = "idvisiteur"
SessionCookie = "PHPSESSID"
where extraquot1 and extraquot2 match the web server cookie fields. In my case I use the following format in our Apache httpd.conf file:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Set-Cookie}o\"" combined_cookie
CustomLog /var/log/apache/access.log combined_cookie
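For readers who want to check what such a log line looks like once parsed, here is a small sketch that splits a line in the combined_cookie layout above into named fields. The regular expression and the sample line are assumptions written for this illustration; like the quoted-field handling in the patch, it simply treats everything between two double quotes as one field and does not handle escaped quotes.

```python
import re

# One group per field of the combined_cookie layout: the usual combined
# fields followed by the quoted Cookie and Set-Cookie headers.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'"(?P<cookie>[^"]*)" "(?P<set_cookie>[^"]*)"'
)

# Hypothetical sample line.
line = (
    '192.0.2.10 - - [01/Dec/2006:13:06:45 +0100] '
    '"GET /index.php HTTP/1.1" 200 1234 "-" "Mozilla/5.0" '
    '"idvisiteur=42; PHPSESSID=abc" "-"'
)

match = LINE_RE.match(line)
fields = match.groupdict() if match else {}
print(fields.get("cookie"), "|", fields.get("set_cookie"))
```

The last two groups correspond to what the AWStats LogFormat above exposes as %extraquot1 and %extraquot2.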
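In this example we use PHP's automatic session cookie to follow the session, and an "idvisiteur" cookie generated by the home page (which lives for many days) for user/visitor tracking.
Note: any page requested without the visitor cookie is ignored for visitor and session tracking.
We're now testing this patch on our production servers. I think there is more work to do, and there are some limitations:
The visitor cookie is mandatory; otherwise AWStats would count too many session changes from proxies (AOL, ...).
When matching the creation of a cookie by the web server, Apache only logs the first Set-Cookie (fortunately PHPSESSID is the first in our configuration), so the first page may be badly tracked.
If a browser refuses the (visitor) cookie, it won't be counted.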
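The cookie lookup described here can be sketched as follows: look for the named cookie in the request's Cookie header first, then in the response's Set-Cookie header. This is an illustration under assumptions, not the patch's Perl code; the function name `extract_cookie` and the sample strings are made up, and the sketch anchors the cookie name at the start of the string or after a semicolon so that a short name cannot match inside a longer one.

```python
import re


def extract_cookie(name, client_cookies, server_cookies):
    """Return the value of cookie `name`, preferring the client header.

    An empty string means the cookie was found in neither header.
    """
    pattern = re.compile(r"(?:^|;\s*)" + re.escape(name) + r"=([^;]*)")
    for source in (client_cookies, server_cookies):
        match = pattern.search(source or "")
        if match:
            return match.group(1)
    return ""


# Returning visitor: both cookies arrive in the request.
print(extract_cookie("PHPSESSID", "idvisiteur=42; PHPSESSID=abc", "-"))

# First hit: the visitor cookie only exists in the server's Set-Cookie.
print(extract_cookie("idvisiteur", "-", "idvisiteur=99; path=/"))
```

A hit where the visitor cookie comes back empty is the case the note above describes as ignored for visitor and session tracking.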
The patch :
--- awstats.pl.bak_before_cookies 2006-12-01 13:06:45.000000000 +0100
+++ awstats.pl 2006-12-01 13:06:56.000000000 +0100
@@ -61,6 +61,7 @@
$pos_vh $pos_host $pos_logname $pos_date $pos_tz $pos_method $pos_url $pos_code $pos_size
$pos_referer $pos_agent $pos_query $pos_gzipin $pos_gzipout $pos_compratio $pos_timetaken
$pos_cluster $pos_emails $pos_emailr $pos_hostr @pos_extra
+$pos_client_cookie $pos_server_cookie $client_cookies $server_cookies
/;
$DIR=$PROG=$Extension='';
$Debug = $ShowSteps = 0;
@@ -145,6 +146,10 @@
$DecodeUA
$IncludeUAInVisitors
$VisitAllHosts
+$VisitorCookie
+$SessionCookie
+$VCookie
+$SCookie
/;
($DebugMessages, $AllowToUpdateStatsFromBrowser, $EnableLockForUpdate, $DNSLookup, $AllowAccessFromWebToAuthenticatedUsersOnly,
$BarHeight, $BarWidth, $CreateDirDataIfNotExists, $KeepBackupOfHistoricFiles,
@@ -156,8 +161,8 @@
$IncludeInternalLinksInOriginSection,
$AuthenticatedUsersNotCaseSensitive,
$Expires, $UpdateStats, $MigrateStats, $URLNotCaseSensitive, $URLWithQuery, $URLReferrerWithQuery,
-$DecodeUA, $IncludeUAInVisitors, $VisitAllHosts)=
-(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0);
+$DecodeUA, $IncludeUAInVisitors, $VisitAllHosts, $VisitorCookie, $SessionCookie, $VCookie, $SCookie)=
+(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,'','');
use vars qw/
$DetailedReportsOnNewWindows
$FirstDayOfWeek $KeyWordsNotSensitive $SaveDatabaseFilesWithPermissionsForEveryone
@@ -322,6 +327,7 @@
%_domener_p %_domener_h %_domener_k %_errors_h %_errors_k
%_filetypes_h %_filetypes_k %_filetypes_gz_in %_filetypes_gz_out
%_host_p %_host_h %_host_k %_host_l %_host_s %_host_u
+%_host_sid
%_waithost_e %_waithost_l %_waithost_s %_waithost_u
%_keyphrases %_keywords %_os_h %_pagesrefs_p %_pagesrefs_h %_robot_h %_robot_k %_robot_l %_robot_r
%_worm_h %_worm_k %_worm_l %_login_h %_login_p %_login_k %_login_l %_screensize_h
@@ -5036,6 +5042,7 @@
$pos_vh = $pos_host = $pos_logname = $pos_date = $pos_tz = $pos_method = $pos_url = $pos_code = $pos_size = -1;
$pos_referer = $pos_agent = $pos_query = $pos_gzipin = $pos_gzipout = $pos_compratio = -1;
$pos_cluster = $pos_emails = $pos_emailr = $pos_hostr = -1;
+ $pos_client_cookie = $pos_server_cookie = -1;
@pos_extra=();
@fieldlib=();
$PerlParsingFormat='';
@@ -5050,10 +5057,10 @@
# WebStar: 05/21/00 00:17:31 OK 200 212.242.30.6 Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) http://www.cover.dk/ "www.cover.dk" :Documentation:graphics:starninelogo.white.gif 1133
# Squid extended: 12.229.91.170 - - [27/Jun/2002:03:30:50 -0700] "GET http://www.callistocms.com/images/printable.gif HTTP/1.1" 304 354 "-" "Mozilla/5.0 Galeon/1.0.3 (X11; Linux i686; U;) Gecko/0" TCP_REFRESH_HIT:DIRECT
if ($Debug) { debug("Call To DefinePerlParsingFormat (LogType='$LogType', LogFormat='$LogFormat')"); }
- if ($LogFormat =~ /^[1-6]$/) { # Pre-defined log format
+ if ($LogFormat =~ /^[1-6]$/) { # Pre-defined log format ;
if ($LogFormat eq '1' || $LogFormat eq '6') { # Same than "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"".
# %u (user) is "([^\[]+)" instead of "[^ ]+" because can contain space (Lotus Notes). referer and ua might be "".
-# $PerlParsingFormat="([^ ]+) [^ ]+ ([^\[]+) \[([^ ]+) [^ ]+\] \\"([^ ]+) (.+) [^\\"]+\\" ([\d|-]+) ([\d|-]+) \\"(.?)\\" \\"([^\\"])\\"";
+# $PerlParsingFormat="([^ ]+) [^ ]+ ([^\[]+) \[([^ ]+) [^ ]+\] \\"([^ ]+) (.+) [^\\"]+\\" ([\d|-]+) ([\d|-]+) \\"(.?)\\" \\"([^\\"])\\" ";
$PerlParsingFormat="([^ ]+) [^ ]+ ([^\[]+) \[([^ ]+) [^ ]+\] \\"([^ ]+) ([^ ]+) [^\\"]+\\" ([\d|-]+) ([\d|-]+) \\"(.?)\\" \\"([^\\"])\\"";
$pos_host=0;$pos_logname=1;$pos_date=2;$pos_method=3;$pos_url=4;$pos_code=5;$pos_size=6;$pos_referer=7;$pos_agent=8;
@fieldlib=('host','logname','date','method','url','code','size','referer','ua');
@@ -5273,6 +5280,11 @@
$pos_extra[$1] = $i; $i++; push @fieldlib, "extra$1";
$PerlParsingFormat .= "([^$LogSeparatorWithoutStar]+)";
}
+ # Extra value between " (e.g. cookies string)
+ elsif ($f =~ /%extraquot(\d+)$/) {
+ $pos_extra[$1] = $i; $i++; push @fieldlib, "extra$1";
+ $PerlParsingFormat .= "\\"([^\\"]*)\\"";
+ }
# Other tag
elsif ($f =~ /%other$/) {
$PerlParsingFormat .= "[^$LogSeparatorWithoutStar]+";
@@ -6170,6 +6182,8 @@
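if ($IncludeUAInVisitors && $Debug) { debug(" Include User Agent in Visistor ID.",1); }
if ($VisitAllHosts && $Debug) { debug(" Visit all hosts.",1); }
+ if ($VisitorCookie && $Debug) { debug(" Use cookie '$VisitorCookie' as visitor identity.",1); }
+ if ($SessionCookie && $Debug) { debug(" Use cookie '$SessionCookie' as session identity.",1); }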
if ($EnableLockForUpdate) {
# Trap signals to remove lock
@@ -6886,8 +6900,10 @@
my $UA;
my $VisitorId;
$UA = $UserAgent;
- $UA =~ s/ //g;
-
+ $UA =~ s/ //g;
+ $VCookie = "";
+ $SCookie = "";
+
$VisitorId = $HostResolved;
@@ -6898,15 +6914,59 @@
if ($VisitAllHosts) {
if ($Debug) { debug(" This is a second visit for $VisitorId.",4); }
}
+
+ # If we use any cookie get all of them from the log
+ if ($VisitorCookie || $SessionCookie) {
+ $client_cookies = $field[$pos_extra[1]];
+ $server_cookies = $field[$pos_extra[2]];
+ if ($Debug) {
+ debug("Client cookies = $client_cookies", 3);
+ debug("Server cookies = $server_cookies", 3);
+ }
+ }
if ($PageBool || $VisitAllHosts) {
+ # If we use a visitor cookie, add this cookie to the visitor id to separate 'real user'
+ if ($VisitorCookie) {
+ # Extract the visitor cookies from the client or server cookies
+ if ($server_cookies =~ /$VisitorCookie=([^;]*)(;|$)/) {
+
+
+ # If we use a session cookie, extract it now
+ # Extract the session cookies from the client or server cookies
+
+ # We count this if the Page are accepted (good extension or VisitAllHosts on),
+ # but if we use visitor cookie we exclude non cooked pages
+ if (($PageBool || $VisitAllHosts) && (!$VisitorCookie || $VCookie)) {
my $timehostl=$_host_l{$VisitorId};
if ($timehostl) {
# A visit for this host was already detected
# TODO everywhere there is $VISITTIMEOUT
# $timehostl =~ /^\d\d\d\d\d\d(\d\d)/; my $daytimehostl=$1;
# if ($timerecord > ($timehostl+$VISITTIMEOUT+($dateparts[3]>$daytimehostl?$NEWDAYVISITTIMEOUT:0))) {
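- if ($timerecord > ($timehostl+$VISITTIMEOUT)) {
+ # A new session started if we reach the VISITTIMEOUT delay
+ # or if the session cookie changed for this visitor
+ # However blank session pass with success to avoid first (unique) visit on the web site
+ if ($timerecord > ($timehostl+$VISITTIMEOUT) || ($_host_sid{$VisitorId} && $_host_sid{$VisitorId} != $SCookie)) {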
# This is a second visit or more
if (! $_waithost_s{$VisitorId}) {
# This is a second visit or more
@@ -6969,6 +7029,7 @@
else {
# This is a new visit (may be). First new visit found for this host. We save in wait array the entry page to count later
if ($Debug) { debug(" New session (may be) for $VisitorId. Save in wait array to see later",4); }
+ # If we use a cookie for the session, we record the cookie value instead of the time
@@ -6979,6 +7040,7 @@
}
$_host_h{$VisitorId}++;
$_host_k{$VisitorId}+=int($field[$pos_size]);
+ $_host_sid{$VisitorId} = $SCookie;
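The session rule the patch comments describe (a new session starts when the visit timeout has passed, or when a previously recorded, non-blank session cookie changes) can be sketched outside AWStats like this. The function `count_sessions`, the one-hour `VISIT_TIMEOUT` in seconds and the sample hits are assumptions for the illustration and do not reproduce the AWStats data structures.

```python
VISIT_TIMEOUT = 3600  # seconds; assumed value for this sketch


def count_sessions(hits, timeout=VISIT_TIMEOUT):
    """Count sessions per visitor.

    `hits` is an iterable of (timestamp, visitor_id, session_cookie)
    tuples sorted by timestamp.
    """
    last_seen = {}  # visitor -> timestamp of the previous hit
    last_sid = {}   # visitor -> session cookie of the previous hit
    sessions = {}   # visitor -> number of sessions counted so far
    for ts, visitor, sid in hits:
        if visitor not in last_seen:
            sessions[visitor] = 1
        else:
            timed_out = ts > last_seen[visitor] + timeout
            # A blank previous session id never triggers a new session.
            sid_changed = bool(last_sid[visitor]) and last_sid[visitor] != sid
            if timed_out or sid_changed:
                sessions[visitor] += 1
        last_seen[visitor] = ts
        last_sid[visitor] = sid
    return sessions


hits = [
    (0, "v1", "s1"),
    (10, "v2", ""),
    (20, "v2", "x"),
    (100, "v1", "s1"),
    (200, "v1", "s2"),   # session cookie changed
    (5000, "v1", "s2"),  # more than an hour after the previous hit
]
print(count_sessions(hits))
```

The session ids here are compared as strings. In Perl the equivalent string comparison is written with `ne`; `!=` compares numerically, which is worth checking when session ids such as PHPSESSID are not numbers.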