#206 Allow to define referer with full URL instead of server only

open
nobody
None
5
2012-10-11
2003-02-19
No

A referer entry in search_engines.pm that includes a path
such as '/search' does not work in version 5.4.

Example log record:
100.100.100.100 - - [22/Jan/2003:06:53:43 -0600] "GET /pcatalog.htm HTTP/1.0" 200 33526 "http://vivisimo.com/search?query=%27IT+Planning%27&v%3Asources=AltaVista%2CMSN%2CLooksmart%2CNetscape%2CLycos&x=49&y=15" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

search_engines.pm entry:
"vivisimo.com/search"

This entry is not tallied as a search engine record but
as a Pages-URL item. If I remove the /search, it shows
up as a search entry. The problem is that a non-search
entry (a standard link) from the same referer
(vivisimo.com) is then also counted as a search.

Non-Search Entry example:
aa.aa.com - - [25/Jan/2003:08:12:00 -0600] "GET /testfile.htm HTTP/1.0" 200 2075 "http://vivisimo.com/xxx.html" "Mozilla/4.0 (compatible; MSIE 5.1; Windows 98)"

The existing entry "bbc.co.uk/cgi-bin/search" can be
used as a test as it does not work either.

Thanks

Discussion

  • Laurent Destailleur (Eldy)

    Logged In: YES
    user_id=96898

    Terms should be Perl compliant so try with
    vivisimo.com\/search
    instead of
    vivisimo.com/search
    and tell me if it works (it should).

     
  • Andrew Weiner

    Andrew Weiner - 2003-02-19

    Logged In: YES
    user_id=713707

    I have tried that and both are processed as pages. I would
    suggest looking at awstats.pl line 5629.

    $field[$pos_referer] =~ /^(\w+):\/\/([^\/]+)/;
    my $refererprot=$1;
    my $refererserver=$2;

    Are you setting $refererserver to only the part of the URL up
    to the end of the domain? If so, the remainder of the field
    is excluded from the match.

    Does this make sense?

    Listing from awstats:
    - Full list
    - http://vivisimo.com/xxx.html 1 1
    - http://vivisimo.com/search?query=%27IT+Planning%27&v%3Asources=AltaVista... 1 1
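
The question above can be checked with a minimal sketch that runs the pattern quoted from awstats.pl against the referer of the example log record. The helper name `split_referer` is hypothetical and used only for illustration; it is not part of awstats.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern quoted from awstats.pl
# and returns the two captured groups (protocol, server).
sub split_referer {
    my ($referer) = @_;
    if ($referer =~ /^(\w+):\/\/([^\/]+)/) {
        return ($1, $2);
    }
    return;
}

# Referer taken from the example log record in this ticket.
my $referer = 'http://vivisimo.com/search?query=%27IT+Planning%27'
            . '&v%3Asources=AltaVista%2CMSN%2CLooksmart%2CNetscape%2CLycos&x=49&y=15';

my ($prot, $server) = split_referer($referer);
print "protocol: $prot\n";    # protocol: http
print "server:   $server\n";  # server:   vivisimo.com
```

The capture stops at the first '/' after the host, so the '/search' path never reaches a test made against the server name alone.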

     
  • Andrew Weiner

    Andrew Weiner - 2003-09-10

    Logged In: YES
    user_id=713707

    I believe I have a solution to this issue.

    Let's use the following example:
    Referer:
    http://www.bbc.co.uk/cgi-bin/search/results.pl?q=steps+to+project+management&tab=www

    search_engines.pm entry:
    "bbc.co.uk\/cgi-bin\/search"

    The test is only being performed on the base url:
    $refererserver = www.bbc.co.uk

    In this case when examined it will fail:
    if ($refererserver =~ /$key/i) {
    www.bbc.co.uk =~ bbc.co.uk/cgi-bin/search

    If we use $field[$pos_referer] the result is true:
    if($field[$pos_referer] =~ /$key/i) {
    http://www.bbc.co.uk/cgi-bin/search/results.pl?(truncated)
    =~ bbc.co.uk/cgi-bin/search

    Note: The search key does need to be escaped with a
    backslash 'xxx\/yyy\/zzz'.

    I suggest that you consider making this change.
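
The comparison described above can be reproduced as a small standalone sketch. The helper `matches_key` is hypothetical; the key is copied from the search_engines.pm entry quoted in this comment, and the server name is extracted with a simplified form of the awstats.pl pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches the search-engine key
# (case-insensitive, as in the awstats.pl test), otherwise 0.
sub matches_key {
    my ($text, $key) = @_;
    return $text =~ /$key/i ? 1 : 0;
}

# Entry as written in search_engines.pm in this ticket.
my $key = "bbc.co.uk\/cgi-bin\/search";

# Referer from the example above.
my $referer = 'http://www.bbc.co.uk/cgi-bin/search/results.pl?'
            . 'q=steps+to+project+management&tab=www';

# Server part only, which is what the current test sees.
my ($server) = $referer =~ /^\w+:\/\/([^\/:]+)/;

printf "server only:  %d\n", matches_key($server,  $key);  # 0
printf "full referer: %d\n", matches_key($referer, $key);  # 1
```

Matching against the full referer also keeps a plain link such as http://vivisimo.com/xxx.html from matching a key like "vivisimo.com\/search", which is the distinction the original report asks for.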

    Thanks as always,
    Andrew

    AREA OF CODE DISCUSSED

    # Analyze: Referer
    $field[$pos_referer] =~ /^(\w+):\/\/([^\/:]+)(:\d+|)/;
    my $refererprot=$1;
    my $refererserver=$2.($3 eq ':80'?'':$3);
    # refererserver is www.xxx.com or www.xxx.com:81 but not www.xxx.com:80

    if (! $found) {
        # Extern (This hit came from an external web site).
        if ($LevelForSearchEnginesDetection) {
            foreach my $key (@SearchEnginesSearchIDOrder) {
                # Search ID in order of SearchEnginesSearchIDOrder

                # LINE TO CHANGE IS BELOW
                if ($refererserver =~ /$key/i) {
                    # This hit came from the search engine $key
                    if ($Debug) { debug("  Server '$refererserver' is added to TmpRefererServer with value '$key'",2); }
                    $TmpRefererServer{$refererserver}="$key";
                    $found=1;
                    last;
                }
            }
        }
    }
    }
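
As a side note on the referer-parsing lines quoted above, the following minimal sketch isolates how the port is handled when building the server name. The function name `referer_server` and the '/a.html' paths are hypothetical; the pattern and the ':80' test are copied from the quoted code.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the quoted pattern: returns host, plus
# ":port" unless the port is the default :80.
sub referer_server {
    my ($referer) = @_;
    return unless $referer =~ /^(\w+):\/\/([^\/:]+)(:\d+|)/;
    return $2 . ($3 eq ':80' ? '' : $3);
}

for my $url ('http://www.xxx.com/a.html',
             'http://www.xxx.com:80/a.html',
             'http://www.xxx.com:81/a.html') {
    print "$url -> ", referer_server($url), "\n";
}
# http://www.xxx.com/a.html -> www.xxx.com
# http://www.xxx.com:80/a.html -> www.xxx.com
# http://www.xxx.com:81/a.html -> www.xxx.com:81
```

In every case the path is dropped, so a key containing '/search' can only match if the test is made against the full referer field.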

     
