Virilo Tejedor wrote:
> After several attempts, the Google bot finally visits all of my site
> (including dynamic pages).
>
> The problem is that Google has indexed URLs like:
> mysite.com/article.php?article_id=111&Mi_Session=843e8bd410a726f15f63d0dfcc7da532
>
> I'm using phplib 7.4 with the 'session4.inc' session class. I have noticed
> that there is no block_alien_sid flag like in 'session.inc'. As a result,
> all visitors arriving from Google links are using the same session.
>
>
Since session4.inc uses PHP's built-in session handling, the problem is
with that, not with PHPlib. You'll want to take a good look at PHP's
documentation on it: http://us2.php.net/manual/en/ref.session.php
In your case, as long as your sessions expire within a reasonable amount
of time (i.e. a few hours at most), it shouldn't be a widespread
problem. PHP will simply discard the session in the URL if it has
expired and create a new one.
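If you want to be more aggressive about it, you could shorten the
garbage-collection lifetime yourself. Untested sketch, with a two-hour value
picked purely as an example:

  <?php
  // Shorten how long idle session data survives, so stale session IDs that
  // Google has indexed expire quickly. The two-hour value is only an example.
  ini_set('session.gc_maxlifetime', 2 * 60 * 60);  // seconds
  session_start();
  ?>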
> I have thought about blocking alien sessions by clearing this string from
> the URL. But I can't maintain a list of forbidden session IDs, because
> there are many bots and they use a new session each time.
>
> One possible solution could be "IP blocking". I have read that this isn't
> the best solution against session hijacking, because of proxies, but it
> could solve my problem with Google.
>
> Is there a better solution? Or any implementation of IP blocking?
>
>
There are a couple of possibilities. (Be aware, I have no experience
using these settings; you might want to ask one of the general PHP lists
about this for a firsthand account.) First, if your site doesn't require
sessions to work properly, or you're willing to limit sessions to only
clients that support cookies, you can fix this problem by setting the
session.use_only_cookies setting to true, which disables the Session ID
in the URL. This is probably one of the better solutions, as most
clients that have cookies off are likely aware of the issues involved in
not accepting cookies. However, if googlebot (which doesn't care about
cookies, AFAIK) needs to have its own session in order to properly
index your site, this will cause problems. (Then again, one might argue
that you have a flawed design if that is the case.)
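Roughly, it would look like this (untested; the same flags can go in php.ini
or a per-directory .htaccess instead of ini_set()):

  <?php
  // Accept session IDs from cookies only, and never rewrite URLs to carry one.
  ini_set('session.use_only_cookies', 1);
  ini_set('session.use_trans_sid', 0);
  session_start();
  ?>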
Another possibility would be to use session.referer_check, set to your
website address. However, this would likely keep sessions from working
on clients that either set an empty string for their referrer or that
spoof it for privacy reasons. I don't know if googlebot is such a
client, so again, this may cause problems if googlebot must have a
session to index your site.
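Again only a sketch, untested; substitute your own domain for 'mysite.com':

  <?php
  // Reject an incoming session ID unless the Referer header contains this substring.
  ini_set('session.referer_check', 'mysite.com');
  session_start();
  ?>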
You could disable or destroy sessions when the client looks like
googlebot or some other search bot (detected via its IP or user-agent
string, perhaps).
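Something along these lines, perhaps (untested, and the pattern list is far
from exhaustive):

  <?php
  // Skip session handling entirely when the client looks like a known crawler.
  $ua     = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
  $is_bot = preg_match('/googlebot|slurp|msnbot|crawler|spider/i', $ua);
  if (!$is_bot) {
      session_start();
  }
  ?>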
Another possibility is to store the user's IP address with the session
when it is first created (is that what you meant by 'IP blocking'?),
then make sure that IP address matches each time the session is called
back up. However, this can cause problems if you can't depend on the
user of the site to maintain a single IP address for the duration of the
session (not uncommon with large ISPs that use proxies, such as AOL).
This can be mitigated to some degree by only matching the first 2 or 3
octets of the address.
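A rough, untested sketch of that idea, matching only the first two octets:

  <?php
  // Bind the session to the first two octets of the client's address; if a
  // later request arrives from a different network, throw the session away.
  session_start();
  $parts  = explode('.', $_SERVER['REMOTE_ADDR']);
  $prefix = $parts[0] . '.' . $parts[1];
  if (!isset($_SESSION['ip_prefix'])) {
      $_SESSION['ip_prefix'] = $prefix;        // first visit: remember the network
  } elseif ($_SESSION['ip_prefix'] !== $prefix) {
      $_SESSION = array();                     // discard the alien session's data
      session_regenerate_id();                 // hand this visitor a fresh session ID
      $_SESSION['ip_prefix'] = $prefix;
  }
  ?>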
A related topic is also discussed at this thread on the PHP-General
mailing list:
http://marc.theaimsgroup.com/?t=102722998300003&r=1&w=2
Again, I'd recommend posing the question to one of the PHP mailing lists
for more specific answers.
Hope this helps.
--
___________________________
Nathaniel Price
http://www.tesserportal.net
Webmaster