An earlier version of the client submitted arbitrary
binary "characters" as the User-Agent string, sometimes
followed by an "http". Eventually, this got changed to
"Mozilla/4.0 (compatible; grub-client-0.2.3; blah...".
However, I see an increasing number of entries in my
logfiles that are shortened to "Mozi!". This is
probably the same old bug that now inserts an
exclamation mark and a null byte (and possibly more
garbage) into the internal UA string. I found this
confirmed today by a log entry that said "Mozi^Y". Note
that a <ctrl-Y> (like any other control character in
the UA) results in an invalid HTTP request, although
most servers seem to be fairly tolerant about it.
Watch your pointers, folks!
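To make the suspected failure concrete, here is a minimal C++ sketch. It is not taken from the grub client; the helper names, the buffer size, and the idea that a stray write lands in a mutable copy of the UA are all assumptions for illustration. It shows how one garbage byte followed by a null byte would shorten the string a C-string consumer sees, and how a control character can be detected.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true if the UA contains no control characters
// (bytes below 0x20, or 0x7F). Tab is rejected too, to keep this simple.
bool ua_is_clean(const std::string& ua) {
    for (unsigned char c : ua) {
        if (c < 0x20 || c == 0x7F) return false;
    }
    return true;
}

// Hypothetical simulation of the suspected bug: a stray write puts one
// garbage byte plus a null byte into a mutable copy of the UA. Returns
// the string a C-string consumer would then see.
std::string ua_after_stray_write(const char* original, std::size_t offset,
                                 char garbage) {
    assert(original != nullptr);
    char buf[128];  // assumed size, for illustration only
    std::strncpy(buf, original, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    if (offset + 1 < sizeof buf) {
        buf[offset] = garbage;   // e.g. '!' or 0x19 (ctrl-Y)
        buf[offset + 1] = '\0';  // the null byte ends the string early
    }
    return std::string(buf);
}
```

With any UA that starts with "Mozilla/4.0", calling ua_after_stray_write(ua, 4, '!') gives "Mozi!", and passing '\x19' instead gives a five-byte string that ua_is_clean() rejects.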
Bogus UA strings will get you permanently banned from
many sites by webmasters who care about what happens to
their stuff. You don't want that to happen, do you?
-schorsch
Logged In: YES
user_id=37362
Although there *was* a bug in an earlier version of the
client that affected the user agent field, the current
version does not, to our knowledge, contain this same
flaw. If you look at the Crawler.h file on CVS, you will
see that in revision 1.6, we added DEFINEs to contain the
client UA information. The Crawler.cpp file takes these
defines, as well as the version define, and passes them to
the cURL libraries without first copying them into variables.
Defines are substituted at compile time, so they are the
same as typing "somestring" in your code. There simply
aren't any pointers to keep track of with this logic, so
it is *highly* unlikely that we are doing anything wrong
with this code.
That said, it is still possible that cURL is doing
something weird, but I find that even more unlikely as
cURL has had VERY active development on it in the past
year, and someone would have caught this by now.
Another possibility to consider is that some other part of
the code is overwriting the string in memory, but because
these are defines, the program should crash when it
(incorrectly) tries to write to this *protected* memory area.
Just try to write through a char * that points at a string
literal and you will see an example of this in action.
All this said, I am dubious that your "Mozi!" string is
actually one of our crawlers in action. It is possible
that someone is still running an older version of the
client which would cause this, but everyone has been
notified of the new release and there have been two
releases since this bug was active.
I did a search on Google and Altavista for your rogue
string and came up with a couple of pages that log the UAs
coming into their site and rank their visit percentages.
Several pages contained a high occurrence of "Mozi!" (like
1%), yet had no trace of the actual grub-client UA string.
Others had both, and a few had only the correct grub-client
UA string. All this leads me to think that there are other
crawlers/browsers out there with bugs in them that truncate
the correct UA, and that it isn't necessarily us doing it.
Kord