#88 re-encoding of the "%" is wrong

Milestone: v1.2
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-09-13
Created: 2011-03-23
Creator: Ralf Krenft
Private: No

On our website, we use percent-encoded URLs for hyperlinks.

Example:

http://www.mydomain.de/path/?tx_list_pi1%5Buid%5D=599&tx_list_pi1%5Bmode%5D=6&cHash=25c1c23772bc235e91fc9af6b0f782a9

The OSS crawler finds these hyperlinks and stores them for crawling in the same format:

http://www.mydomain.de/path/?tx_list_pi1%5Buid%5D=599&tx_list_pi1%5Bmode%5D=6&cHash=25c1c23772bc235e91fc9af6b0f782a9

When the OSS crawler retrieves a stored hyperlink, it first URL-encodes it again.
In our case this means that the OSS crawler turns %5B (an already encoded "[") into %255B, which makes no sense for our system.
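
The effect can be reproduced outside the crawler. The following is only a minimal sketch in Python, not the crawler's actual code: the function names `naive_encode` and `encode_preserving_escapes` are hypothetical, and the second function shows just one possible way to avoid the double encoding (leave existing %XX escapes alone and escape only a stray "%").

```python
import re
from urllib.parse import quote

# Matches a "%" that is NOT followed by two hex digits, i.e. a literal percent sign.
_BARE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def naive_encode(query: str) -> str:
    """Encode unconditionally; the "%" of an existing escape becomes "%25"."""
    return quote(query, safe="=&")


def encode_preserving_escapes(query: str) -> str:
    """Encode, but keep %XX sequences that are already present (hypothetical fix)."""
    # Treat "%" as safe so existing escapes survive, then escape any stray "%".
    encoded = quote(query, safe="=&%")
    return _BARE_PERCENT.sub("%25", encoded)


# Query string as stored by the crawler (taken from the example URL above).
stored = "tx_list_pi1%5Buid%5D=599&tx_list_pi1%5Bmode%5D=6"

print(naive_encode(stored))               # "%5B" turns into "%255B"
print(encode_preserving_escapes(stored))  # unchanged
```

With this approach an unencoded input such as `tx_list_pi1[uid]=599` is still encoded, while an already encoded one passes through unchanged.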

We found this line in the nginx log:

10.10.10.2 - - [16/Mar/2011:08:29:14 +0100] "GET /path/?tx_list_pi1%255Buid%255D=599&tx_list_pi1%255Bmode%255D=6&cHash=25c1c23772bc235e91fc9af6b0f782a9 HTTP/1.1" 200 21177 "-" "OSS_Bot"
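
As a quick check, decoding the logged request path exactly once gives back the stored hyperlink, which suggests that one extra encoding pass was applied. This is a small Python sketch; the variable names are only illustrative.

```python
from urllib.parse import unquote

# Request path as it appears in the nginx log line.
logged_path = (
    "/path/?tx_list_pi1%255Buid%255D=599&tx_list_pi1%255Bmode%255D=6"
    "&cHash=25c1c23772bc235e91fc9af6b0f782a9"
)
# Path and query of the hyperlink as published on the website.
stored_path = (
    "/path/?tx_list_pi1%5Buid%5D=599&tx_list_pi1%5Bmode%5D=6"
    "&cHash=25c1c23772bc235e91fc9af6b0f782a9"
)

decoded_once = unquote(logged_path)    # undoes the crawler's extra encoding
decoded_twice = unquote(decoded_once)  # undoes the original encoding as well

print(decoded_once == stored_path)
print(decoded_twice)
```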

In my opinion the re-encoding of the "%" is wrong.

Best regards
