On Tue, 21 Jan 2003, Hiroki WAKABAYASHI wrote:
> Okay, here is an example page http://biztech.nikkeibp.co.jp (tech info
> page from nikkei). The proxy "does filter" the page, but the text
> filtered by proxy gets screwed up under iso-2022-jp .
Hmmm:
GET http://biztech.nikkeibp.co.jp/ HTTP/1.1
HTTP/1.1 200 OK
Server: Netscape-Enterprise/3.6 SP3
Date: Sun, 26 Jan 2003 11:03:26 GMT
Content-type: text/html
Connection: close
Age: 2
Via: HTTP/1.1 RabbIT
The server says that this is plain text/html, with no charset given. So
treating the page as ISO-8859-1 (latin-1) seems OK from the proxy's
point of view.
This also means that your extra filters will not handle this page.
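To illustrate the reasoning above, here is a minimal sketch in Python (not RabbIT's actual code) of how a proxy might pick the charset from a Content-type header and fall back to ISO-8859-1 when none is given. The function name and the fallback parameter are made up for this example.

```python
def charset_from_content_type(value, default="iso-8859-1"):
    """Return the charset named in a Content-type header value.

    Hypothetical helper for illustration only: falls back to `default`
    when the header carries no charset parameter.
    """
    # Everything after the first ';' is a list of parameters.
    for param in value.split(";")[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset" and val.strip():
            return val.strip().strip('"').lower()
    return default


# The header nikkei sends: no charset, so the fallback is used.
print(charset_from_content_type("text/html"))
# What a correctly labelled response would look like.
print(charset_from_content_type("text/html; charset=ISO-2022-JP"))
```

With the header shown above, the sketch ends up on the latin-1 fallback, which is why filters that look for Japanese text never see the page in the encoding they expect.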
One working solution is to add the DontFilterFilter and list this
site in it; the other solution is to educate nikkei to use correct
encodings in their responses.
Something like this:
[Filters]
httpinfilters=rabbit.filter.HTTPBaseFilter,rabbit.filter.DontFilterFilter
[rabbit.filter.DontFilterFilter]
dontFilterURLmatching=biztech.nikkeibp.co.jp
/robo