Re: [Rabbit-proxy-development] 060403: Pre-3.1 impressions & ideas for future releases
From: Robert O. <ro...@kh...> - 2006-04-03 20:34:40
Matej Mihelič wrote:
> I have tested the pre-3.1 version and it did fix most of my problems. It
> fixed all the reported ones.

Glad to hear that.

> There is one more problem regarding Firefox and incomplete pages in
> connection with time-outs. It looks like sometimes the connection will
> simply hang and then an incomplete page is returned and cached by
> Firefox. I don't know exactly what is happening, but I've not seen such
> behaviour with any other proxy, and I have some experience with them.

Hmmm, I have not seen this. Do you have a web site where it usually
happens?

> There is a condition that you may want to handle gracefully. Some
> webmasters use and send "-1" as the value of the "Expires" header. I
> know this is a violation of the RFC, but it is common, so you should
> handle it gracefully: it means "do not cache" or "already expired".

Maybe, but note that there are _very_ many different Expires values that
violate the spec. I could try to make rabbit be a bit nicer with -1 and
already-expired entries when it is run in non-strict mode.

> If you would implement the ad blocking and URL blocking in a way that
> would enable users to use pre-prepared block lists, you would really
> help them. The most popular ad-block lists are the ones for the
> AdBlock Firefox extension - the Filterset.G sets
> ( http://www.pierceive.com/filtersetg/ ). They come as two separate
> lists: a black list and a white list. By implementing import of them
> in Rabbit you would really help the users.

This ought to be simple enough. Rabbit currently uses one regexp, and
these lists seem to be sets of regexps.

> * Enhance "rabbit.filter.BlockFilter" to block HTTPS URLs as well.
>   I have explained in a previous message why I find this important.

This could be easy or hard, depending on what you mean. Blocking the
CONNECT request is trivial, but what happens on a tunneled and
encrypted connection is not something rabbit can filter.
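For what it's worth, the "be a bit nicer with -1" idea above could look
something like this. This is only a sketch, not rabbit's actual code: the
class and method names are made up for illustration. The point is that RFC
2616 already says an invalid Expires date (which covers the common "-1")
should be treated as already expired, so a lenient parser just maps
anything unparseable to the epoch instead of rejecting the response:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper showing lenient Expires handling.
// Anything that is not a valid HTTP-date (including "-1" and "0")
// is treated as "already expired", i.e. the epoch.
public class ExpiresParser {
    // RFC 1123 date format, the preferred HTTP-date form.
    private static final String HTTP_DATE = "EEE, dd MMM yyyy HH:mm:ss zzz";

    public static Date parseExpires(String value) {
        if (value == null)
            return new Date(0); // missing header: treat as expired
        value = value.trim();
        if (value.equals("-1") || value.equals("0"))
            return new Date(0); // common non-standard values: already expired
        try {
            SimpleDateFormat fmt = new SimpleDateFormat(HTTP_DATE, Locale.US);
            fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
            return fmt.parse(value);
        } catch (ParseException e) {
            return new Date(0); // any other garbage: already expired
        }
    }

    public static void main(String[] args) {
        System.out.println(parseExpires("-1").getTime());   // 0
        System.out.println(parseExpires("Thu, 01 Dec 2005 16:00:00 GMT")
                           .getTime() > 0);                 // true
    }
}
```

In strict mode the same code could instead log or reject the malformed
header; the lenient path would only be taken in non-strict mode as
suggested above.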
> * When possible, embed external files into the HTML using the
>   RFC 2397 data URI scheme.

Ugh. Sounds like this will take some time, if I am to do it. You do not
have a patch ready? ;-)

> I know that data URIs are in general limited to 1024-4096 bytes.

Rabbit currently has a 4k buffer, so that is the current maximum URI for
rabbit. I plan to make that growable as needed in the future. I believe
that it ought to stay limited though.

> * I would also replace HREFs that point to
>   (adfilter/blockfilter-) blocked URLs with one of the following:
>   o an HREF to a fixed error page at the RABBIT server. This
>     would allow for caching of the response.
>   o HREF=data:,Rabbit%20denied%20this%20page

What do you think this does:

adreplacer=http://$proxy/FileSender/public/NoAd.gif

So for ad filtering this already works. Adding blocked-site replacement
could be tricky, due to me wanting to have the real site name in the
"this page '<page url>' is blocked by rabbit configuration..." message.

> Then there is one last proposal. You could implement SSL filtering as
> well. Proxomitron is a great example of how it could be done. It uses
> a temporary SSL key between client and proxy and temporary or
> predefined SSL certificates when communicating with remote servers.

Maybe. Again, this is probably something that will take time. Rabbit is
a spare-time project, so patches are very welcome.

Many nice ideas, I like it.

Thanks
/robo
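As an aside, building the RFC 2397 data URI itself is the easy part of
the embedding proposal above; the work would be in the HTML rewriting.
A minimal sketch (class and method names are illustrative, not part of
rabbit) of base64-flavoured data URI construction:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch of RFC 2397 data URI construction, as might be
// used to inline a small fetched resource into rewritten HTML.
public class DataUri {
    // Build "data:<mediatype>;base64,<data>" from raw content bytes.
    public static String encode(String mediaType, byte[] content) {
        return "data:" + mediaType + ";base64,"
             + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        byte[] body = "tiny".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode("image/gif", body));
        // → data:image/gif;base64,dGlueQ==
    }
}
```

The size limit discussed above would then simply be a check on
`content.length` before deciding to inline versus leaving the original
URL in place.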