Thread: [Rabbit-proxy-development] 060403: Pre-3.1 impressions & ideas for the future releases
From: <mat...@ne...> - 2006-04-03 14:07:22
* 060403: Pre-3.1 impressions & ideas for the future releases

I have tested the pre-3.1 version and it did fix most of my problems. It fixed all the reported ones.

There is one more problem regarding Firefox and incomplete pages in connection with time-outs. It looks like sometimes the connection will simply hang and then an incomplete page is returned and cached by Firefox. I don't know exactly what is happening, but I have not seen such behaviour with any other proxy, and I have some experience with them.

There is a condition that you may want to handle gracefully. Some webmasters send "-1" as the value of the "Expires" header. I know this is a violation of the RFC, but it is common, so you should handle it gracefully: it means "don't cache" or "already expired".

As for the future, I would like to suggest enhancements that would improve the user experience in the following two areas:

* Ads are real bandwidth hogs and are annoying.
* The proliferation of satellite and mobile (3G GSM - EDGE, UMTS) connections with very high latency (500ms+) requires minimising the number of requests sent from client to server.

If you implemented ad blocking and URL blocking in a way that lets users load pre-prepared block lists, you would really help them. The most popular ad-block lists are the Filterset.G lists for the Adblock Firefox extension ( http://www.pierceive.com/filtersetg/ ). They come as two separate lists: a black list and a white list. Implementing an import for them in Rabbit would really help the users.

My suggestions for new ad-blocking features are the following:

* Implement white-list and black-list definitions for "rabbit.filter.BlockFilter" as well as for "rabbit.filter.AdFilter" ("blockURLmatching" and "dontBlockURLmatching"). This would allow more relaxed filtering with optional white-listing of certain sites.
* Besides the current format for filters, allow reading patterns from a file in the Filterset.G list format ("blockURLmatchingFile" and "dontBlockURLmatchingFile"). You should probably convert the patterns to a common format and merge them with the ones from "blockURLmatching"/"dontBlockURLmatching" (see the merge sketch below).
* Enhance "rabbit.filter.BlockFilter" to block HTTPS URLs as well. I have explained in a previous message why I find this important.

My suggestions for accelerating high-latency links are the following:

* When possible, embed external files into the HTML using the RFC 2397 data URI scheme (IMG, SCRIPT and STYLE tags - you fetch the file from SRC/HREF and replace the reference). References: http://en.wikipedia.org/wiki/Data:_URL , http://www.mozilla.org/quality/networking/docs/aboutdata.html . I know that this is currently only supported by the Mozilla and Opera browsers, but it would probably help tremendously on high-latency links. There is a way to get partial RFC 2397 support into IE through a protocol handler, but it will be limited by the URL length limit in IE. I've put a copy of the IE plugin on my server: http://neosys.si/users/Matej/DataProtocol.zip . Examples:
  - http://neosys.si/users/matej/rabbit/Data_SiOL.net.htm (Opera and Netscape)
  - http://neosys.si/users/matej/rabbit/Data_IE_SiOL.net.htm (IE - doesn't support GIFs as data URIs?!?)
  - http://www.scalora.org/projects/uriencoder/ (original: http://neosys.si/users/matej/rabbit/SiOL.net.htm )

  I know that data URIs are in general limited to 1024-4096 bytes (unlimited in Mozilla) and that embedding would actually increase the file size and disable the caching effect.
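To make the embedding concrete, the rewrite step amounts to something like the following (a rough sketch only - the class and helper names are made up, this is not Rabbit code, and java.util.Base64 assumes a modern JDK):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Base64;

    // Sketch: turn a fetched resource into an RFC 2397 data URI, so that
    // <IMG src="logo.gif"> becomes <IMG src="data:image/gif;base64,R0lGOD...">.
    public class DataUriSketch {
        static String toDataUri(String mimeType, byte[] body) {
            return "data:" + mimeType + ";base64,"
                    + Base64.getEncoder().encodeToString(body);
        }

        public static void main(String[] args) throws Exception {
            // Here the bytes come from a local file; the proxy would use
            // the body it just fetched from SRC/HREF.
            byte[] gif = Files.readAllBytes(Paths.get("logo.gif"));
            System.out.println("<IMG src=\"" + toDataUri("image/gif", gif) + "\">");
        }
    }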
Such embedding is against the current goals, but I see the following arguments:

  o High-latency links in general have high throughput.
  o Due to the size reduction from JPEG re-compression the files could still be smaller.
  o A limit of 4096 bytes would - due to the JPEG file reduction - suffice for most sites, and with Firefox this limitation does not exist.
  o Caching is not so important for pages where you don't browse around the site and there are many new images anyway (news sites).

I would suggest the following configuration variables:

  - enableDataURIforTags=IMG|STYLE|SCRIPT
  - enableDataURIforObjectsWithExtension=JPG|JPEG|CSS|GIF|JS
  - maximumSizeForDataURI=16384 ; Firefox can take it.
  - dontEmbedDataURIforSites=

* I would also replace HREFs that point to URLs blocked by the AdFilter/BlockFilter with one of the following:

  o An HREF to a fixed error page at the Rabbit server. This would allow for caching of the response.
  o HREF=data:,Rabbit%20denied%20this%20page - this would remove the need for a round trip to the server for the 403 message. Unfortunately it would mask the destination URL, but since the user can request the unfiltered page he can still find it.

I know that there would be another option - multipart encoding - but I have no idea how well that is supported across browsers.

Then there is one last proposal. You could implement SSL filtering as well. Proxomitron is a great example of how it could be done. It uses a temporary SSL key between client and proxy and temporary or predefined SSL certificates when communicating with remote servers.

--
Best regards, Matej.
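A minimal sketch of the proposed list import: assuming one regular expression per line with '#' comment lines (the exact Filterset.G format is not described in this thread), the file patterns could be merged with the configured expression into one alternation. The option names above are Matej's proposals, not existing Rabbit settings.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    // Sketch: merge file-based block patterns with the configured regexp.
    public class BlockListLoader {
        static Pattern merge(String configured, String listFile) throws IOException {
            List<String> parts = new ArrayList<>();
            if (configured != null && !configured.isEmpty())
                parts.add(configured);
            try (BufferedReader in = new BufferedReader(new FileReader(listFile))) {
                String line;
                while ((line = in.readLine()) != null) {
                    line = line.trim();
                    if (!line.isEmpty() && !line.startsWith("#"))
                        parts.add(line);
                }
            }
            // One big alternation keeps the existing "single regexp" model.
            return Pattern.compile("(?:" + String.join(")|(?:", parts) + ")");
        }
    }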
From: Robert O. <ro...@kh...> - 2006-04-03 20:34:40
Matej Mihelič wrote:
> I have tested the pre-3.1 version and it did fix most of my problems.
> It fixed all the reported ones.

Glad to hear that.

> There is one more problem regarding Firefox and incomplete pages in
> connection with time-outs. It looks like sometimes the connection will
> simply hang and then an incomplete page is returned and cached by
> Firefox. I don't know exactly what is happening, but I have not seen
> such behaviour with any other proxy.

Hmm, I have not seen this. Do you have a web site where it usually happens?

> There is a condition that you may want to handle gracefully. Some
> webmasters send "-1" as the value of the "Expires" header. I know this
> is a violation of the RFC, but it is common, so you should handle it
> gracefully: it means "don't cache" or "already expired".

Maybe, but note that there are _very_ many different Expires values that violate the spec. I could try to make rabbit be a bit nicer with -1 and already expired entries when it is run in non-strict mode.

> If you implemented ad blocking and URL blocking in a way that lets
> users load pre-prepared block lists, you would really help them. The
> most popular ad-block lists are the Filterset.G lists for the Adblock
> Firefox extension ( http://www.pierceive.com/filtersetg/ ).

This ought to be simple enough. Rabbit currently uses one regexp and these lists seem to be a set of regexps.

> * Enhance "rabbit.filter.BlockFilter" to block HTTPS URLs as well.
>   I have explained in a previous message why I find this important.

This could be easy or hard, depending on what you mean. Blocking the CONNECT request is trivial, but what happens on a tunneled and encrypted connection is not something rabbit can filter.

> * When possible, embed external files into the HTML using the
>   RFC 2397 data URI scheme

Ugh, sounds like this will take some time, if I am to do it. You do not have a patch ready? ;-)

> I know that data URIs are in general limited to 1024-4096 bytes

Rabbit currently has a 4k buffer so that is the current maximum URI for rabbit. I plan to make that growable as needed in the future. I believe that it ought to stay limited though.

> * I would also replace HREFs that point to URLs blocked by the
>   AdFilter/BlockFilter with one of the following:
>   o An HREF to a fixed error page at the Rabbit server. This would
>     allow for caching of the response.
>   o HREF=data:,Rabbit%20denied%20this%20page

What do you think that this does:
adreplacer=http://$proxy/FileSender/public/NoAd.gif

So for ad-filtering this already works. Adding blocked site replacement could be tricky, due to me wanting to have the real site name in the "this page '<page url>' is blocked by rabbit configuration..." message.

> Then there is one last proposal. You could implement SSL filtering as
> well. Proxomitron is a great example of how it could be done. It uses
> a temporary SSL key between client and proxy and temporary or
> predefined SSL certificates when communicating with remote servers.

Maybe. Again, this is probably something that will take time. Rabbit is a spare time project so patches are very welcome.

Many nice ideas, I like it. Thanks
/robo
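The CONNECT blocking Robert calls trivial would amount to matching the CONNECT target against the block pattern before the tunnel is set up. A sketch with illustrative names (not Rabbit's actual filter API):

    import java.util.regex.Pattern;

    // Sketch: refuse to tunnel CONNECT requests whose target matches
    // the configured block pattern.
    public class ConnectBlocker {
        private final Pattern blocked;

        ConnectBlocker(String blockURLmatching) {
            blocked = Pattern.compile(blockURLmatching);
        }

        boolean allow(String method, String target) {
            // For CONNECT the target is host:port, e.g. "ads.example.com:443".
            return !("CONNECT".equals(method) && blocked.matcher(target).find());
        }
    }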
From: Matej M. <rab...@ma...> - 2006-04-04 07:57:10
> -----Original Message-----
> From: Robert Olofsson [mailto:ro...@kh...]
> Sent: 03. april 2006 22:34
> To: Matej Mihelič
> Cc: rab...@li...
> Subject: Re: [Rabbit-proxy-development] 060403: Pre-3.1 impressions &
> ideas for the future releases

[MM] ...

> > There is one more problem regarding Firefox and incomplete pages in
> > connection with time-outs.
>
> Hmmm, I have not seen this. Do you have a web site where it usually
> happens?

[MM] I'll try to find a pattern. It is probably connected with me overloading Rabbit. My usual browsing habits include opening 30 tabs simultaneously in Firefox :).

> > There is a condition that you may want to handle gracefully. Some
> > webmasters send "-1" as the value of the "Expires" header.
>
> Maybe, but note that there are _very_ many different Expires values
> that violate the spec. I could try to make rabbit be a bit nicer with
> -1 and already expired entries when it is run in non-strict mode.

[MM] That's what I meant. This one is very common because people don't think about what is written in the MS ASP documentation: there is an example specifying -1 for the Expires property, and they use it everywhere.

> > If you implemented ad blocking and URL blocking in a way that lets
> > users load pre-prepared block lists, you would really help them.
>
> This ought to be simple enough. Rabbit currently uses one regexp and
> these lists seem to be a set of regexps.

[MM] I am glad to hear this. This really looks very useful to me.

> > * Enhance "rabbit.filter.BlockFilter" to block HTTPS URLs as well.
>
> This could be easy or hard, depending on what you mean.
> Blocking the CONNECT request is trivial, but what happens on a
> tunneled and encrypted connection is not something rabbit can filter.

[MM] Yes, this would suffice. It would allow filtering of sites that try to enforce HTTPS.

> > * When possible, embed external files into the HTML using the
> >   RFC 2397 data URI scheme
>
> Ugh, sounds like this will take some time, if I am to do it. You do
> not have a patch ready? ;-)

[MM] I get the hint :) Your project is the first one in a long time that interests me, but I am no Java coder - actually I am no programmer at all. If it were something to do with a DB/SQL backend it would be easy for me to help. A more important reason for not immediately offering help is that I have decided to stay as far away from computers in my spare time as I possibly can. I am not very successful with this resolution, but I try.

> > I know that data URIs are in general limited to 1024-4096 bytes
>
> Rabbit currently has a 4k buffer so that is the current maximum URI
> for rabbit. I plan to make that growable as needed in the future. I
> believe that it ought to stay limited though.

> > * I would also replace HREFs that point to URLs blocked by the
> >   AdFilter/BlockFilter with one of the following:
> >   o An HREF to a fixed error page at the Rabbit server.
> >   o HREF=data:,Rabbit%20denied%20this%20page
>
> What do you think that this does:
> adreplacer=http://$proxy/FileSender/public/NoAd.gif

[MM] Ah... how easy it is to get carried away and not think things through twice. Yes, you are right.

> So for ad-filtering this already works. Adding blocked site
> replacement could be tricky, due to me wanting to have the real site
> name in the "this page '<page url>' is blocked by rabbit
> configuration..." message.

[MM] Well, you could embed the message in a data URI. But that is a function of ad-blocking - isn't it? Perhaps a variable with the blocked URL would be useful. It would allow for the following syntax:

adreplacer=data:text/html,<HTML><BODY><B>Rabbit3</B> denied this page - <A href="$1noproxy.$2">Click here for unfiltered page</A></BODY></HTML>

> > Then there is one last proposal. You could implement SSL filtering
> > as well.
>
> Maybe. Again, this is probably something that will take time. Rabbit
> is a spare time project so patches are very welcome.
>
> Many nice ideas, I like it.

[MM] I must say that I am tempted. We will see.

> Thanks
> /robo
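One detail of the adreplacer=data:... syntax proposed above: RFC 2397 expects the payload of a data URI to be URL-escaped, so the HTML would need encoding along these lines ($1/$2 remain hypothetical placeholders, and this is not existing Rabbit behaviour):

    import java.net.URLEncoder;

    // Sketch: build a data: replacement page like the one proposed above.
    // URLEncoder produces form encoding, so '+' must be turned back into
    // %20 to stay a valid percent-encoded data URI payload.
    public class DataReplacement {
        public static void main(String[] args) throws Exception {
            String html = "<HTML><BODY><B>Rabbit3</B> denied this page</BODY></HTML>";
            String uri = "data:text/html,"
                    + URLEncoder.encode(html, "UTF-8").replace("+", "%20");
            System.out.println(uri);
        }
    }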
From: Robert O. <ro...@kh...> - 2006-04-04 19:40:10
Matej Mihelic wrote:
> [MM] I'll try to find a pattern. It is probably connected with me
> overloading Rabbit. My usual browsing habits include opening 30 tabs
> simultaneously in Firefox :).

I would not think so. Rabbit should handle many concurrent connections, and unless you have changed your Firefox config you are only using 4. Check "about:config" and network.http.max-persistent-connections-per-proxy; I have mine set to 8 at the moment. Also make sure that you have proxy.keep-alive and proxy.pipelining set to true. Rabbit does not yet handle client side pipelining but I plan on handling that soon.

> [MM] That's what I meant. This one is very common because people don't
> think about what is written in the MS ASP documentation. There is an
> example specifying -1 for the Expires property, and they use it
> everywhere.

I'll check how rabbit handles them and see what we can do.

> > This could be easy or hard, depending on what you mean.
> > Blocking the CONNECT request is trivial, but what happens on a
> > tunneled and encrypted connection is not something rabbit can filter.
>
> [MM] Yes, this would suffice. It would allow filtering of sites that
> try to enforce HTTPS.

Ok, then it is simple. I'll see what I can do. But please note that my immediate action is only to put it in the TODO file (spare time project!).

/robo
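For reference, the Firefox settings Robert mentions look like this in about:config (pref names from his message; the values are the ones he suggests):

    network.http.max-persistent-connections-per-proxy = 8
    network.http.proxy.keep-alive = true
    network.http.proxy.pipelining = true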
From: Matej M. <rab...@ma...> - 2006-04-05 06:46:24
Robert Olofsson wrote:
RO> Matej Mihelic wrote:
MM> [MM] I'll try to find a pattern. It is probably connected with
MM> me overloading Rabbit. My usual browsing habits include
MM> opening 30 tabs simultaneously in Firefox :).
RO> I would not think so. Rabbit should handle many concurrent
RO> connections, and unless you have changed your Firefox config
RO> you are only using 4. Check "about:config" and
RO> network.http.max-persistent-connections-per-proxy; I have mine
RO> set to 8 at the moment. Also make sure that you have
RO> proxy.keep-alive and proxy.pipelining set to true.
RO> Rabbit does not yet handle client side pipelining but I plan
RO> on handling that soon.

I have LOWERED my settings to match yours. I was pipelining up to 16 concurrent requests. In general I work on a 4Mbps+ uplink. I have also upgraded my Firefox 1.5.1 to the latest beta. I have a feeling that at least one of the problems is actually Firefox's. It is now working better. I have retested with 32 concurrent pipelined requests and that seems to work better as well. However, this is not a complete test and it could be a coincidence.

MM> [MM] That's what I meant. This one is very common because people
MM> don't think about what is written in the MS ASP documentation.
MM> There is an example specifying -1 for the Expires property, and
MM> they use it everywhere.
RO> I'll check how rabbit handles them and see what we can do.

MM> RO> This could be easy or hard, depending on what you mean.
MM> RO> Blocking the CONNECT request is trivial, but what happens
MM> RO> on a tunneled and encrypted connection is not something
MM> RO> rabbit can filter.
MM>
MM> [MM] Yes, this would suffice. It would allow filtering of
MM> sites that try to enforce HTTPS.
RO> Ok, then it is simple. I'll see what I can do. But please note
RO> that my immediate action is only to put it in the TODO file
RO> (spare time project!).
RO> /robo

Thanks. I'll install a JDK environment on my notebook. Perhaps I can find some spare time as well. Unfortunately, due to my lack of Java skills and knowledge, this won't be very productive.

--
Regards, Matej.
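A sketch of the lenient non-strict-mode handling being discussed: treat an Expires value that does not parse as an HTTP date - such as "-1" or "0" - as already expired rather than as an error. Illustrative only, not Rabbit's actual code:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Locale;

    // Sketch: malformed Expires values mean "already expired, do not cache".
    public class LenientExpires {
        static Date parseExpires(String value) {
            try {
                SimpleDateFormat fmt = new SimpleDateFormat(
                        "EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
                return fmt.parse(value);
            } catch (Exception e) {
                return new Date(0); // epoch: always in the past
            }
        }
    }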
From: Matej M. <rab...@ma...> - 2006-04-04 11:59:38
* 060404: Rabbit pre-3.1 error messages

I am getting the following two messages that I cannot diagnose. One appears in the log files and the other on the console.

*** Log errors:

[04/apr/2006:11:01:58 GMT][WARN][BaseHandler: error handling request:
java.io.IOException: An established connection was aborted by the software in your host machine
        at sun.nio.ch.SocketDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(Unknown Source)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.write(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
        at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(Unknown Source)
        at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
        at rabbit.proxy.FileResourceSource.transferTo(FileResourceSource.java:59)
        at rabbit.proxy.TransferHandler.run(TransferHandler.java:42)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
]

*** Console errors:

java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
        at java.util.HashMap$KeyIterator.next(Unknown Source)
        at java.util.Collections$UnmodifiableCollection$1.next(Unknown Source)
        at rabbit.proxy.HttpProxy.cancelTimeouts(HttpProxy.java:402)
        at rabbit.proxy.HttpProxy.run(HttpProxy.java:383)
        at java.lang.Thread.run(Unknown Source)
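For reference, a ConcurrentModificationException like the one above is the fail-fast signal that a HashMap was structurally modified while being iterated - in a server usually by another thread. A minimal single-threaded reproduction (illustrative only, unrelated to Rabbit's actual code):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: modifying a HashMap during iteration makes the iterator
    // throw ConcurrentModificationException on its next step.
    public class CmeDemo {
        public static void main(String[] args) {
            Map<String, Long> timeouts = new HashMap<>();
            timeouts.put("a", 1L);
            timeouts.put("b", 2L);
            for (String key : timeouts.keySet()) {
                System.out.println(key);
                timeouts.put("c", 3L); // structural modification mid-iteration
            }
            // Typical remedies: synchronize both paths on one lock, iterate
            // over a snapshot copy, or use ConcurrentHashMap.
        }
    }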
From: Matej M. <rab...@ma...> - 2006-04-04 12:48:51
* 060404: BUG REPORT - Rabbit pre-3.1 error messages - incorrect handling of user authorisation for SSL connections

I think Rabbit performs an incorrect handshake over SSL when combined with user authentication. An example:

curl -o t.txt -w "HTTP_CODE: %{http_code}" -k --proxy proxy:port --proxy-user "user:pass" https://updates.mozilla.org

HTTP_CODE: 000curl: (35) error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

I get the following line in access.log:
172.16.33.70 - - 04/apr/2006:12:39:11 GMT "CONNECT updates.mozilla.org:443 HTTP/1.0" 200 -

And the following line in error.log:
[04/apr/2006:12:44:33 GMT][WARN][Tunnel: failed to handle: java.io.IOException: An existing connection was forcibly closed by the remote host]

Without Rabbit3 in between I get the following HTTP code: 301

To complete the report: if I open the same page over the HTTP address I get no message in error.log, and the following line in access.log:
172.16.33.70 - user 04/apr/2006:12:43:59 GMT "GET http://updates.mozilla.org HTTP/1.1" 200 -
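For reference, curl expects roughly the following exchange when tunneling through an authenticating proxy; the TLS handshake only starts after the 200, so any plain-HTTP bytes arriving at that point would plausibly surface as an "unknown protocol" error like the one above ("dXNlcjpwYXNz" is just the base64 of "user:pass"):

    CONNECT updates.mozilla.org:443 HTTP/1.0
    Proxy-Authorization: Basic dXNlcjpwYXNz

    HTTP/1.0 200 Connection established

    ...TLS handshake bytes follow...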
From: Robert O. <ro...@kh...> - 2006-04-14 13:48:14
Matej Mihelic wrote:
> I think Rabbit performs an incorrect handshake over SSL when combined
> with user authentication.

It seems to work for me. I get:
HTTP_CODE: 301
whether I specify --proxy-user or not.

Note: rabbit does not run any http filters on CONNECT requests (that is probably a bug). Rabbit does run the IP filters though.

> I get the following line in access.log:
> 172.16.33.70 - - 04/apr/2006:12:39:11 GMT "CONNECT
> updates.mozilla.org:443 HTTP/1.0" 200 -

Seems normal. Note that the status code in rabbit's access_log is the status code for rabbit, not the status code from the real server. Rabbit handled this connection without problems, so it is a 200 Ok. Whether the resource had an http header with a status of 500, 404, 301 or 200 does not really matter to rabbit.

> And the following line in error.log:
> [04/apr/2006:12:44:33 GMT][WARN][Tunnel: failed to handle:
> java.io.IOException: An existing connection was forcibly closed by the
> remote host]

Yes, the tunnel does not always detect cleanly when the connection is closed, so sometimes it logs. This is not a problem.

> Without Rabbit3 in between I get the following HTTP code: 301

/robo
From: Robert O. <ro...@kh...> - 2006-04-04 19:50:57
Matej Mihelic wrote:
> *** Log errors:
> [04/apr/2006:11:01:58 GMT][WARN][BaseHandler: error handling request:
> java.io.IOException: An established connection was aborted by the
> software in your host machine
>         at sun.nio.ch.SocketDispatcher.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>         at sun.nio.ch.IOUtil.write(Unknown Source)
>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>         at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(Unknown Source)
>         at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>         at rabbit.proxy.FileResourceSource.transferTo(FileResourceSource.java:59)

I have seen this one as well; it does not seem to cause any problems. But until I know the exact cause of it the warning will stay. One cause may be that you load only half a page and then move on, causing the browser to abort that download. There seem to be more causes for this though.

> *** Console errors:
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
>         at java.util.HashMap$KeyIterator.next(Unknown Source)
>         at java.util.Collections$UnmodifiableCollection$1.next(Unknown Source)
>         at rabbit.proxy.HttpProxy.cancelTimeouts(HttpProxy.java:402)
>         at rabbit.proxy.HttpProxy.run(HttpProxy.java:383)
>         at java.lang.Thread.run(Unknown Source)

Ok, I have not seen this one. Good to hear about it. I currently have no idea what caused it, but at least rabbit should continue, possibly keeping one connection open until the next round, so it is not a big problem.

/robo
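For reference, the transfer pattern in the first trace is roughly FileChannel.transferTo into the client's SocketChannel; when the browser aborts the download, the pending write fails with exactly this kind of IOException. A sketch (not Rabbit's actual TransferHandler):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    // Sketch: stream a cached file to the client with zero-copy transferTo.
    public class TransferSketch {
        static void send(FileChannel file, SocketChannel client) {
            try {
                long pos = 0, size = file.size();
                while (pos < size)
                    pos += file.transferTo(pos, size - pos, client);
            } catch (IOException e) {
                // An aborted browser download surfaces here, e.g.
                // "An established connection was aborted by the software
                // in your host machine" on Windows.
            }
        }
    }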