[Rabbit-proxy-development] 060403: Pre-3.1 impressions & ideas for the future releases
From: <mat...@ne...> - 2006-04-03 14:07:22
* 060403: Pre-3.1 impressions & ideas for the future releases
I have tested the pre-3.1 version and it fixed most of my problems,
including all of the ones I reported.
There is one more problem regarding Firefox and incomplete pages in
connection with time-outs. It looks like the connection will
sometimes simply hang, and an incomplete page is then returned and
cached by Firefox. I don't know exactly what is happening, but I have
not seen such behaviour with any other proxy, and I have experience
with quite a few of them.
There is one condition that you may want to handle gracefully. Some
webmasters send "-1" as the value of the "Expires" header. I know
this violates the RFC, but it is common, and it should simply be
treated as "already expired"/"do not cache".
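To make this concrete, here is a rough Java sketch (the class and
method names are made up, this is not from the Rabbit code base) of
how an unparsable Expires value such as "-1" could simply be treated
as already expired:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper: any Expires value that does not parse as an
 *  HTTP date (for example "-1" or "0") is treated as already expired. */
public class ExpiresParser {
    private static final SimpleDateFormat RFC1123 =
        new SimpleDateFormat ("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
    static {
        RFC1123.setTimeZone (TimeZone.getTimeZone ("GMT"));
    }

    /** Returns the expiry time, or a time in the past if the value is
     *  invalid. Synchronized because SimpleDateFormat is not thread safe. */
    public static synchronized Date parseExpires (String value) {
        if (value == null)
            return new Date (0);   // no value: be conservative, do not cache
        try {
            return RFC1123.parse (value.trim ());
        } catch (ParseException e) {
            return new Date (0);   // "-1", "0" or garbage: already expired
        }
    }
}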
As for the future, I would like to suggest the following
enhancements, which would improve the user experience in two areas:
* Ads are real bandwidth hogs and are annoying.
* The proliferation of satellite and mobile (3G GSM - EDGE, UMTS)
connections with very high latency (500ms+) calls for minimising
the number of requests sent from client to server.
If you implemented ad blocking and URL blocking in a way that lets
users plug in pre-prepared block lists, you would really help them.
The most popular ad-block lists are the Filterset.G sets for the
Adblock Firefox extension ( http://www.pierceive.com/filtersetg/ ).
They come as two separate lists: a blacklist and a whitelist.
Supporting an import of them in Rabbit would be a big help.
My suggestions for new ad-blocking features would be the following:
* Implement white-list and black-list definitions for
"rabbit.filter.BlockFilter" as well as for
"rabbit.filter.AdFilter" ("blockURLmatching" and
"DontBlockURLmatching"). This would allow for more relaxed
filtering with optional white-listing of certain sites.
* Besides the current filter format, allow patterns to be read from
a file in the Filterset.G list format ("blockURLmatchingFile" and
"DontBlockURLmatchingFile"). You would probably convert the
patterns to a common format and merge them with the ones from
"blockURLmatching"/"DontBlockURLmatching"; see the sketch after
this list.
* Enhance "rabbit.filter.BlockFilter" to blocks HTTPS URLs as well.
I have explained in a previous message why I find this important.
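To illustrate the file-based lists, here is a rough sketch of how the
patterns could be read and merged with the inline configuration
value. The class name, the one-pattern-per-line layout and the '#'
comment convention are my assumptions, not the actual Filterset.G
syntax:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical loader: reads one pattern per line (lines starting
 *  with '#' are treated as comments) and merges them with an inline
 *  value such as "blockURLmatching". */
public class PatternListLoader {

    public static Pattern buildMergedPattern (String inlinePatterns,
                                               String patternFile)
        throws IOException {
        List<String> parts = new ArrayList<String> ();
        if (inlinePatterns != null && inlinePatterns.length () > 0)
            parts.add (inlinePatterns);
        if (patternFile != null) {
            BufferedReader in =
                new BufferedReader (new FileReader (patternFile));
            try {
                String line;
                while ((line = in.readLine ()) != null) {
                    line = line.trim ();
                    if (line.length () == 0 || line.startsWith ("#"))
                        continue;
                    parts.add (line);
                }
            } finally {
                in.close ();
            }
        }
        // Join everything into one alternation so the filter can keep
        // using a single Pattern per direction (block / don't block).
        StringBuilder sb = new StringBuilder ();
        for (int i = 0; i < parts.size (); i++) {
            if (i > 0)
                sb.append ('|');
            sb.append ("(?:").append (parts.get (i)).append (')');
        }
        return Pattern.compile (sb.toString ());
    }
}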
My suggestions for accelerating high-latency links are the following:
* When possible, embed external files into the HTML using the
RFC 2397 data URI scheme (IMG, SCRIPT and STYLE tags - you fetch
the file from SRC/HREF and replace the reference; see the sketch
after this list). References: (
http://en.wikipedia.org/wiki/Data:_URL,
http://www.mozilla.org/quality/networking/docs/aboutdata.html ).
I know that this is currently only supported by the Mozilla and
Opera browsers, but it would probably help tremendously on
high-latency links. There is a way to get partial RFC 2397 support
in IE through a protocol handler, but it will be limited by the
URL length limit in IE. I've put a copy of the IE plugin on my
server: "http://neosys.si/users/Matej/DataProtocol.zip".
Examples:
- http://neosys.si/users/matej/rabbit/Data_SiOL.net.htm
(Opera and Netscape)
- http://neosys.si/users/matej/rabbit/Data_IE_SiOL.net.htm
(IE - GIF data URIs don't seem to be supported?!?)
- http://www.scalora.org/projects/uriencoder/
(original: http://neosys.si/users/matej/rabbit/SiOL.net.htm )
I know that data URIs are in general limited to 1024-4096 bytes
(unlimited in Mozilla), that they actually increase the file
size, and that they defeat caching. This goes against the current
goals, but I see the following arguments:
o High-latency links generally have a high throughput.
o Thanks to the size reduction from JPEG re-compression, the
files could still end up smaller.
o A limit of 4096 bytes would - due to the JPEG size reduction
- suffice for most sites, and with Firefox the limitation
does not exist at all.
o Caching is not so important for pages where you don't browse
around the site and where most images are new anyway (news
sites).
I would suggest the following configuration variables:
- enableDataURIforTags=IMG|STYLE|SCRIPT
- enableDataURIforObjectsWithExtension=JPG|JPEG|CSS|GIF|JS
- maximumSizeForDataURI=16384 ; Firefox can take it.
- dontEmbedDataURIforSites=
* I would also replace HREFs that point to URLs blocked by the
adfilter/blockfilter with one of the following:
o An HREF to a fixed error page on the RABBIT server. This
would allow the response to be cached.
o HREF=data:,Rabbit%20denied%20this%20page
This would remove the need for a round trip to the server
for the 403 message. Unfortunately it would mask the
destination URL, but since the user can request the
unfiltered page, he can still find it.
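To make the data URI idea more concrete, here is a rough sketch of
how a referenced object could be fetched and turned into an RFC 2397
data URI, with a size limit corresponding to the proposed
"maximumSizeForDataURI" variable. Class and method names are made up,
and in the proxy the bytes would of course come from its own
fetch/cache path rather than a plain URL fetch:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Base64;

/** Hypothetical sketch: fetch a small object and build a data URI
 *  that can replace the original SRC/HREF attribute value. */
public class DataUriEmbedder {

    /** Maximum size to embed, as in the proposed maximumSizeForDataURI. */
    private final int maxSize;

    public DataUriEmbedder (int maxSize) {
        this.maxSize = maxSize;
    }

    /** Returns a data URI for the resource, or null if it is too big. */
    public String toDataUri (String url, String mimeType) throws IOException {
        InputStream in = new URL (url).openStream ();
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream ();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read (chunk)) > 0) {
                buf.write (chunk, 0, n);
                if (buf.size () > maxSize)
                    return null;     // over the limit: keep the original URL
            }
            String b64 = Base64.getEncoder ().encodeToString (buf.toByteArray ());
            return "data:" + mimeType + ";base64," + b64;
        } finally {
            in.close ();
        }
    }
}

For example, new DataUriEmbedder (16384).toDataUri
("http://example.com/logo.jpg", "image/jpeg") would return a string
starting with "data:image/jpeg;base64," that the HTML filter could put
directly into the IMG SRC attribute (example.com is just a
placeholder).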
I know that there would be another option - multipart encoding - but
I have no idea how well it is supported across browsers.
Then there is one last proposal: you could also implement SSL
filtering. Proxomitron is a great example of how it could be done. It
uses a temporary SSL key between the client and the proxy, and
temporary or predefined SSL certificates when communicating with the
remote servers.
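Just to sketch the mechanics (only the socket handling, not the
on-the-fly certificate generation that Proxomitron does; the keystore
path and class names are made up, this is not from the Rabbit code
base):

import java.io.FileInputStream;
import java.net.Socket;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical sketch: the proxy presents its own certificate to the
 *  client and opens a separate TLS connection to the real server, so
 *  the plain-text stream in between can be filtered. */
public class SslBump {

    private final SSLContext proxyContext;

    /** keystoreFile/password hold the proxy's own certificate and key. */
    public SslBump (String keystoreFile, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance ("JKS");
        ks.load (new FileInputStream (keystoreFile), password);
        KeyManagerFactory kmf =
            KeyManagerFactory.getInstance (KeyManagerFactory.getDefaultAlgorithm ());
        kmf.init (ks, password);
        proxyContext = SSLContext.getInstance ("TLS");
        proxyContext.init (kmf.getKeyManagers (), null, null);
    }

    /** Wrap an already accepted client socket (after the CONNECT reply)
     *  so that the proxy acts as the TLS server towards the browser. */
    public SSLSocket wrapClientSide (Socket client) throws Exception {
        SSLSocket s = (SSLSocket) proxyContext.getSocketFactory ().createSocket (
            client, client.getInetAddress ().getHostName (), client.getPort (), true);
        s.setUseClientMode (false);
        return s;
    }

    /** Open the proxy's own TLS connection to the real server. */
    public SSLSocket connectToServer (String host, int port) throws Exception {
        SSLSocketFactory f = (SSLSocketFactory) SSLSocketFactory.getDefault ();
        return (SSLSocket) f.createSocket (host, port);
    }
}

Everything read from the client-side socket is then plain HTTP that
the normal filters can inspect before it is written to the
server-side socket, and vice versa.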
--
Best regards,
Matej.