[Httplib2-discuss] Non-opaque cache keys
From: Sam R. <ru...@in...> - 2006-11-15 02:38:30
It looks like the current implementation takes the md5 of a somewhat normalized URI and passes that as the key to the cache. For debuggability, and to increase the potential for integration with other subsystems, I'd like to suggest that this be changed to pass either the original URI unaltered, or a normalized URI with the normalization logic refactored out into a separate function. Either way, the current FileCache could do the remaining normalization/hashing of the key. Other storage systems could either use the key as is or employ a different hash mechanism.

Potential use cases, based on Planet Venus:

1) Occasionally, I find it useful to force a re-fetch/re-parse, and this is most easily done if I can delete an individual file. This is easier to do reliably if the file name is less opaque. Venus already has a function which will compute a readable name in most cases.

2) Like Robert Leftwich, I have a time-consuming process that I would like to optimize away whenever possible. There are a lot of broken servers out there which support neither ETag nor Last-Modified (for feeds, this has been estimated at about 30%). If I can retrieve the feed from the cache before the fetch and compare it to the value after the fetch, I can treat the response as if it were a 304.

3) Not all systems are CPU constrained; some are memory constrained. The threading logic that you created for Venus (which is much appreciated) builds up a queue of feeds. Given that this data is out on disk, it need not be in memory while in the queue. For this scenario, it would be ideal if there were an option so that httplib2 doesn't return the content in the first place, as it does now even for a 304.

For use cases #2 and #3, it would be desirable to be able to access the cache without involving HTTP, and for that there needs to be, at a minimum, a predictable algorithm for computing the cache key.
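To make the suggestion concrete, here is a rough sketch of what use cases 1 and 2 could look like if the cache received the URI itself as the key. None of this is httplib2's actual code: `HashedFileCache`, `ReadableFileCache`, and `fetch_with_synthetic_304` are hypothetical names, the `get`/`set`/`delete` interface is an assumption, and `fetch` stands in for whatever performs the real HTTP request.

```python
import hashlib
import os
import re


class HashedFileCache:
    """Hypothetical store: receives the URI as the key and hashes it itself."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def filename(self, key):
        # Opaque file name: the hashing is the store's choice, not the caller's.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest)

    def get(self, key):
        try:
            with open(self.filename(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def set(self, key, value):
        with open(self.filename(key), "wb") as f:
            f.write(value)

    def delete(self, key):
        try:
            os.remove(self.filename(key))
        except FileNotFoundError:
            pass


class ReadableFileCache(HashedFileCache):
    """Hypothetical store: same key, but a file name a human can recognize."""

    def filename(self, key):
        # Collapse runs of unsafe characters into "_". Assumption: keys are
        # short and distinct enough that names neither collide nor overflow.
        safe = re.sub(r"[^A-Za-z0-9.\-]+", "_", key).strip("_")
        return os.path.join(self.directory, safe)


def fetch_with_synthetic_304(cache, uri, fetch):
    """Use case 2: report 304 when the fetched body equals the cached copy.

    `fetch` is a caller-supplied function taking a URI and returning bytes.
    """
    previous = cache.get(uri)   # cache lookup without involving HTTP
    body = fetch(uri)
    if previous is not None and previous == body:
        return 304, previous
    cache.set(uri, body)
    return 200, body
```

With the readable variant, forcing a re-fetch of one feed (use case 1) amounts to `cache.delete(uri)` or removing a file whose name resembles the URI, such as `http_example.org_feed.xml`.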
I can imagine other hypothetical scenarios involving sharing this cache with other applications, but hopefully these illustrate the requirement sufficiently.

- Sam Ruby

P.S. Other normalization ideas can be found here: http://www.intertwingly.net/blog/2004/08/04/Urlnorm
For example, scheme is supposed to be case insensitive.
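As an illustration of the kind of normalization meant here, the following is a minimal sketch of a standalone function that could produce a predictable key. The name `normalize_uri` is hypothetical, and beyond the case-insensitive scheme mentioned above, the other rules (lowercasing the host, dropping a default port, treating an empty path as "/", discarding the fragment) are assumptions drawn from general URI equivalence rules rather than from httplib2 or the Urlnorm post.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the resource identified.
DEFAULT_PORTS = {"http": "80", "https": "443"}


def normalize_uri(uri):
    """Hypothetical helper: return a predictable, readable form of `uri`."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()            # scheme is case insensitive

    # Split off any "user:password@" prefix, which is case sensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    hostport = hostport.lower()              # host is case insensitive

    default = DEFAULT_PORTS.get(scheme)
    if default and hostport.endswith(":" + default):
        hostport = hostport[: -len(default) - 1]

    path = parts.path or "/"                 # empty path is equivalent to "/"

    # The fragment is never sent to the server, so it is left out of the key.
    return urlunsplit((scheme, userinfo + at + hostport, path, parts.query, ""))
```

Under these assumptions, `HTTP://Example.ORG:80/Feed.xml#top` and `http://example.org/Feed.xml` would map to the same key, while the path keeps its case because paths are case sensitive.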