Re: [http-replicator-users] http-replicator patches (filename, last-modified, and umask)
From: Gertjan v. Z. <ger...@gm...> - 2013-07-16 22:56:09
Hi Corey,

Thank you very much for these patches, and for your thorough discussion of them. I was forced to abandon this project long ago, as I'm sure you realized from the lack of activity, but it has remained dear to me. Nice to hear that somebody is actually using it.

Based on your patches I made a new commit, in which I applied umask.patch without modifications and multiple_last-modified_header_formats.patch with some cosmetic edits. Both are clear improvements, and thank you for figuring out these formats.

I liked your solution to the long filename issue a little less, for exactly the name collision risk you already mention. In my commit I fixed it differently, in a way I hope solves the problem equally well: rather than discarding the excess length, I shorten it by forming an md5 hash. Check Cache.py:24 to see what's going on. Names shorter than 255 characters are not touched, so this change remains compatible with existing caches.

I cannot do thorough testing right now, so it would be very helpful if you could confirm that everything is working as expected. Also, do let me know if you start working on that python3 port. That would certainly be good to have.

Thanks again!
Gertjan

On Mon, Jul 15, 2013 at 8:03 AM, Corey Wright <und...@po...> wrote:
> please find attached a series of patches that i've applied to my personal
> use of http-replicator (and finally remembered to publish in case others
> are interested). i would normally include each patch in its own email, but
> for brevity i didn't, and based on the lack of list traffic, i doubt
> anybody will notice/care.
>
> i have it on my wish list to one day port http-replicator 4.0alpha2 to
> python3 (for when linux distros start to abandon / stop carrying python 2
> and to force me to learn/embrace python 3) if somebody doesn't beat me to
> it.
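The md5-based shortening described in the reply above lives in Cache.py, which is not reproduced here. The following is a minimal sketch of the idea under stated assumptions: a 255-character limit per file name, mostly ASCII names (so characters and bytes roughly coincide), and a hypothetical helper called `shorten_filename`. It keeps a readable prefix and replaces the remainder with its md5 hex digest; the real Cache.py may split or hash differently.

```python
import hashlib

# Assumed per-name limit; 255 is the usual maximum file-name length on
# common Linux filesystems such as ext4.
MAX_NAME = 255
DIGEST_LEN = 32  # an md5 hex digest is always 32 characters


def shorten_filename(name, limit=MAX_NAME):
    """Hypothetical helper, not the actual Cache.py code.

    Names shorter than `limit` are returned unchanged, so existing cache
    entries keep their paths. Longer names keep a prefix and replace the
    rest with the md5 hex digest of that rest, giving exactly `limit`
    characters.
    """
    if len(name) < limit:
        return name
    keep = limit - DIGEST_LEN
    tail = name[keep:]
    return name[:keep] + hashlib.md5(tail.encode('utf-8')).hexdigest()
```

Unlike plain truncation, two long names that share the same 223-character prefix still map to different file names (barring an md5 collision), and the mapping is deterministic, so the same URL always lands on the same cache entry.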
>
> patches:
>
> http-replicator_4.0alpha2_max-filename-length-255.patch
>
> this should be fairly straightforward: i use http-replicator with some
> sites that have obnoxiously long file names (actually long/many query
> strings, but those get translated to file names for http-replicator's
> cached/saved files), and instead of having http-replicator just fail on
> them (and not cache/save them), i'd rather have them saved with truncated
> names. yes, this does introduce the risk of file name collision (two files
> with the same truncated name), but i've never seen (or at least noticed)
> that in practice.
>
> http-replicator_4.0alpha2_multiple_last-modified_header_formats.patch
>
> on my journeys across the internet with http-replicator i've seen three
> different string formats for the Last-Modified header, two of which
> http-replicator didn't account for; they generated a stack trace (and,
> more importantly to me, the file was not saved/cached). rather than have
> http-replicator fail, it now tries two additional formats (in order of
> decreasing number of personal encounters on the internet) and then raises
> a more descriptive (imho) exception.
>
> at first i kept Params.TIMEFMT as-is (ie a string) and added another
> variable to handle the second time format string (eg TIMEFMT2), but when i
> added the third string format i decided, for ease of adding new ones, to
> convert TIMEFMT to an array of time format strings and iterate through
> them. of course, since i did that i haven't had to add any more time
> formats. i kept the error handling code as exceptions for efficiency
> (iirc, python exceptions only cause performance penalties when
> raising/handling the exception, so the normal case is not/minimally
> affected) and raise the same exception (ie ValueError) when no time format
> string matches, but with a more descriptive error string.
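The fallback parsing described above (try each known format in turn, raise ValueError with a descriptive message only when none matches) might look roughly like this sketch. The two fallback formats are the ones from the TIMEFMTS tuple given in the email; the value of TIMEFMT itself (assumed here to be the standard RFC 1123 HTTP-date format) and the helper name `parse_last_modified` are assumptions, not the actual patch code.

```python
import calendar
import time

# Assumed value of Params.TIMEFMT: the standard RFC 1123 HTTP-date format.
TIMEFMT = '%a, %d %b %Y %H:%M:%S GMT'
# Known formats, most common first; the fallbacks match the email's TIMEFMTS.
TIMEFMTS = (TIMEFMT,
            '%a, %d %b %Y %H:%M:%S +0000 GMT',
            '%a, %d %b %Y %H:%M:%S +0000')


def parse_last_modified(value):
    """Hypothetical helper: convert a Last-Modified header value to a UTC
    Unix timestamp, trying each format in TIMEFMTS in order."""
    for fmt in TIMEFMTS:
        try:
            # strptime raises ValueError when `value` does not match `fmt`
            return calendar.timegm(time.strptime(value, fmt))
        except ValueError:
            continue  # fall through to the next known format
    raise ValueError('unsupported Last-Modified format %r (tried %d formats)'
                     % (value, len(TIMEFMTS)))
```

Keeping TIMEFMT as a plain string and adding a separate TIMEFMTS tuple for parsing means code that writes dates can keep using TIMEFMT directly, avoiding the less descriptive `TIMEFMT[0]`.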
>
> because of my changing TIMEFMT from a string to a tuple of strings, the
> code changes are pervasive, though limited to when http-replicator wants
> to use TIMEFMT for generating/writing a time (as compared to
> reading/converting the time in the Last-Modified header). i admit, now
> that i review it, that "TIMEFMT[0]" is not very descriptive (eg "why the
> first value and not the second or third one?"), and i probably should
> have left TIMEFMT alone and created a new TIMEFMTS variable as a tuple of
> strings, with TIMEFMTS defined as:
>
> TIMEFMTS = (TIMEFMT, '%a, %d %b %Y %H:%M:%S +0000 GMT',
>             '%a, %d %b %Y %H:%M:%S +0000')
>
> http-replicator_4.0alpha2_umask.patch
>
> this is probably the most trivial, but most arguable/bike-shedding, patch
> of the bunch: i set umask to 0022 (from 0000) because i have other users,
> besides the uid/gid running http-replicator, accessing the files (directly
> or over a network file system) and want to ensure they cannot write/modify
> the files/directories (ie security). i can't remember why i set the umask
> instead of just commenting the original umask line out and letting the
> default user/system umask stay in effect, but i submit this patch to bring
> attention to the potential insecurity (depending on use-case).
>
> thank you for http-replicator!
>
> corey
> --
> und...@po...
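To illustrate what the umask patch discussed above changes for cached files, here is a small POSIX-only sketch (not part of either patch; `created_mode` is a hypothetical helper). It creates a file requesting mode 0o666 under a given umask and reports the permission bits the file actually gets. Note that the patch's literal `0022` is Python 2 syntax; a python3 port would have to write it as `0o022`.

```python
import os
import stat
import tempfile


def created_mode(umask_value):
    """Hypothetical helper: create a file asking for mode 0o666 while
    `umask_value` is in effect, and return its resulting permission bits."""
    old = os.umask(umask_value)  # sets the new mask, returns the previous one
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, 'cached')
            fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # the umask is process-wide, so always restore it


print(oct(created_mode(0o000)))  # requested bits pass through unmasked
print(oct(created_mode(0o022)))  # group/other write bits are cleared
```

With a umask of 0000 the file keeps the full 0o666 it asked for, so any local user can modify it; with 0022 it becomes 0o644, readable by others but writable only by the owner, which is the protection the patch is after.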