[http-replicator-users] http-replicator patches (filename, last-modified, and umask)
From: Corey W. <und...@po...> - 2013-07-15 06:03:55
please find attached a series of patches that i've applied for my personal use of http-replicator (and finally remembered to publish in case others are interested). i would normally send each patch in its own email, but for brevity i didn't, and based on the lack of list traffic, i doubt anybody will notice/care.

i have it on my wish list to one day port http-replicator 4.0alpha2 to python3 (for when linux distros start to abandon / stop carrying python 2, and to force me to learn/embrace python 3) if somebody doesn't beat me to it.

patches:

http-replicator_4.0alpha2_max-filename-length-255.patch

this should be fairly straightforward: i use http-replicator with some sites that have obnoxiously long file names (actually long/many query strings, but those get translated to file names for http-replicator's cached/saved files), and instead of having http-replicator just fail on them (and not cache/save them), i'd rather have them saved with truncated names. yes, this does introduce the risk of file name collisions (two files with the same truncated name), but i've never seen (or at least noticed) that in practice.

http-replicator_4.0alpha2_multiple_last-modified_header_formats.patch

on my journeys across the internet with http-replicator i've seen three different string formats for the Last-Modified header, two of which http-replicator didn't account for and which generated a stack trace (and, more importantly to me, kept the file from being saved/cached). rather than fail, http-replicator now tries two additional formats (in order of decreasing number of personal encounters on the internet) and then raises a more descriptive (imho) exception. at first i kept Params.TIMEFMT as-is (ie a string) and added another variable to handle the second time format string (eg TIMEFMT2), but when i added the third string format i decided that, for ease of adding new ones, i would convert TIMEFMT to an array of time format strings and iterate through them.
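a minimal sketch of the "try each format in turn" idea, not the patch itself. `parse_last_modified` is a hypothetical helper name, and the value given to `TIMEFMT` is an assumption (the message does not show the original string); the two extra formats are the ones quoted in this message.

```python
import calendar
import time

# Assumption: the original TIMEFMT value is not shown in the message, so the
# common HTTP date layout is used here as a stand-in.
TIMEFMT = '%a, %d %b %Y %H:%M:%S GMT'

# The two additional formats are the ones quoted in the message.
TIMEFMTS = (
    TIMEFMT,
    '%a, %d %b %Y %H:%M:%S +0000 GMT',
    '%a, %d %b %Y %H:%M:%S +0000',
)


def parse_last_modified(value):
    """Return seconds since the epoch (UTC) for a Last-Modified value.

    Hypothetical helper for illustration; not a function from http-replicator.
    """
    for fmt in TIMEFMTS:
        try:
            return calendar.timegm(time.strptime(value, fmt))
        except ValueError:
            continue  # this format did not match; try the next one
    # Same exception type as a plain strptime failure, with a clearer message.
    raise ValueError('unrecognized Last-Modified format: %r' % (value,))
```

the exception is only raised and caught when a header does not match the first format, so headers in the usual format take the same path as before.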
of course, since i did that i haven't had to add any more time formats. i kept the error handling as exceptions for efficiency (iirc, python exceptions only incur a performance penalty when one is actually raised/handled, so the normal case is not/minimally affected), and the same exception (ie ValueError) is raised when no time format string matches, but with a more descriptive error string. because i changed TIMEFMT from a string to a tuple of strings, the code changes are pervasive, though limited to the places where http-replicator uses TIMEFMT for generating/writing a time (as opposed to reading/converting the time in the Last-Modified header). i admit, now that i review it, that "TIMEFMT[0]" is not very descriptive (eg "why the first value and not the second or third one?"), and i probably should have left TIMEFMT alone and created a new TIMEFMTS variable as a tuple of strings, defined as:

TIMEFMTS = (TIMEFMT, '%a, %d %b %Y %H:%M:%S +0000 GMT', '%a, %d %b %Y %H:%M:%S +0000')

http-replicator_4.0alpha2_umask.patch

this is probably the most trivial, but most arguable/bike-shedding, patch of the bunch: i set the umask to 0022 (from 0000) because i have users other than the uid/gid running http-replicator accessing the files (directly or over a network file system) and want to ensure they cannot write/modify the files/directories (ie security). i can't remember why i set the umask instead of just commenting out the original umask line and letting the user/system default stay in effect, but i submit this patch to bring attention to the potential insecurity (depending on use-case).

thank you for http-replicator!

corey

--
und...@po...
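for reference, a small sketch of what the umask values discussed in the last patch do to newly created files on a POSIX system. `mode_created_under` is a hypothetical helper written for illustration only; it is not code from http-replicator or from the patch.

```python
import os
import shutil
import stat
import tempfile


def mode_created_under(mask):
    """Create a file with requested mode 0666 under the given umask and
    return the permission bits it actually received.

    Hypothetical helper for illustration; not part of http-replicator.
    """
    old_mask = os.umask(mask)
    tmp_dir = tempfile.mkdtemp()  # stand-in for the cache directory
    try:
        path = os.path.join(tmp_dir, 'example.bin')
        # Ask for read/write for everybody; the umask removes bits from this.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
        os.close(fd)
        return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old_mask)  # restore the previous process-wide umask
        shutil.rmtree(tmp_dir)


# 0o022 is the patch's 0022 written so that it is valid on python 2.6+ and 3.
print(oct(mode_created_under(0o022)))  # group/other write bits removed
print(oct(mode_created_under(0o000)))  # nothing removed
```

with a umask of 0022 the file ends up writable only by its owner, while 0000 leaves it writable by everyone, which is the exposure the patch is meant to point out.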