http-replicator-users Mailing List for HTTP Replicator
From: Gertjan v. Z. <ger...@gm...> - 2013-12-29 19:54:02
|
Dear all,

Today I moved replicator to github <http://github.com/gertjanvanzwieten/replicator>, where I can keep it together with a number of other projects I maintain. In the future, if you would like to contribute patches, please send me a pull request there. Note that this move does not indicate renewed activity for the project. For my current take on its status, read this short account of its history <http://gjvz.nl/projects.html#replicator> that I just wrote on my home page.

Thank you, SourceForge, for being a very fine host for over six years.

Gertjan |
From: Gertjan v. Z. <ger...@gm...> - 2013-07-16 22:56:09
|
Hi Corey,

Thank you very much for these patches, and for your thorough discussion of them. I was forced to abandon this project long ago, as I'm sure you realized from the lack of activity, but it has remained dear to me. Nice to hear that somebody is actually using it.

Based on your patches I made a new commit, in which I applied umask.patch without modifications and multiple_last-modified_header_formats.patch with some cosmetic edits. Both are clear improvements; thank you for figuring out these formats. The solution to the long filename issue I liked a little less, for exactly the name collision risk you already mention. In my commit I fixed it in a different way which I hope solves the problem equally well: rather than discarding the excess length, I shorten the name by forming an md5 hash. Check Cache.py:24 to see what's going on. Anything shorter than 255 characters is left untouched, so this change remains compatible with existing caches.

I cannot do thorough testing right now, so it would be very helpful if you could confirm that everything is working as expected. Also, do let me know if you start working on that python3 port. That would certainly be good to have.

Thanks again!

Gertjan |
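A minimal sketch of the md5-based shortening described above, for readers following along without the repository at hand. It is an illustration only, not the actual contents of Cache.py; the 255-character limit and the way the hash is spliced into the name are assumptions.

    import hashlib

    MAX_NAME = 255  # typical per-component filename limit on most filesystems

    def shorten(name, maxlen=MAX_NAME):
        # names under the limit pass through unchanged, so existing
        # cache entries keep their paths
        if len(name) <= maxlen:
            return name
        digest = hashlib.md5(name.encode('utf8')).hexdigest()  # 32 hex chars
        # keep a recognizable prefix and append the hash of the full name
        return name[:maxlen - len(digest) - 1] + '-' + digest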
From: Corey W. <und...@po...> - 2013-07-15 06:03:55
|
please find attached a series of patches that i've applied to my personal use of http-replicator (and finally remembered to publish in case others are interested). i would normally include each patch in its own email, but for brevity i didn't, and based on the lack of list traffic, i doubt anybody will notice/care.

i have it on my wish list to one day port http-replicator 4.0alpha2 to python3 (for when linux distros start to abandon / stop carrying python 2, and to force me to learn/embrace python 3) if somebody doesn't beat me to it.

patches:

http-replicator_4.0alpha2_max-filename-length-255.patch

this should be fairly straightforward: i use http-replicator with some sites that have obnoxiously long file names (actually long/many query strings, but those get translated to file names for http-replicator's cached/saved files), and instead of having http-replicator just fail on them (and not cache/save them), i'd rather have them saved with truncated names. yes, this does introduce the risk of file name collision (two files with the same truncated name), but i've never seen (or at least noticed) that in practice.

http-replicator_4.0alpha2_multiple_last-modified_header_formats.patch

on my journeys across the internet with http-replicator i've seen three different string formats for the Last-Modified header, two of which http-replicator didn't account for; it generated a stack trace (and, more importantly to me, didn't save/cache the file). rather than have http-replicator fail, it now tries two additional formats (in order of decreasing number of personal encounters on the internet) and then raises a more descriptive (imho) exception.

at first i kept Params.TIMEFMT as-is (ie a string) and added another variable to handle the second time format string (eg TIMEFMT2), but when i added the third format i decided, for ease of adding new ones, to convert TIMEFMT to an array of time format strings and iterate through them. of course, since i did that i haven't had to add any more time formats. i kept the error handling code as exceptions for efficiency (iirc, python exceptions only cause performance penalties when raising/handling the exception, so the normal case is not/minimally affected) and raise the same exception (ie ValueError) when no time format string matches, but with a more descriptive error string.

because of my changing TIMEFMT from a string to a tuple of strings, the code changes are pervasive, though limited to where http-replicator wants to use TIMEFMT for generating/writing a time (as compared to reading/converting the time in the Last-Modified header). i admit, now that i review it, "TIMEFMT[0]" is not very descriptive (eg "why the first value and not the second or third one?"), and i probably should have left TIMEFMT alone and created a new TIMEFMTS variable as a tuple of strings, with TIMEFMTS defined as:

TIMEFMTS = (TIMEFMT, '%a, %d %b %Y %H:%M:%S +0000 GMT', '%a, %d %b %Y %H:%M:%S +0000')

http-replicator_4.0alpha2_umask.patch

this is probably the most trivial, but most arguable/bike-shedding, patch of the bunch: i set umask to 0022 (from 0000) because i have other users than the uid/gid running http-replicator accessing the files (directly or over a network file system) and want to ensure they cannot write/modify the files/directories (ie security). i can't remember why i set the umask instead of just commenting the original umask line out and letting the default user/system setting stay in effect, but i submit this patch to bring attention to the potential insecurity (depending on use-case).

thank you for http-replicator!

corey
--
und...@po... |
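The parsing loop the second patch describes might look roughly like the following. This is a sketch, not the patch itself; in particular the value of the original Params.TIMEFMT (the standard RFC 1123 form) is an assumption.

    import time

    TIMEFMT = '%a, %d %b %Y %H:%M:%S GMT'   # assumed original format
    TIMEFMTS = (TIMEFMT,
                '%a, %d %b %Y %H:%M:%S +0000 GMT',
                '%a, %d %b %Y %H:%M:%S +0000')

    def parse_last_modified(value):
        # try each known format; only a failed parse costs an exception,
        # so the common case stays cheap
        for fmt in TIMEFMTS:
            try:
                return time.strptime(value, fmt)
            except ValueError:
                continue
        raise ValueError('unrecognised Last-Modified format: %r' % value)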
From: Gertjan v. Z. <ger...@gm...> - 2008-02-24 22:56:21
|
Hi Ed, Goffredo, and others,

Just a quick apology for not replying to any mail, patches, or suggestions lately. I will answer them as soon as I find the time to have a good look, rather than giving a half answer now. Probably this week.

Gj |
From: Goffredo B. <kre...@al...> - 2008-02-22 12:22:41
|
Hello Gertjan,

What about the patch I sent on 31 January? Is there any reason blocking it?

If you like the idea, a possible improvement is to specify the directory depth recorded for "flat" files. For example, the *.rpm files don't need to record the path: if the file names differ, so do the contents. For the "primary.xml.gz" files, on the other hand, I need to record about 7 levels of directory (... 7/Fedora/i386/os/repodata/primary.xml.gz) in order to differentiate between the different distros.

Please give me feedback; I will provide an enhanced patch.

BR
Goffredo
--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kre...@in...>
Key fingerprint = CE3C 7E01 6782 30A3 5B87 87C0 BB86 505C 6B2A CFF9 |
From: Goffredo B. <kre...@al...> - 2008-01-31 21:16:14
|
Hello Gertjan,

Enclosed you can find a patch to http-replicator. This patch makes it possible to select the "flat" mode only for some files; the selection is done with a regular expression.

This behaviour was requested by a friend of mine (in cc). He wants to cache requests for rpm packages. For Fedora, if two *rpm* packages have different names, then the packages are different. But for other files (notably the indices, which have the same name but different paths) this is not true. So I added the option --flat-pattern <PATTERN>: when a requested file matches the pattern, it is stored under its file name only; otherwise the full path is kept.

So if I start http-replicator as:

export http_proxy=localhost:8888
http-replicator -r /tmp/cache \
  -p 8888 \
  --flat-pattern "rpm$" \
  --daemon /tmp/replicator.log

and then I download

wget 'http://site1/fedora/8/Fedora/i386/os/Packages/akode-pulseaudio-2.0.1-9.fc8.i386.rpm'
wget 'http://site2/fedora/8/Fedora/i386/os/Packages/akode-pulseaudio-2.0.1-9.fc8.i386.rpm'
wget 'http://site3/fedora/8/Fedora/i386/os/Packages/akode-pulseaudio-2.0.1-9.fc8.i386.rpm'

the 2nd and 3rd requests are satisfied from the cache, because the filenames are the same AND the extension is "rpm". Instead, if I do:

wget 'http://site1/fedora/8/Fedora/i386/os/repodata/primary.xml.gz'
wget 'http://site1/fedora/7/Fedora/i386/os/repodata/primary.xml.gz'

the 2nd request is not satisfied from the cache, because even though the filenames are the same, the extension is not "rpm".

BR
Goffredo
--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kre...@in...>
Key fingerprint = CE3C 7E01 6782 30A3 5B87 87C0 BB86 505C 6B2A CFF9 |
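A rough sketch of the cache-path decision the patch describes. The option name --flat-pattern comes from the mail above; the function and variable names here are illustrative only and are not taken from the actual patch.

    import os
    import re

    FLAT_PATTERN = re.compile(r'rpm$')   # e.g. --flat-pattern "rpm$"

    def cache_path(root, host, path):
        if FLAT_PATTERN.search(path):
            # flat mode: identical filenames are shared across hosts and paths
            return os.path.join(root, os.path.basename(path))
        # normal mode: keep the full host/path layout
        return os.path.join(root, host, path.lstrip('/'))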
From: Ed S. <es...@ar...> - 2008-01-31 18:49:05
|
Sorry about my vague description of the patch. I hope I can clarify it.

The purpose of the --mirror option is to make http_replicator behave as a reverse proxy rather than a normal proxy (so maybe it should be called --reverseproxy or something more precise). Let's say you run two instances of http_replicator on myserver:

http_replicator --port 8080 --root /proxy
http_replicator --port 8081 --root /mirror --mirror http://upstream.com

Then you can download a document from upstream.com in two ways:

http_proxy=http://myserver:8080 wget http://upstream.com/foo/doc.html
wget http://myserver:8081/foo/doc.html

The second http_replicator instance is effectively serving as a local mirror of http://upstream.com, just as if it were a regular httpd serving static content that the admin had manually rsynced from http://upstream.com. Since this instance proxies requests only to a single host, all the cached content ends up under /mirror/upstream.com:80; the --nohost option avoids this redundant directory level.

Does that make sense?

--Ed |
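In other words, the --mirror instance only has to prefix the fixed upstream base onto whatever path is requested. A minimal illustration of that mapping (not the patch code; the names are made up):

    def upstream_url(request_path, mirror='http://upstream.com'):
        # http://myserver:8081/foo/doc.html  ->  http://upstream.com/foo/doc.html
        return mirror.rstrip('/') + request_path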
From: Gertjan v. Z. <ger...@gm...> - 2008-01-31 11:09:18
|
Hi Shashank,

> I'll probably be able to make a .deb for it as well, and try to get it
> into (ubuntu) universe. I hope that's ok.

Sure, more than ok. I used to provide a deb myself but I never managed to get it into the official (debian) repository. Then, when I switched to Macintosh, I decided to leave packaging to others altogether. So if you would care to take this task upon you for the ubuntu world: please go ahead!

Thanks
Gj |
From: Gertjan v. Z. <ger...@gm...> - 2008-01-31 10:18:18
|
Ok - I released 4.0alpha2 with a GPL license file.

> Yes, certainly. I posted the patch to the tracker at
> http://sourceforge.net/tracker/index.php?func=detail&aid=1882403&group_id=195382&atid=953220

I see. I suppose the --nohost option is meant to be an easy way of dealing with mirrors, limited to hosts that serve files in exactly the same directory structure? I was thinking of using symlinks for that purpose, manually put in the cache directory for host/path prefixes that are known to have identical content. Symlinks form a security risk, so I should still add a check that prevents them from pointing away from the cache, or maybe away from a list of allowed paths, but that is a different matter - the approach should already work. Granted, it takes a little more work to make these links, although of course I would suggest scripting it, but it is definitely safer than treating all hosts as equal. Compare the old --flat option, which went as far as stripping away everything but the filename itself, and then consider a common file like index.html.

The --mirror option, if I see it correctly, prepends a host/path prefix to every requested file. I can't quite figure out the use from the patch summary. Is it your aim to configure at proxy level which mirror is used to download packages from? Wouldn't the same be achieved by setting all mirrors equal via the above symlink approach?

Gj |
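A minimal sketch of the symlink idea and the containment check mentioned above; it is illustrative only, and the helper name, cache location and example hostnames are made up.

    import os

    def resolves_inside(path, cache_root):
        # follow the symlink and make sure the target stays under the cache root
        target = os.path.realpath(path)
        root = os.path.realpath(cache_root)
        return target == root or target.startswith(root + os.sep)

    # e.g. declare that two mirror hosts share identical content:
    # os.symlink('mirror-a.example.org:80',
    #            os.path.join('/var/cache/http-replicator', 'mirror-b.example.org:80'))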
From: Ed S. <es...@ar...> - 2008-01-30 15:34:51
|
On 1/30/08, Gertjan van Zwieten <ger...@gm...> wrote:
> Sounds good. Is this a replicator patch and, if so, would it make sense to
> have these changes in the original as well?

Yes, certainly. I posted the patch to the tracker at
http://sourceforge.net/tracker/index.php?func=detail&aid=1882403&group_id=195382&atid=953220

> Sure. Actually now that you mention it I believe Sourceforge demands the
> same. I was rather aiming for next release to bring some improvements I was
> working on, but unfortunately my hard disk crashed before committing any of
> them to svn. Tough luck.. So now I will make a minor release instead with
> only the GPL licence file added, just to have something packagable. The
> improvements will follow soon as I worked up the spirit to start over.

Great, thanks.

--Ed |
From: Gertjan v. Z. <ger...@gm...> - 2008-01-30 08:05:29
|
Hi Ed,

> Thanks for developing http-replicator. It will be very useful for
> people who want to set up a site-local yum/apt repository without
> messing with rsync or complicated caching proxy servers. I have
> already posted a patch with some minor extensions supporting this
> application.

Sounds good. Is this a replicator patch and, if so, would it make sense to have these changes in the original as well?

> I would like to package http-replicator as an rpm and submit it to the
> Fedora project. I notice it is missing a license. Would you mind
> adding a LICENSE file to the source tree?

Sure. Actually, now that you mention it, I believe Sourceforge demands the same. I was rather aiming for the next release to bring some improvements I was working on, but unfortunately my hard disk crashed before committing any of them to svn. Tough luck. So now I will make a minor release instead with only the GPL licence file added, just to have something packageable. The improvements will follow as soon as I have worked up the spirit to start over.

Thanks for the initiative!

Gj |
From: Ed S. <es...@ar...> - 2008-01-30 06:39:35
|
Hello Gertjan,

Thanks for developing http-replicator. It will be very useful for people who want to set up a site-local yum/apt repository without messing with rsync or complicated caching proxy servers. I have already posted a patch with some minor extensions supporting this application.

I would like to package http-replicator as an rpm and submit it to the Fedora project. I notice it is missing a license. Would you mind adding a LICENSE file to the source tree?

Thanks,
--Ed |
From: Gertjan v. Z. <ger...@gm...> - 2008-01-01 13:50:08
|
Finally. Today I finally managed to release a replicator that is 100% to my satisfaction. Which is not to say it is bug free, nor feature complete. But it is a version that I can work on without the nagging feeling that things should change drastically, and that all patching up is lost effort.

The new server has a shining new engine, the fiber module, which I started documenting some time ago in README.devel, included in the package - work in progress, but the idea is there. In short: each separate transaction is a generator object (available since python 2.3; a very strict dependency) that yields state information to the scheduler. Compared to the asyncore framework that I used previously, the new system is much more flexible, in a way emulating a multi-threaded setup while still being the lightweight single-threaded application it always was.

The new flexibility made it possible to implement often requested features such as:

- server-side download resuming
- ftp support
- bandwidth shaping
- support for ipv6 (although this should be tested!)

and to fix outstanding issues that I was not able to fix within the rigid asyncore framework. Most notably, the often reported problem of frozen downloads that never closed is finally solved by putting a (configurable) timeout on all waiting states. Another annoying problem was that transactions could be joined only after the server response, with the result that simultaneously started downloads would never join forces. All this is fixed. Text output should also be a lot clearer now that it is gathered per transaction and printed upon return; see the project screenshot for demo output. The old behaviour is still available as a --debug mode, which also prints extra information about current waiting states.

However much has changed on the inside, on the outside the new replicator should be a continuation of previous versions. Some features have gone missing - most notably cache browsing - but I intend to restore all of those in later alpha releases. After that I will switch to beta releases and stop adding big things until the server is thoroughly tested and released as 4.0.

Needless to say, you are highly encouraged to help me test this software and report your findings on this list. The unit-test script that is included in the package should be a good starting point - I actually expect most problems found this way to be bugs in the script, but those should be fixed just as well. Any further questions, problems, wishes, etc., please send them to this list so that others can benefit and I don't have to make the same replies over and over again ;P

And a very happy 2008 to all of you!

Gertjan |
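For readers unfamiliar with the approach: a toy illustration of the generator-based pattern described above. The real fiber module is considerably more elaborate (socket readiness, timeouts, joining transactions); the names below are invented for the example.

    def transaction(name, steps):
        # each transaction is a generator that yields a description of the
        # state it is waiting in, then resumes where it left off
        for i in range(steps):
            yield '%s: waiting after step %d' % (name, i)

    def scheduler(fibers):
        # single-threaded round-robin over all live generators
        while fibers:
            for fiber in list(fibers):
                try:
                    print(next(fiber))
                except StopIteration:
                    fibers.remove(fiber)

    scheduler([transaction('download-a', 2), transaction('download-b', 3)])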
From: Gertjan v. Z. <ger...@gm...> - 2007-12-24 01:33:16
|
Merry Christmas to all of you!

Today I dug up the code once again and -- I do realize how familiar this sounds -- I am *very much* determined to make a few significant steps this time. My aim is to wrap up the features that are currently in SVN and release them as a limited but otherwise usable version. Already today I fixed a major outstanding bug in the fiber module, and I must say I am very pleased with its current shape and operation. In the coming days I will set up some unit tests, go over the remaining modules and fix what there is to fix. As soon as I have a limited-feature version that is free of obvious bugs I will release it as a first alpha, and move on to creating a website and writing documentation.

So far the plan. Additional features are most likely going to have to wait for a next holiday, although from a well-documented working state I expect it will be easier to invest occasional evening hours. Time will tell. My first priority now is releasing that alpha. Expect to hear more from me soon :)

Gertjan |
From: Gertjan v. Z. <ger...@gm...> - 2007-07-07 00:27:29
|
Hello world! Here is a message from replicator HQ to the shining new http-replicator-users list, mainly to see if it works. If you're reading this then I guess that means it does. I also sent a CC to all the people who have expressed some interest in replicator in the past, so don't be surprised to receive this if you didn't register - yet.

It's been a while since I posted my last news entry on the website, "3.1 is in the works, stay tuned!" or something; I can't check, since the old website was taken off line after people started promoting medicine on it - and the admin did not really seem to object. Actually, I did, only I seemed to have misplaced my login and found myself locked out of my own website, unfortunately, unlike most of the rest of the world. Then again, I can't say I really mind. I had long grown tired of Zope and I'm quite relieved to be back at static html that I can trust. Or so I hope.

The new host [1] is Sourceforge, I'm very pleased to say, which brings some additional nice features such as svn version control [2] and this mailing list - something I long thought would be a very nice thing to have. Over the past years I have received quite a lot of mail from people reporting bugs, requesting features, or simply inquiring about the state of the project and how exactly 3.1 is coming along. And as much as I enjoyed receiving those mails, I think it would have saved a lot of people the effort if I could have just replied that "no, replicator is not dead, only I'm so busy lately that (...) but surely very soon now (...)" etcetera to /world like I do now.

So... what is the state of the project? It's looking good, really. After 3.0 I decided to rewrite from scratch because the current code is ugly (beyond repair) and I could not fit all the new features in its rigid frame. So I did. And then, when my Master's thesis and later PhD started draining the time out of my life, the thing just stood there gathering dust. That did annoy me a lot, because I'm actually rather proud of the new design and I would very much like to bring it out into the open. So last week I decided to dig up the code and resume work, though slowly, towards what will be version 4.0. Here are some of the new features to expect:

- FTP support
- bandwidth shaping
- IPv6 capable
- download resuming (server side)

All this has already worked, I seem to recall, only I'm not too confident about the current state of things. I will go over it in the coming weeks (promise). Currently at least simple HTTP caching seems to be functional. If you're curious about replicator's new look (still text based, don't worry) then download a snapshot from svn and try it out. Just, for now, please don't send any bug reports, because I know there will be many. I will need some time to go through all the code and fix the really big things.

Meanwhile, from now on, this list will be my main communication channel, so if you're interested in this project I suggest you subscribe [3] to stay posted. And that's all for now; I hope to have more news soon. Meanwhile, comments / feedback / words of encouragement are most appreciated, as always.

GJ

1. http://sourceforge.net/projects/http-replicator
2. http://http-replicator.svn.sourceforge.net/viewvc/http-replicator
3. https://lists.sourceforge.net/lists/listinfo/http-replicator-users |