From: Andreas Fuchs <asf@bo...> - 2007-07-05 18:10:11
William Harold Newman wrote:
> I also want to avoid switching VCS very often. That is the main reason
> I haven't yet switched to something newer than CVS for my personal
> projects, where I do enough file renaming to get irritated with CVS on
> a regular basis; I am inclined to wait for it to be glaringly obvious
> which way I want to jump.
I agree, avoiding spurious VCS switching is a very good thing.
> Second, how do we hand off the high-bandwidth services to SourceForge
> (or whoever)? Or is that a nonissue? My rough guess is that it is
> indeed an issue. These days bandwidth and connectivity sufficient for
> dozens of committers to access a repository seem pretty cheap and on
> track to get cheaper, so I'm not too worried about any machine(s) used
> only by committers. But read access to the repository for the
> noncommitter world might still be something we'd want to hand off to a
> service like SourceForge. I don't know how to measure the rate of CVS
> reads today, but if it's at all comparable to downloads
> it seems like it might be a significant burden on a developer's
> machine. And because the download rate has roughly doubled in each of
> the past two years, and I don't foresee Moore's-Law-ish doubling times
> being as short as a year, it doesn't seem safe to rely on the burden
> growing lighter over time.
First off, I should admit that I don't have any bandwidth statistics
from git-daemon on my sbcl repository /-:
But I think I still can provide a bit of not-so-hard data and make a few
The entire SBCL history (which is copied when doing a git/cg clone) is
25MB in size. Cloning it via the git:// protocol takes 1:57 min on my
Via ssh, cloning the same amount of data takes 2:01 min on the same line.
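
For a rough feel of what those two timings mean in terms of transfer rate, here is a small sketch. It is only an illustration: it assumes "25MB" means 25 * 10^6 bytes, and the helper names are made up for this example.

```python
# Effective throughput implied by the two clone timings quoted above.
# Assumption: "25MB" is treated as 25 * 10**6 bytes (not MiB).

REPO_BYTES = 25 * 10**6


def seconds(mm_ss: str) -> int:
    """Convert an 'M:SS' string into a number of seconds."""
    minutes, secs = mm_ss.split(":")
    return int(minutes) * 60 + int(secs)


def throughput_kb_per_s(size_bytes: int, mm_ss: str) -> float:
    """Average transfer rate in kB/s (1 kB = 1000 bytes)."""
    return size_bytes / seconds(mm_ss) / 1000


git_rate = throughput_kb_per_s(REPO_BYTES, "1:57")  # git:// protocol
ssh_rate = throughput_kb_per_s(REPO_BYTES, "2:01")  # ssh

print(f"git://: {git_rate:.0f} kB/s")
print(f"ssh:    {ssh_rate:.0f} kB/s")
```

Both come out at a bit over 200 kB/s, so on this line the choice of transport makes little difference to the cost of a fresh clone.
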
So, 25MB (this number hasn't really changed in the 12+epsilon months the
CVS mirror has been running) per fresh checkout means that if 6000
people (the current peak on the sbcl tarball download graphs; far more
people than I think are checking out SBCL from CVS each month) were each
to do a fresh clone within the same month, it would amount to 150 GB/month.
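
The arithmetic behind that figure, as a minimal sketch. The 30-day month and the decimal units (1 GB = 1000 MB) are assumptions made for this illustration, as are the function names; the average rate is just the monthly total spread evenly over the month, not a measurement.

```python
# Back-of-the-envelope monthly traffic for fresh clones.
# Assumptions: decimal units (1 GB = 1000 MB = 10**6 kB), 30-day month.

CLONE_MB = 25            # size of one fresh clone, from the text
CLONES_PER_MONTH = 6000  # peak monthly tarball downloads, from the text


def monthly_traffic_gb(clone_mb: float, clones: int) -> float:
    """Total transfer per month in GB."""
    return clone_mb * clones / 1000


def average_rate_kb_per_s(total_gb: float, days: int = 30) -> float:
    """Sustained rate in kB/s if the traffic were spread evenly."""
    return total_gb * 10**6 / (days * 86400)


total_gb = monthly_traffic_gb(CLONE_MB, CLONES_PER_MONTH)
avg_rate = average_rate_kb_per_s(total_gb)

print(f"total:   {total_gb:.0f} GB/month")
print(f"average: {avg_rate:.1f} kB/s sustained")
```

Spread evenly, 150 GB/month works out to under 60 kB/s sustained, although real traffic would of course arrive in bursts rather than evenly.
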
This isn't too bad if you consider that some hosting companies offer
dedicated servers with no traffic limit for around 49 EUR (I checked at
hetzner). As another data point, repo.or.cz offers to host any git
repository whose size doesn't exceed 100MB. If the data transfer cost
for new checkouts is too high, the sbcl repository for "anonymous"
checkouts could easily be moved there.
As for why I haven't included "cg update" in this: I think its cost is
negligible. A typical update covering 1-3 months of changes is, by my
estimate, less than 2 MB in transfer size. Smaller changes are in the
dozens-of-kilobytes range. Unfortunately, I don't have any hard data on
this (transfer logs for git-daemon would really have helped...)
Anyway, given the presence of hosts like repo.or.cz (and the
availability of cheap unlimited bandwidth), I don't think bandwidth is a
big problem with git.
Andreas Fuchs, (http://|im:asf@|mailto:asf@)boinkor.net, antifuchs