UBUMIRROR - an 'intelligent' APT repository mirror
---------

Written by Jeff MacLoue <>, released under terms of GPL.

==============
0. Quick Start
==============

!!! Please don't run this script as root, you may damage your system,
your files or your karma if you do.

To get a mirror of 32-bit Ubuntu Lucid Lynx which can be used for
installation and regular updates, put something like this in your
crontab (note the shell expansion):

    /usr/local/bin/ -C lucid{,-updates,-security}/{main,restricted}

This will make a partial mirror of under /var/www/htdocs/ubuntu/ so
http://your-server/ubuntu/ can serve everything to the network
installer. The /var/spool/ubumirror/ directory will be used for
temporary files in the process.

A more advanced example is:

    -u -C {lucid,natty}{,-updates,-security}/{main,restricted}

Note the trailing slash in the mirror URL; it is mandatory. This will
mirror both Lucid Lynx and Natty Narwhal from Ukraine's local Ubuntu
Archive.

============
1. The Task
============

The good thing about an APT repository is the package pool. Only a
single copy of each package file is stored at any time - this reduces
the size of the distribution site and makes the "continuous release"
process easier.

The bad thing about an APT repository is, well, the package pool.
Mirroring an archive with several Ubuntu versions and all the
universe/multiverse stuff can be a real pain if you are low (or, okay,
greedy) on disk space or have a slow Internet connection. What's the
point of doing an rsync of the entire archive if you have only Lucid
Lynx in your office? What's the point of mirroring the whole multiverse
swamp if you only need one or two packages from it?

So the task is rather simple: get only the files you absolutely need.
This sounds easy - retrieve the package index files for the
distributions and repositories you require, parse them for references
to the package pool, retrieve the packages. That's roughly all the
script does, plus removing the package files not referenced in the
indices.
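
The "parse the index, fetch only what is needed" idea can be sketched
roughly as follows. This is an illustration in Python, not the script's
actual Perl code. The function names (parse_index, files_to_fetch) and
the sample package entries are made up for the example. It assumes the
usual Packages index layout: stanzas separated by blank lines, each
carrying a Filename: and a Size: field.

```python
import os
import tempfile


def parse_index(text):
    """Map each 'Filename:' value to its 'Size:' from a Packages-style index."""
    pool = {}
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (they start with whitespace) and
            # anything that is not a "Key: value" line.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if "Filename" in fields and "Size" in fields:
            pool[fields["Filename"]] = int(fields["Size"])
    return pool


def files_to_fetch(pool, mirror_root):
    """Return pool paths that are missing locally or whose size differs."""
    wanted = []
    for rel_path, size in sorted(pool.items()):
        local = os.path.join(mirror_root, rel_path)
        if not os.path.isfile(local) or os.path.getsize(local) != size:
            wanted.append(rel_path)
    return wanted


# Hypothetical index contents, for illustration only.
SAMPLE_INDEX = """\
Package: hello
Filename: pool/main/h/hello/hello_1.0_i386.deb
Size: 5

Package: demo
Description: a made-up package
 with a continuation line
Filename: pool/main/d/demo/demo_1.0_i386.deb
Size: 7
"""

if __name__ == "__main__":
    pool = parse_index(SAMPLE_INDEX)
    with tempfile.TemporaryDirectory() as root:
        # Pretend "hello" is already mirrored with the right size (5 bytes).
        have = os.path.join(root, "pool/main/h/hello/hello_1.0_i386.deb")
        os.makedirs(os.path.dirname(have))
        with open(have, "wb") as fh:
            fh.write(b"12345")
        # Only the "demo" package should still need downloading.
        print(files_to_fetch(pool, root))
```

Comparing sizes rather than checksums keeps the check cheap. It will
miss a file that is corrupted but happens to have the right length.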

Of course there is little to no point in doing this for one or two
computers; it is perfectly possible to live a good life without a local
repository mirror at all. But, well, in a remote office with cheap ADSL
and ten or fifteen Ubuntu workstations, distribution updates and new
workstation installations are complicated. A laptop with a partial
archive mirror is a real life-saver there.

===================
1a. Why ubumirror?
===================

There is a ready-made (and maybe even official) solution, apt-mirror (
I tried it but it's not what I need - it tends to invoke multiple wget
instances by default (which, er, is rather nice but I don't like it)
and doesn't mirror the .udeb files used for system installation. So
ubumirror tries to be simpler and smarter at the same time.

To sum up, ubumirror is simple and targeted specifically at Ubuntu,
while apt-mirror is a more advanced solution targeted at Debian.

=============
2. The Means
=============

The script uses wget for all network operations. There is little use
for rsync as you need individual files and not complete directories,
and there is no point in doing massive network operations in Perl.

The Getopt::Std, File::Find, URI and IO::Zlib modules are used. They
may or may not come with your distribution (they do with the Slackware
13.37 I use, and they are available in CentOS 5 as separate packages as
well).

cp from GNU coreutils is used for recursive file copy.

It is in theory possible to port all this to a non-UNIX platform;
patches are welcome.

========================
3. Command-Line Options
========================

    [-OPTIONS [-MORE_OPTIONS]] [--] REPOSITORY ...

The standard --help and --version options are recognized.

The following single-character options are accepted:

    -a <arch>   Architecture (default i386)
    -u <url>    Base repository URL (default
    -s <path>   Spool directory to store work files
                (default /var/spool/ubumirror)
    -d <path>   Directory to store repository mirror
                (default /var/www/htdocs/ubuntu)
    -w <cmd>    How to invoke WGET (default /usr/bin/wget)
    -C          Don't do pool cleanup
    -v          Verbose output
    -D          Debug output
    --          Stop processing options

You need to specify at least one repository to proceed. Repositories
are specified as <distribution>/<repo>, e.g., lucid/main. As mentioned
above, it's best to use shell expansion to keep the command line
shorter.

=================
4. The Operation
=================

ubumirror starts by changing to the spool directory. Then it invokes
wget to retrieve the package index and the Release/Release.gpg files
which APT uses. For package indexes, Packages.gz is used as it is
considerably smaller than the uncompressed Packages file and still
readable with the standard IO::Zlib Perl module.

Then the retrieved Packages.gz files are parsed to get all the
Filename: and Size: line pairs. A hash %Pool is built from this
information.

Then the contents of the /var/www/htdocs/ubuntu/pool/ directory are
compared with %Pool - if a package isn't there or has a different size,
it is marked for retrieval. Then wget is invoked again to retrieve the
marked packages.

Finally, after all the files have been retrieved, the indices and other
files from the spool directory are copied over
/var/www/htdocs/ubuntu/dists/ - to get a consistent mirror. For the
sake of simplicity, ubumirror, unlike apt-mirror, is not very concerned
with keeping the mirror consistent at all times.

Optionally, if -C is specified on the command line, the /pool/
directory is scanned again and everything not referenced in the package
files is deleted to save space. Please use with caution.

============================
5. BUGS, TODOs, Suggestions
============================

Probably over 9000.
This is an initial public release; quality should be considered alpha.
Bug reports and patches can be submitted at SourceForge or sent to me at
Source: README.txt, updated 2012-05-24