
#32 simultaneous file and torrent copying

Milestone: Undecided
Status: open
Owner: nobody
Labels: None
Priority: 1
Updated: 2013-04-20
Created: 2013-04-19
Creator: Rob Bos
Private: No

This is sort of half feature request, half support request. I'd like to distribute software to lab machines. This involves getting a multiple-gigabyte set of files (say, Adobe Creative Suite installer) from a network share onto a local cache folder on an individual machine.

Currently we use robocopy, but that tends to time out after a while, and it puts a lot of load on the server. I've had some success using aria2c with torrents, and as long as there are enough machines in the swarm it works well, but it degrades badly when the swarm is small. My best results have come from having each machine mount the target directory over SMB, self-seed that mounted directory on localhost, and run a second aria2c to download from localhost (and, incidentally, torrent to any other machine in the vicinity using a Web seed). This works pretty well for the most part, though I have no good way, for instance, of limiting it to a certain number of seeds to prevent overloading the central file server.

However, in the case where only one or a few machines need to download the files, it sucks. It times out really easily and does not maximize the available bandwidth (because the torrent is limited to one TCP stream per peer).

So I've been wondering whether it would be possible to tell aria2c to do a file copy and a torrent simultaneously; that is, can it seed a file and copy it off a network share at the same time? In my ideal world, I'd tell aria2c to get the file from both sources at once, save it to a local copy, and share the pieces with any peer in the swarm.

So in a computer lab, if one computer downloads the files, it gets them entirely via file copy. If a peer joins the swarm, any pieces computer A has already downloaded are shared with computer B while B downloads the rest. If there are (say) 20 seeds already connected to the downloader, it could rely entirely on the torrent and ignore the file share.

Alternatively, it would work if aria2c could seed and download over multiple TCP streams to localhost. Right now, the self-seeding setup can only transfer one chunk at a time. If I could otherwise saturate the network bandwidth available in the single-downloader case, that would also help.

If I'm missing something fundamental, feedback would be appreciated.

Discussion

  • Rob Bos

    Rob Bos - 2013-04-19

Minor edit: when I say "web seed", I mean "web tracker".

     
  • tujikawa

    tujikawa - 2013-04-19

    aria2 can download a file from HTTP/FTP and a BitTorrent swarm at the same time, and the parts downloaded from HTTP/FTP are uploaded to the BitTorrent swarm. This may help your situation.
    To set this up, there are two options. The first is to include HTTP URIs in the .torrent file.
    The specification is at http://bittorrent.org/beps/bep_0019.html. The torrent maker needs to support this, but btmakemetafile does not seem to.
    The second is to use a Metalink file. Metalink is basically an XML file that lists resource URLs and metadata files such as a .torrent, to make downloads faster and more reliable.
    In your case, include the .torrent file and the HTTP or FTP URLs in one Metalink file.
    There is an example in Section 1.1 of the Metalink RFC: http://tools.ietf.org/html/rfc5854
    If every aria2c instance uses this Metalink file, all of them will try to download from the web server, which will certainly overload it. To mitigate this, divide the clients into two groups: one group gets the Metalink file with the HTTP/FTP URIs, and the other gets one without them. Or simply whitelist certain IP addresses on the web server.

    On the tracker side, shortening min-interval (or interval) may help, because aria2 can then contact the tracker more frequently and find peers that previously timed out but now have some data to upload.

    For the single-TCP-stream issue, the workaround is to launch several aria2c instances on the seeding machine. aria2 distinguishes peers by their (IP address, port) pair, so with multiple seed instances, a client can connect to more than one of them and use several TCP streams.
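
    A rough sketch of that workaround, assuming Python 3 on the seeding machine; the ports, paths, and file names are placeholders, and whether concurrent instances can safely share the same data directory and control files should be tested first:

    #!/usr/bin/env python3
    # Hypothetical launcher: one aria2c seed process per BitTorrent listen port.
    import subprocess

    TORRENT = "/path/to/share.torrent"   # placeholder .torrent file
    DATA_DIR = "/path/to"                # directory holding the already-downloaded data
    PORTS = [6881, 6882, 6883]           # one distinct listen port per instance

    procs = []
    for port in PORTS:
        procs.append(subprocess.Popen([
            "aria2c",
            "--check-integrity=true",      # hash-check the existing data, then seed it
            "--seed-ratio=0.0",            # 0.0 = keep seeding regardless of ratio
            "--enable-dht=false",          # avoid DHT port clashes between instances
            "--listen-port=%d" % port,     # distinct port -> distinct (IP, port) peer
            "--dir=%s" % DATA_DIR,
            TORRENT,
        ]))

    # Keep the seeds running until they are stopped manually.
    for p in procs:
        p.wait()

    The downloading client should then see several distinct peers on the same IP and can open one TCP stream to each.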

     
  • Rob Bos

    Rob Bos - 2013-04-19

    Some interesting ideas. I'd hoped to avoid using HTTP/FTP, because then I'd have to unzip the files on the local client (pretty CPU intensive) or do awkward things with recursive get, but maybe that'll work out reasonably well.

    I like the idea of running multiple local seeds on the client to improve bandwidth use. I may play around with that first.

    Also a good idea with min-interval.

    Thanks, this gives me some more things to experiment with.

     
  • tujikawa

    tujikawa - 2013-04-20

    I don't think you need to zip the files. Since the .torrent file is already created, you have a list of files under one "root" directory. So just serve that directory from an HTTP server and include the URL in the .torrent file with the url-list key, or in a Metalink file.

    Assuming you are using a multi-file torrent and the directory structure is as follows:

    /path/to/rootdir/file1
    /path/to/rootdir/file2
    

    Here, "rootdir" is the name key from the info dictionary in the .torrent file. Serve rootdir from the HTTP server, like this:

    http://host/share/rootdir
    

    To embed this in the .torrent file, add a url-list key with the value http://host/share/ to the outermost dictionary of the .torrent file. See http://bittorrent.org/beps/bep_0019.html#multi-file-torrents
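
    Since btmakemetafile does not seem to support url-list, the key can also be patched into an existing .torrent after the fact. A minimal sketch, assuming Python 3; the small bencode decoder/encoder below handles only what is needed here, and the file name and URL on the command line are placeholders:

    #!/usr/bin/env python3
    # Hypothetical helper: add a BEP-19 url-list key to an existing .torrent.
    import sys

    def bdecode(data, i=0):
        """Decode one bencoded value starting at offset i; return (value, next offset)."""
        c = data[i:i+1]
        if c == b'i':                              # integer: i<digits>e
            end = data.index(b'e', i)
            return int(data[i+1:end]), end + 1
        if c == b'l':                              # list: l<items>e
            i += 1
            items = []
            while data[i:i+1] != b'e':
                value, i = bdecode(data, i)
                items.append(value)
            return items, i + 1
        if c == b'd':                              # dictionary: d<key><value>...e
            i += 1
            d = {}
            while data[i:i+1] != b'e':
                key, i = bdecode(data, i)
                d[key], i = bdecode(data, i)
            return d, i + 1
        colon = data.index(b':', i)                # byte string: <length>:<bytes>
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start+length], start + length

    def bencode(value):
        """Re-encode a decoded value; dictionary keys are sorted, as bencode requires."""
        if isinstance(value, int):
            return b'i%de' % value
        if isinstance(value, bytes):
            return b'%d:%s' % (len(value), value)
        if isinstance(value, list):
            return b'l' + b''.join(bencode(v) for v in value) + b'e'
        if isinstance(value, dict):
            return b'd' + b''.join(bencode(k) + bencode(value[k]) for k in sorted(value)) + b'e'
        raise TypeError("unsupported type: %r" % type(value))

    if __name__ == '__main__':
        # Usage: add_urllist.py share.torrent http://host/share/ > share-webseed.torrent
        torrent_path, base_url = sys.argv[1], sys.argv[2]
        with open(torrent_path, 'rb') as f:
            meta, _ = bdecode(f.read())
        # BEP-19: for a multi-file torrent the URL must end with '/';
        # clients append "<name>/<path in torrent>" to it.
        meta[b'url-list'] = [base_url.encode()]
        # If the original .torrent is in canonical (sorted-key) form, the info
        # dictionary re-encodes byte-for-byte, so the infohash does not change.
        sys.stdout.buffer.write(bencode(meta))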

    For Metalink, create a file element for each file, like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <metalink xmlns="urn:ietf:params:xml:ns:metalink">
      <file name="rootdir/file1">
        <url>http://host/share/rootdir/file1</url>
        <metaurl mediatype="torrent" name="rootdir/file1">http://host/share.torrent</metaurl>
      </file>
      <file name="rootdir/file2">
        <url>http://host/share/rootdir/file2</url>
        <metaurl mediatype="torrent" name="rootdir/file2">http://host/share.torrent</metaurl>
      </file>
    </metalink>
    

    Here, share.torrent is the .torrent file without the url-list key.
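
    With many files, writing the file elements by hand is impractical, so a script can generate them by walking rootdir. A rough sketch, assuming Python 3; the local path, base URL, and torrent URL are the placeholders used above:

    #!/usr/bin/env python3
    # Hypothetical generator: one Metalink <file> element per file under rootdir.
    import os
    import sys
    from urllib.parse import quote
    from xml.sax.saxutils import escape

    ROOT = "/path/to/rootdir"                  # local path to the shared directory
    BASE_URL = "http://host/share"             # where rootdir is exported over HTTP
    TORRENT_URL = "http://host/share.torrent"  # the .torrent without url-list

    name = os.path.basename(ROOT.rstrip("/"))  # "rootdir", the torrent's name key
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<metalink xmlns="urn:ietf:params:xml:ns:metalink">']

    for dirpath, _dirs, files in os.walk(ROOT):
        for fname in sorted(files):
            rel = os.path.relpath(os.path.join(dirpath, fname), ROOT).replace(os.sep, "/")
            entry = "%s/%s" % (name, rel)      # must match the path inside the torrent
            lines.append('  <file name="%s">' % escape(entry, {'"': '&quot;'}))
            lines.append('    <url>%s/%s/%s</url>' % (BASE_URL, name, quote(rel)))
            lines.append('    <metaurl mediatype="torrent" name="%s">%s</metaurl>'
                         % (escape(entry, {'"': '&quot;'}), TORRENT_URL))
            lines.append('  </file>')

    lines.append('</metalink>')
    sys.stdout.write("\n".join(lines) + "\n")

    aria2c can then read the generated file with the -M (--metalink-file) option.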

     
  • Rob Bos

    Rob Bos - 2013-04-20

    In the Adobe CS5.5 case, it's more like

    /path/to/rootdir/file1
    ...
    /path/to/rootdir/file6759

    so that could quickly get out of hand.

    But I will read up on it. :)

     
