Following up on my own post: I had a little free time the other day and decided to investigate whether this was feasible. Setting up the necessary services on Amazon was trivial, including access control and block storage. I tried s3fs first, and it worked, but there seemed to be far too much I/O going on for that volume of data (which is pretty much what I expected). Then I tried putting my bacula-sd on an EC2 node, writing to files on EBS, and it worked great (spooling first to the "local" drive on the EC2 instance). Throughput, however, was somewhat less than I was hoping for: roughly 25% of what I get locally when spooling and then writing to tape. On the other hand, I found there was NO performance penalty for running two jobs concurrently. I didn't try larger numbers, but my guess is that you can run a large number of concurrent jobs and get pretty good effective throughput, assuming you have lots of clients with similar data sizes.
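For anyone curious, the SD-on-EC2 side was nothing exotic. A rough sketch of the relevant bits, with made-up names and paths (not our actual config):

    # bacula-sd.conf on the EC2 node
    Device {
      Name = EbsFile
      Media Type = File
      Archive Device = /ebs/bacula-volumes   # directory on the attached EBS volume
      Spool Directory = /mnt/spool           # instance ("local") storage used for spooling
      Random Access = yes
      Automatic Mount = yes
      Removable Media = no
      Always Open = no
      Label Media = yes
    }

    # bacula-dir.conf, in the Job(s) that write to that device
    Spool Data = yes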
Our problem is that 80% of our data is on one client; a full backup of it would take 130 hours, and our backup window simply isn't that long. My thought was to break that client's FileSet into smaller pieces and run multiple backup jobs in parallel (I'm assuming the client itself is not the bottleneck). However, Bacula wouldn't run more than one job on that client concurrently. Since I can run multiple clients concurrently, I'm pretty sure my bacula-dir.conf and bacula-sd.conf settings are correct, and my bacula-fd.conf specifies "Maximum Concurrent Jobs = 20"... Is there any other reason why I couldn't run, say, 5 parallel jobs with different FileSets off the same client?
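For reference, these are all the places I know of where Bacula can cap concurrency; the values are just illustrative and the resource names are made up:

    # bacula-dir.conf -- each resource carries its own limit
    Director {
      ...
      Maximum Concurrent Jobs = 20
    }
    Job {
      Name = "bigclient-part1"        # one of the split-FileSet jobs
      ...
      Maximum Concurrent Jobs = 5     # defaults to 1 if unset
    }
    Client {
      Name = "bigclient-fd"
      ...
      Maximum Concurrent Jobs = 5     # also defaults to 1
    }
    Storage {
      Name = "ec2-sd"
      ...
      Maximum Concurrent Jobs = 5     # also defaults to 1
    }

    # bacula-sd.conf
    Storage {
      ...
      Maximum Concurrent Jobs = 20
    }

If the Client (or per-Job) limit in bacula-dir.conf is still at its default of 1, I believe that alone would serialize jobs for that client.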
From: Peter Zenge [mailto:pzenge@...]
Sent: Tuesday, March 02, 2010 2:57 PM
Subject: [Bacula-users] Bacula to the Cloud
Hello; I'm a 2-year Bacula user but a first-time poster. I'm currently dumping about 1.6TB to LTO2 tapes every week, and I'm looking to migrate to a new storage medium.
The obvious answer, I think, is a direct-attached disk array (which I would be able to put in a remote gigabit-attached datacenter before too long). However, I'm wondering whether anyone is currently doing large (or what seem to me to be large) backups to the cloud in some way. Assuming I have a gigabit connection to the Internet from my datacenter, how feasible would it be either to use something like Amazon S3 with s3fs (I'm guessing there's way too much overhead for that to be efficient), or to run bacula-sd on an EC2 node, using Elastic Block Store (EBS) as "local" disk, with a VPN (Amazon VPC) between my datacenter and the SD?
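For the s3fs variant, what I have in mind is roughly the following; the bucket name and paths are made up, and the exact s3fs options vary by version:

    # mount the bucket, then point a File-type Device's "Archive Device"
    # in bacula-sd.conf at the mount point
    s3fs my-bacula-volumes /mnt/s3-volumes -o passwd_file=/etc/passwd-s3fs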
Substitute your favorite cloud provider for Amazon above; I don't use any right now, so I'm not tied to any particular provider. It just seems like Amazon has all the necessary pieces today.
To do this, and to keep customers comfortable with the idea of their data being in the cloud, we would need to encrypt, so I'm also wondering whether it would be possible for the SD to encrypt the backup volumes, rather than having the FD encrypt the data before sending it to the SD (which is what we do now). It would be easier to manage if we handled encryption in one place for all clients.
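For context, the per-client encryption we do today is the standard FD data-encryption setup in bacula-fd.conf, roughly like this (names and key paths are just examples):

    FileDaemon {
      Name = client1-fd
      ...
      PKI Signatures = Yes
      PKI Encryption = Yes
      PKI Keypair = "/etc/bacula/client1-fd.pem"   # this client's certificate + private key
      PKI Master Key = "/etc/bacula/master.cert"   # master public key, for emergency recovery
    }

Managing a keypair per client is exactly what I'd like to avoid by encrypting once at the SD instead.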
I would love to hear what other people are doing with Bacula and the cloud, or why you've decided not to.
Pzenge .at. ilinc .dot. com