A few observations of my own, not to be taken too seriously:
1) Block level deduplication
There are already a lot of filesystems/filesystem layers in FUSE
(such as ZFS, lessfs, ...) which do this. This is often more
efficient than rolling your own solution, and it is well abstracted.
In my opinion it does not make sense to do block-level
deduplication in the application layer, except if you do it on the
client side to save bandwidth.
I would suggest not abusing the file system as a database and using
something like SQLite instead. That gives you features like transactions,
atomic operations, etc., and also improves speed.
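As a sketch of what the transaction guarantee buys you, here is a hypothetical chunk reference-count table in SQLite (schema, table, and function names are my own invention, not anything from BackupPC): the insert-or-increment pair either fully happens or fully doesn't.

```python
import sqlite3

# Hypothetical chunk index: one row per chunk digest with a refcount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (digest TEXT PRIMARY KEY, refcount INTEGER)")

def add_chunk_ref(digest: str) -> None:
    # 'with conn' wraps both statements in one transaction, so the
    # insert-if-missing and the increment commit (or roll back) together.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO chunks (digest, refcount) VALUES (?, 0)",
            (digest,),
        )
        conn.execute(
            "UPDATE chunks SET refcount = refcount + 1 WHERE digest = ?",
            (digest,),
        )
```

A crash between the two statements leaves the table unchanged, which is exactly the property that is hard to get from a directory full of files.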
Is v4 published somewhere? What you are doing seems to be more
like a fork, if there are huge changes in v4 and you are working
On 07.08.2012 10:36, Wessel Dankers wrote:
On 2012-08-06 13:05:53-0500, Les Mikesell wrote:
On Mon, Aug 6, 2012 at 9:46 AM, Wessel Dankers wrote:
The ideas overlap to a limited extent with the ideas that Craig posted
to this list. For instance, no more hardlinks, and garbage collection is
done using flat-file databases. Some things are quite different. I'll try
to explain my ideas here.
Personally I think the hardlink scheme works pretty well up to about
the scale that I'd want on a single machine and you get a badly needed
atomic operation with links more or less for free.
Adding a chunk can be done atomically using a simple rename(). Removing a
chunk can be done atomically using unlink(). The only danger lies in
removing a chunk that is still being used (because there's a backup still
in progress whose chunks aren't being counted yet by the gc procedure). The
simplest way to prevent that is to grant exclusive access to the gc
process. Note that hardlinks do not prevent this race either. It's a
problem we need to solve anyway.
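The rename()/unlink() scheme described above can be sketched as follows (a minimal Python illustration of the POSIX calls, not BackupPC code; the pool layout and function names are assumptions):

```python
import os
import tempfile

def add_chunk(pool_dir: str, digest: str, data: bytes) -> str:
    """Write the chunk to a temp file in the same directory, then
    rename() it into place. rename() is atomic on POSIX, so a reader
    either sees the complete chunk or no chunk at all."""
    path = os.path.join(pool_dir, digest)
    fd, tmp = tempfile.mkstemp(dir=pool_dir)
    try:
        os.write(fd, data)
        os.fsync(fd)  # make sure the data is on disk before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomic: same filesystem, replaces any old copy
    return path

def remove_chunk(pool_dir: str, digest: str) -> None:
    # unlink() is likewise atomic. As noted above, the gc process must
    # hold exclusive access so it never removes a chunk that an
    # in-progress backup is still counting on.
    os.unlink(os.path.join(pool_dir, digest))
```

The mkstemp-in-the-same-directory detail matters: rename() is only atomic within one filesystem, so the temp file cannot live in /tmp.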
Craig lists a couple of reasons to abandon the hardlink scheme in
If you are going to do things differently, wouldn't it make sense to use
one of the naturally distributed scalable databases (bigcouch, riak,
etc.) for storage from the start since anything you do is going to
involve re-inventing the atomic operation of updating a link or replacing
it and the big win would be making this permit concurrent writes from
Using something like ceph/rados for storing the chunks could be interesting
at some point but I'd like to keep things simple for now.
BackupPC-devel mailing list