On the web site you mentioned massive parallel operation. I don’t know if you are familiar with the application development that’s been going on lately using graphics processing units, but it’s pretty interesting. NVidia has a downloadable development kit (http://www.nvidia.com/object/cuda_home.html#) that works with their graphics cards. I don’t have any idea if this would be usable for your application, but it seems like it might be a way to get great performance from relatively inexpensive hardware.
The laptop that I use has an NVIDIA card. I'll check the development kit to see if it's usable. It's certainly fun that this is even possible.
What I meant by massive parallel operation is something different, however. lessfs can easily be extended to use Tokyo Tyrant.
This makes it possible to create a setup that distributes its data across a large number of servers. It provides a way to scale I/O performance over a large set of servers with minimal code changes.
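To make the idea of spreading data over many servers a bit more concrete, here is a rough sketch in Python. It is not lessfs code and says nothing about how lessfs would actually do it: the server addresses are made up, SHA-256 is only a stand-in hash, and simple modulo placement is just one assumed way to pick a server for a block.

```python
import hashlib
from typing import Sequence


def pick_server(block_hash: bytes, servers: Sequence[str]) -> str:
    """Map a block's hash to one server using simple modulo placement."""
    # Use the first 8 bytes of the hash as an integer key.
    key = int.from_bytes(block_hash[:8], "big")
    return servers[key % len(servers)]


# Hypothetical server addresses, for illustration only.
servers = ["tyrant-a:1978", "tyrant-b:1978", "tyrant-c:1978"]

block = b"example block contents"
digest = hashlib.sha256(block).digest()  # stand-in hash, an assumption

# Identical blocks always hash to the same server, so a duplicate
# block is looked up on exactly one machine.
print(pick_server(digest, servers))
```

One caveat with modulo placement is that adding or removing a server moves most keys to a different machine; something like consistent hashing would reduce that, at the cost of more code.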
Now that you've mentioned Tokyo Tyrant, I'm wondering if you could extend the lessfs design to incorporate replication some day?
Currently we use Data Domain's dedupe product, and its main advantage over lessfs is replication.
I'm planning on helping with this project soon, in the area of documentation. I'm not sure I'd be much help coding :P
I am currently working on building snapshot support in lessfs. This will allow me to build support for replication. If your databases are not too big, you can unmount lessfs and use rsync to copy the data to another location for the time being.
I am very curious about Data Domain performance when compared to lessfs. If you happen to have numbers on that, I would love to see them.
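The interim rsync approach could look roughly like the sketch below. The mount point, data directory, and destination are hypothetical placeholders, and the script only prints the commands unless dry_run is turned off. It assumes the standard umount and rsync tools are installed.

```python
import subprocess


def replication_commands(mount_point, data_dir, destination):
    """Build the commands: unmount first, then copy the data directory."""
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    source = data_dir.rstrip("/") + "/"
    return [
        ["umount", mount_point],
        ["rsync", "-a", source, destination],
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure


# Hypothetical paths and host, for illustration only.
commands = replication_commands("/mnt/lessfs", "/data/lessfs", "backup:/data/lessfs/")
run(commands, dry_run=True)
```

Unmounting first matters because copying the database files while lessfs is still writing to them could leave an inconsistent copy on the other side.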
Snapshots, great stuff! Do you have a list of features you have on the horizon? I'd love to help map that out for people to see.
My office mate is working on the DD project, and I'll see about getting some numbers, although I'm not sure how relevant they'll be since my testing is on totally different HW.
The rsync idea is a simple one, but if you move to the DB idea, replication might be closer to real-time. DD isn't real-time, but it's close. It's also viewed as a backup device, so your rsync idea might work.
I can't say we'll ever implement lessfs, but I love what you're doing; it's pretty 'groundbreaking'.
Also, do you have any idea how much overhead FUSE adds?