Again, it's premature, but I wanted to see what would happen if I threw a LOT of hardware at lessfs. Pretty interesting results. These are by no means numbers you could publish professionally, just me in my spare time.
HP DL380 G5, 32GB RAM, 2 quad-core CPUs (8 cores), 3 x 72GB SAS RAID 0
Debian 5.0, FUSE 2.8-pre2, TC 1.4.23, lessfs v0.2, TC cache 20GB
1a. Clear the Linux cache
1. Raw: dd from partition 1 to a local /tmp/image.tmp file, bs=1MB count=512 (512MB file)
2. DD First: dd from partition 1 to /fuse/test.img, bs=1MB count=512
3. DD Dedupe: dd from partition 1 to /fuse/test.img, bs=1MB count=512 (again, to exercise the dedup side of things)
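The steps above can be sketched as shell commands. This version uses scratch files so it runs anywhere; in the real test the source was a raw partition and the destination sat on the lessfs mount (/fuse/test.img), so every path below is a placeholder:

```shell
# Sketch of the benchmark, with scratch files standing in for the real
# partition and the lessfs mountpoint used in the actual runs.
SRC=/tmp/bench_src.bin                  # stand-in for partition 1
DST=/tmp/bench_dst.img                  # stand-in for /fuse/test.img
dd if=/dev/urandom of="$SRC" bs=1M count=8 2>/dev/null   # build a small source
sync  # 1a: flush; on a real run also drop caches (echo 3 > /proc/sys/vm/drop_caches, needs root)
dd if="$SRC" of=/tmp/image.tmp bs=1M 2>/dev/null         # 1: raw baseline copy
dd if="$SRC" of="$DST" bs=1M 2>/dev/null                 # 2: first write to lessfs
dd if="$SRC" of="$DST" bs=1M 2>/dev/null                 # 3: rewrite, so dedup hits
```

dd's own transfer-rate report is what the table below is built from.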
               Raw     DD First   DD Dedupe
defaults       60.25   196.75     265.50
direct_io      59.50   197.50     305.25
auto_cache     60.00   204.50     275.25
kernel_cache   59.50   189.00     291.00
These results are averages of 4 runs per setting, listed in MB/s as reported by dd. The best dedupe results come with direct_io turned on, and that was consistent across the batch of other tests I did.
Turning off big_writes basically kills performance at every level; without big_writes you won't get better performance at all.
If you have questions about anything I missed, please ask.
Anxious to see new versions come out so I can test those :).
Keep up the great work!
Thanks for sharing your performance test.
The performance figures you are reporting are similar to my own test results with an HP DL380 G5. To determine the actual write speed to disk you might want to include the sync command:
dd from partition 1 to /fuse/test.img bs=1MB count=512; time sync
One thing about lessfs v2.0 is that it includes a write cache:
# CACHESIZE in MB
On a machine with a lot of memory you can play with CACHESIZE to get optimal performance.
Another setting one can play with is the number of threads. A multicore machine with fast disks may need as many as 3 threads for the highest performance. A single-core machine should not use more than one thread, or the overhead of context switches will become painfully clear.
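Taken together, the two knobs above might look like this in lessfs.cfg. Only CACHESIZE is quoted verbatim in this thread; the thread-count option name below is an assumption, so check the example config that ships with your lessfs version:

```ini
# Hedged lessfs.cfg sketch; CACHESIZE comes from this thread, the
# thread option name may differ between lessfs releases.
CACHESIZE=1024      # write cache in MB; raise on machines with lots of RAM
MAX_THREADS=3       # assumed name; multicore + fast disks may want up to 3
```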
My experience is that for a realistic test it is wise to write a file larger than the amount of memory in the machine. Even with normal filesystems, transfer speeds slow down when the file size approaches the size of the machine's internal memory.
big_writes is always needed with lessfs; normal writes with 4k blocks produce way too much overhead.
Other OSes, like FreeBSD, use larger block sizes for disk I/O in general; FreeBSD defaults to 16k. For data deduplication, 64k usually provides the best trade-off between compression and speed.
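As a concrete illustration, the big_writes recommendation maps to FUSE mount options roughly like this; the exact lessfs invocation form and the 64k sizes here are assumptions, so check the README for your version:

```shell
# Illustrative only: mount lessfs with large FUSE transfers.
# big_writes enables >4k writes through FUSE; max_read/max_write cap
# the transfer size (64k, matching the trade-off discussed above).
lessfs /etc/lessfs.cfg /fuse -o big_writes,max_read=65536,max_write=65536
```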
Strangely enough, Linux still uses a 4k block size for I/O (the Itanium platform is an exception).
I'm glad you responded, I wanted to make sure I was on track.
I did include a cache, first of 20GB and then of 1GB, and did notice a slight change; obviously the more the better. I have the full test figures on Google Docs if you're interested.
I also played around with threads, and as per my last post they make a HUGE difference. On my single-proc laptop I had it set to 2 and performance was terrible; setting it to 1 increased it by at least 10x. On the DL380 I tried 1, 2, and finally 8, and it ran amazingly. Again, it makes a HUGE difference.
Another good point about the file size. Mine was only 1GB, but I think I tried a 10GB file and the results were lower. The block size I used was 128k, but I also tried 64k with similar results. Without big_writes on, I don't see much advantage to using lessfs at all; in fact the performance is worse than the regular file system.
I wanted to ask: do you compress and write in chunks of max_read/max_write size on the fly? You mention that lessfs will be good for virtualization; why?
I was also curious how much FUSE plays a role in performance. Any chance of writing a VFS driver? :)
I did some quick testing from the laptop over the wire to the DL380 and to our Data Domain dedupe device to see if they are comparable, but I can only use CIFS right now, since NFS doesn't work with lessfs (to my knowledge). Lessfs did about 2/3 the performance of DD for initial and subsequent writes over CIFS. However, in our DD testing we've found that their CIFS driver is weak and provides horrible performance versus the NFS setup. If lessfs worked with NFS, that would really step things up.
I know you're working hard on the next version, and thanks for putting in so much time on this. I'd love to help where I can, but don't have enough knowledge of Linux F/S to be of much help in the coding area. Docs, testing, etc I should be able to lend a hand. Just let me know.
You can email me directly at john at compunique dot com if necessary; otherwise the forums work.
I'll post to this forum for now because it might be of use to other people.
>I wanted to ask, do you compress and write in chunks of max_read/write size on the fly?
Lessfs does indeed compress data chunks of max_write size on the fly. This is also where the threads come into play. When the main thread is working on storing a block of data, other threads can already start working on compressing blocks of data.
QuickLZ allows compression speeds up to 260 MB/sec. This is very fast, but it still takes time. Extra threads for compressing the data will therefore enhance throughput when the system has multiple CPU cores.
>You mention that lessfs will be good for virtualization; why?
When you run multiple virtual machines with KVM/Xen or VMware and the machines are all clones of the same image, lessfs will only store the image once. So if the clone image size is 10GB, one only needs 10GB of space to run dozens of virtual machines.
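A back-of-the-envelope sketch of that saving, with illustrative numbers (the per-VM delta is an assumption, not from this thread):

```shell
# Illustrative arithmetic: N cloned VMs from one 10GB base image, each
# accumulating a little unique data. A plain filesystem pays for every
# clone in full; a deduplicating store holds the shared image once.
IMAGE_GB=10
N_VMS=12
DELTA_GB=1                                  # unique data per VM (assumed)
RAW=$((N_VMS * IMAGE_GB))                   # plain filesystem: 120GB
DEDUPED=$((IMAGE_GB + N_VMS * DELTA_GB))    # dedup store: 10 + 12 = 22GB
echo "plain=${RAW}GB dedup=${DEDUPED}GB"
```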
If you use lessfs to back up virtual machine images you have the same advantage: most of the data is shared between the VMs and thus stored only once on lessfs. We already use lessfs for storing and archiving backups from Equallogic iSCSI arrays, and this really works great. The only problem we have is that 1Gb Ethernet is too slow. ;-)
>I also was curious how much FUSE plays a role in performance? Any chance of writting a VFS driver :)
I don't think that FUSE introduces much performance overhead; putting lessfs in the kernel would only bring marginal performance gains. I have started work on 'blockless', a kernel module that adds a deduplicating block device to Linux. This project is still in its infancy. 'blockless' will allow any type of filesystem on top of it, but it will take much more time to get blockless as feature-rich as lessfs. Development in userspace is a bit easier, especially for prototyping.
>If lessfs worked with NFS that would really step things up.
This is really a FUSE problem. When FUSE supports NFS, lessfs should too.
>Lessfs did about 2/3 the performance of DD for initial and subsequent writes over CIFS.
The DD hardware probably had more than 3 disks installed? Tuning CIFS to use the same block size as lessfs is important to get decent performance: if Samba writes with 4k blocks while lessfs uses 128k blocks, performance will suffer.
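On the Samba side, one knob of that era is the global `max xmit` buffer size, set at runtime in smb.conf rather than at compile time. This is a hedged sketch; verify the parameter and its limits against your Samba version's smb.conf(5) man page:

```ini
# Hedged smb.conf sketch (not from this thread): raise the negotiated
# SMB buffer size toward the lessfs block size instead of the ~16k default.
[global]
    max xmit = 131072
```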
Again, thanks for testing. I really enjoyed hearing how the throughput compares to the speed of commercial (DD) equipment.
Ok, so how do I set the CIFS block size? Is this during compile?