From: Agata Kruszona-Z. <ch...@mo...> - 2021-10-26 11:37:04
Hi Jacob,

First of all, I would like to encourage you to join the discussions section on our github page:
https://github.com/moosefs/moosefs/discussions
This list isn't active anymore, everybody seems to have moved to github nowadays ;) Performance discussions happen there on the regular.

Second of all, some basics: MooseFS is a network file system, designed to handle large amounts of data and large numbers of parallel operations - this is where its strength lies. You will not get better or even comparable performance to a fast hard drive if you test a setup of single master, single chunkserver, single mfsclient and a single thread/process performing i/o.

When you read or write data to a hard drive, you send your requests to the kernel and the kernel performs the operations on your hard drive. When you read or write data on MooseFS, you send the requests to the kernel, which forwards them to the FUSE* module, which communicates with MFS mount, which talks via network first with the master, to find out where the data is, then with chunkservers to actually handle the reads and writes. Latency is crucial here, but even when it is very small, the network communication limits the i/o speed. So single thread i/o speed is limited.

On the other hand, a single drive has limited read and write speed, and if it is a mechanical drive, it doesn't do well with parallel operations at all. The MooseFS client will run many parallel operations at the same time, and while each is burdened by the network overhead, each one of them will be as fast as that single one (up to a point, of course, but depending on your hardware, you can run really a lot of them without a performance drop). Plus, you can mount several clients on one machine and run many machines with clients. The scalability of your total i/o on the system is almost unlimited.

*FUSE has a hard limit of 4GB/sec when it comes to i/o speed, this is because FUSE uses a single communication channel (single pipe) to talk to the kernel.
So this is also a hard limit of a single mfs client if you don't use the mfsio library to communicate. But you can run many clients on one machine.

You ran your test with "--numjobs=1". Try to run many threads in parallel and see the performance then :) Run several instances of this test (or any other test) at the same time on one client, on many clients. You will start to see the power in the system.

The reason why your single threaded i/o is faster when the client is on the master server rather than on the chunkserver is also easy to explain: the overhead for one network connection is more or less constant. With the master the client talks in small packets - control information about updating metadata (change of file length, file properties like atime, mtime etc.) and about the location of the data itself. To the chunkserver the client sends the actual file data - larger packets. So if your client has lower latency when it talks to the master (and this will be true if they are on the same machine), the overall performance of your single threaded i/o will be better.

But again - this is not what MooseFS is for. MooseFS is for many threads, many parallel operations.

Regards,
Agata

On 25.10.2021 at 20:46, Jacob Dietz via moosefs-users wrote:
>
> Short update.
>
> Further testing with two additional clients makes my latency theory
> unlikely.
>
> Third client (same hw as master) shows pretty much the same
> performance pattern as the master.
>
> Fourth client (same hw as chunk server0) shows pretty much the same
> performance pattern as the chunk server.
>
> * Read remains some global bottleneck.
> * Write seems to be hardware related.
>
> Best
>
> Jacob
>
> *From: *Jacob Dietz <jac...@ci...>
> *Date: *Monday, 25. October 2021 at 20:09
> *To: *Jacob Dietz via moosefs-users <moo...@li...>
> *Subject: *Understanding moose's performance caps
>
> Hi everyone,
>
> I’m testing a moosefs setup and am trying to understand the
> performance values I get and the possible bottlenecks. Tried out quite
> some things but am a little stuck.
>
> Status quo for test setup is:
>
> Mfs master server
>
> -10G Interface
>
> -Mfsmaster config is default
>
> Mfs chunk server
>
> -100G Interface
>
> -8x Sata SSD Raid
>
> -Mfschunkserver config default
>
> To rule out any client or additional network issues I’m testing on the
> chunk server.
>
> Testing with fio gives me a stable 500MB/s read and 700MB/s write.
>
> (“sudo fio --filename=/mnt/mfs_mount/123 --direct=1 --rw=read --bs=64k
> --ioengine=libaio --iodepth=64 --runtime=30 --numjobs=1 --time_based
> --group_reporting --name=throughput-test-job --eta-newline=1 --size=50m”)
>
> Same test on the xfs gives me about 8GB/s read and 5GB/s write.
>
> Utilization of the ssd array is zero during testing, so everything
> seems to be handled in cache as fio probably deletes everything instantly.
>
> Testings:
>
> Seeing the xfs performance reserve and the idling array we tried to
> get rid of the cache with “mfsmount /mnt/mfs_mount/ -H mfsmaster -o
> mfscachemode=DIRECT”, which gave us the same results.
>
> Trying to increase the cache instead with -o mfsreadaheadsize=2048 -o
> mfsreadaheadleng=2048576 also gave no significant difference.
>
> Upgrading the nice level of mfsmount to 0 or even 2 also didn’t change
> performance.
>
> Trying to increase workers on chunkserver config didn’t change performance
>
> WORKERS_MAX = 500
>
> WORKERS_MAX_IDLE = 80
>
> Also tried reducing CHUNKS_LOOP_MIN_TIME = 150 on mfsmaster config but
> still no change.
>
> Throughout the tests I couldn’t see any cpu cores capping.
>
> Also tried to run via a second network connection (same 100GB/10GB)
> without jumbo frames to rule out any issues on that side.
>
> Doing the mfsmount on the master server gave me pretty accurately the
> same read performance. Write was strangely doubled to 1.5GB/s, which
> is interesting as it only has a 10G interface.
>
> Guesses:
>
> I’m pretty new to moosefs and still trying to wrap my
> head around it but to me, this seems like some cap I’m running against
> as it’s so steady and reproducible.
>
> Shouldn’t be the cache, as ram speed cap wouldn’t make sense.
>
> Shouldn’t be the ssd array.
>
> Shouldn’t be cpu as threads are far from capping.
>
> The increased write on the master server indicates that it could be
> the latency between the two servers. Read is similar on both machines
> as they need to communicate either way. Write is increased on master
> beyond his nic capabilities, possibly because he’s only committing the
> writes to himself as fio deletes the data before it is even sent
> outside. Array idling during all tests is backing this theory.
>
> That said, ping is between 0.3 and 0.2 ms.
>
> Sorry for the long post I hope it’s still readable.
>
> Would be great if anyone could point me the way to understand the
> bottleneck(s) I’m facing and how to overcome it. Could latency be the
> right path?
>
> Thanks!
>
> Best
>
> Jacob
>
> _________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users

--
Agata Kruszona-Zawadzka
MooseFS Team
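
Editorial illustration (not part of the original message): the argument in the reply is that each request pays a roughly constant round-trip cost, so a single stream of i/o is latency-bound, while many parallel streams add up until some shared ceiling is reached. Here is a minimal back-of-envelope sketch of that reasoning in Python. The 64 KiB block size comes from the fio command in the thread and the 4 GB/s ceiling is the FUSE figure quoted in the reply. The 130 microsecond per-request time is an assumption, picked only because it lands near the roughly 500 MB/s reported in the thread. It was not measured. The perfectly linear scaling is also a simplification of "up to a point".

```python
# Back-of-envelope model of latency-bound throughput.
# All names here are hypothetical helpers for illustration only.

def single_stream_throughput(block_size_bytes: int, per_request_seconds: float) -> float:
    """Bytes per second when requests are issued strictly one after another."""
    return block_size_bytes / per_request_seconds


def aggregate_throughput(block_size_bytes: int, per_request_seconds: float,
                         streams: int, ceiling_bytes_per_s: float) -> float:
    """Idealized total for `streams` independent streams, capped by a shared ceiling."""
    total = streams * single_stream_throughput(block_size_bytes, per_request_seconds)
    return min(total, ceiling_bytes_per_s)


BLOCK = 64 * 1024        # --bs=64k from the fio command in the thread
PER_REQUEST = 130e-6     # ASSUMPTION: per-request time in seconds, not measured
FUSE_CEILING = 4e9       # 4 GB/s per mfs client, as stated in the reply

for n in (1, 2, 4, 8, 16):
    mb_per_s = aggregate_throughput(BLOCK, PER_REQUEST, n, FUSE_CEILING) / 1e6
    print(f"{n:2d} parallel stream(s): ~{mb_per_s:,.0f} MB/s")
```

Under these assumptions the single-stream figure stays near 500 MB/s regardless of how fast the disks are, which matches the shape of the observation in the thread. Only raising the number of parallel streams (for example a higher `--numjobs` in fio, or several clients) moves the total, until the per-client ceiling applies.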