Re: [MooseFS-Users] Understanding moose's performance caps


Hi Jacob,

First of all, I would like to encourage you to join the discussions 
section on our github page:
https://github.com/moosefs/moosefs/discussions

This list isn't active anymore; everybody seems to have moved to github 
nowadays ;) Performance discussions happen there regularly.

Second of all, some basics: MooseFS is a network file system, designed 
to handle large amounts of data and a large number of parallel 
operations - this is where its strength lies. You will not get 
performance better than, or even comparable to, a fast hard drive if you 
test a setup of a single master, a single chunkserver, a single 
mfsclient and a single thread/process performing i/o. When you read or 
write data on a hard drive, you send your requests to the kernel and the 
kernel performs the operations on the drive. When you read or write data 
on MooseFS, you send the requests to the kernel, which forwards them to 
the FUSE* module, which communicates with the MFS mount, which talks 
over the network first with the master, to find out where the data is, 
and then with the chunkservers to actually perform the reads and writes. 
Latency is crucial here, but even when it is very low, the network 
communication limits the i/o speed. So single-thread i/o speed is 
limited.
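
To put rough numbers on that limit, here is a small arithmetic sketch. It takes the 500MB/s read and 700MB/s write figures and the 64k block size from the fio run quoted below, and assumes "MB" means decimal megabytes. The function name request_budget is made up for this example. With --iodepth=64 several requests can be in flight at once, so the result is an average time budget per request, not a measured latency.

```shell
#!/bin/sh
# Rough arithmetic only: how many requests per second a given throughput
# implies at a given block size, and the average time available per request
# if they were handled strictly one after another.
# Assumption: throughput is given in decimal MB/s.
request_budget() {
  # $1 = throughput in MB/s, $2 = block size in bytes
  awk -v mbps="$1" -v bs="$2" 'BEGIN {
    rps = mbps * 1000000 / bs
    printf "%.0f requests/s, %.1f us per request\n", rps, 1000000 / rps
  }'
}

request_budget 500 65536   # read figure from the quoted fio run (--bs=64k)
request_budget 700 65536   # write figure from the same run
```

Every extra network round trip in the request path has to fit into that per-request budget, which is why a single stream is so sensitive to latency.
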
On the other hand, a single drive has limited read and write speed, and 
if it is a mechanical drive, it doesn't do well with parallel operations 
at all. The MooseFS client will run many parallel operations at the same 
time, and while each is burdened by the network overhead, each one of 
them will be as fast as a single one running alone (up to a point, of 
course, but depending on your hardware, you can run a great many of them 
without a performance drop). Plus, you can mount several clients on one 
machine and run many machines with clients. The total i/o of the system 
can scale almost without limit.

*FUSE has a hard limit of 4GB/sec when it comes to i/o speed; this is 
because FUSE uses a single communication channel (a single pipe) to talk 
to the kernel. So this is also a hard limit for a single mfs client if 
you don't use the mfsio library to communicate. But you can run many 
clients on one machine.
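
As a minimal sketch of running several clients on one machine, the loop below mounts the same MooseFS instance on several mount points, using the same "mfsmount <dir> -H mfsmaster" form that appears in the quoted message. The mount point names and the count of four are hypothetical. By default the script only prints the commands; run it with RUN= (empty) to actually execute them.

```shell
#!/bin/sh
# Hypothetical mount points; "mfsmaster" is the master host name used in
# the thread. Each mfsmount process is a separate client.
RUN="${RUN-echo}"   # dry run by default: commands are printed, not executed

mount_clients() {
  for i in 1 2 3 4; do
    $RUN mkdir -p "/mnt/mfs_mount_$i"
    $RUN mfsmount "/mnt/mfs_mount_$i" -H mfsmaster
  done
}

mount_clients
```

Each mount point can then be given its own test instance, so that no single FUSE channel carries all of the traffic.
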

You ran your test with "--numjobs=1". Try running many threads in 
parallel and see the performance then :) Run several instances of this 
test (or any other test) at the same time on one client, then on many 
clients. You will start to see the power of the system.
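
One possible way to try this is a small sweep over --numjobs, reusing the fio options from the quoted command unchanged. The job counts 1, 4, 16 and 32 and the per-run job names are arbitrary choices for illustration, not recommended values. By default the script only prints the fio commands; run it with RUN= (empty) to execute them.

```shell
#!/bin/sh
# Same fio options as in the quoted test; only --numjobs (and the job name)
# changes between runs. The job counts are arbitrary example values.
TARGET="${TARGET-/mnt/mfs_mount/123}"   # test file path from the quoted command
RUN="${RUN-echo}"                       # dry run by default

fio_sweep() {
  for jobs in 1 4 16 32; do
    $RUN fio --filename="$TARGET" --direct=1 --rw=read --bs=64k \
      --ioengine=libaio --iodepth=64 --runtime=30 --numjobs="$jobs" \
      --time_based --group_reporting --name="throughput-jobs-$jobs" \
      --size=50m
  done
}

fio_sweep
```

Because of --group_reporting, fio prints one aggregate result per run, so the totals for different job counts can be compared directly.
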

The reason why your single-threaded i/o is faster when the client is on 
the master server rather than on the chunkserver is also easy to 
explain: the overhead for one network connection is more or less 
constant. With the master, the client exchanges small packets of control 
information: metadata updates (changes to file length and to file 
properties like atime, mtime etc.) and the location of the data itself. 
To the chunkserver, the client sends the actual file data, in larger 
packets. So if your client has lower latency when it talks to the master 
(and this will be true if they are on the same machine), the overall 
performance of your single-threaded i/o will be better. But again - this 
is not what MooseFS is for. MooseFS is for many threads, many parallel 
operations.

Regards,

Agata

On 25.10.2021 at 20:46, Jacob Dietz via moosefs-users wrote:
>
> Short update.
>
> Further testing with two additional clients makes my latency theory 
> unlikely.
>
> Third client (same hw as master) shows pretty much the same 
> performance pattern as the master.
>
> Fourth client (same hw as chunk server0) shows pretty much the same 
> performance pattern as the chunk server.
>
>   * Read remains some global bottleneck.
>   * Write seems to be hardware related.
>
> Best
>
> Jacob
>
> *From: *Jacob Dietz <jac...@ci...>
> *Date: *Monday, 25. October 2021 at 20:09
> *To: *Jacob Dietz via moosefs-users <moo...@li...>
> *Subject: *Understanding moose's performance caps
>
> Hi everyone,
>
> I’m testing a moosefs setup and am trying to understand the 
> performance values I get and the possible bottlenecks. I have tried 
> quite a few things but am a little stuck.
>
> Status quo for test setup is:
>
> Mfs master server
>
> -10G Interface
>
> -Mfsmaster config is default
>
> Mfs chunk server
>
> -100G Interface
>
> -8x Sata SSD Raid
>
> -Mfschunkserver config default
>
> To rule out any client or additional network issues I’m testing on the 
> chunk server.
>
> Testing with fio gives me a stable 500MB/s read and 700MB/s write.
>
> (“sudo fio --filename=/mnt/mfs_mount/123 --direct=1 --rw=read --bs=64k 
> --ioengine=libaio --iodepth=64 --runtime=30 --numjobs=1 --time_based 
> --group_reporting --name=throughput-test-job --eta-newline=1 --size=50m”)
>
> The same test on xfs gives me about 8GB/s read and 5GB/s write.
>
> Utilization of the ssd array is zero during testing, so everything 
> seems to be handled in cache as fio probably deletes everything instantly.
>
> Tests:
>
> Seeing the xfs performance reserve and the idling array we tried to 
> get rid of the cache with “mfsmount /mnt/mfs_mount/ -H mfsmaster -o 
> mfscachemode=DIRECT”, which gave us the same results.
>
> Trying to increase the cache instead with -o mfsreadaheadsize=2048 -o 
> mfsreadaheadleng=2048576 also gave no significant difference.
>
> Upgrading the nice level of mfsmount to 0 or even 2 also didn’t change 
> performance.
>
> Increasing the workers in the chunkserver config didn’t change performance:
>
> WORKERS_MAX = 500
>
> WORKERS_MAX_IDLE = 80
>
> Also tried reducing CHUNKS_LOOP_MIN_TIME = 150 on mfsmaster config but 
> still no change.
>
> Throughout the tests I couldn’t see any cpu cores capping.
>
> Also tried to run via a second network connection (same 100G/10G) 
> without jumbo frames to rule out any issues on that side.
>
> Doing the mfsmount on the master server gave me almost exactly the 
> same read performance. Write was strangely doubled to 1.5GB/s, which 
> is interesting as it only has a 10G interface.
>
> Guesses:
>
> I’m pretty new to moosefs and still trying to wrap my head around it 
> but to me, this seems like some cap I’m running against as it’s so 
> steady and reproducible.
>
> Shouldn’t be the cache, as ram speed cap wouldn’t make sense.
>
> Shouldn’t be the ssd array.
>
> Shouldn’t be cpu as threads are far from capping.
>
> The increased write on the master server indicates that it could be 
> the latency between the two servers. Read is similar on both machines 
> as they need to communicate either way. Write is increased on the 
> master beyond its nic capabilities, possibly because it’s only 
> committing the writes to itself as fio deletes the data before it is 
> even sent outside. Array idling during all tests is backing this theory.
>
> That said, ping is between 0.2 and 0.3 ms.
>
> Sorry for the long post I hope it’s still readable.
>
> Would be great if anyone could point me the way to understand the 
> bottleneck(s) I’m facing and how to overcome it. Could latency be the 
> right path?
>
> Thanks!
>
> Best
>
> Jacob
>
>
>
> _________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users

-- 
Agata Kruszona-Zawadzka
MooseFS Team
