From: <pen...@ic...> - 2015-08-28 08:22:15
Hi Joe: do you have performance test results to share? Many thanks.

pen...@ic...

From: Joseph Love
Date: 2015-08-28 05:19
To: moosefs-users
Subject: Re: [MooseFS-Users] performance inquiry

Hi,

So, I did the following:
- Moved the mfsmaster to one of the chunk servers (which has Xeons in it).
- Installed Ubuntu 14.04 on the 4th Xeon server, which I was trying as a client in the latest tests.

A single ‘dd’ instance on Ubuntu 14.04 was able to write to the cluster at just under 500MB/s (494MB/s, 20GB from /dev/zero). I also ran tiobench, which ran substantially faster, though it failed to generate the statistics and ended with a divide-by-zero error.

I noticed something interesting and different when doing this. I’ve been running nload on the chunk servers, showing the bandwidth used by the chunk servers on my screen. It’s a little interesting how this varies between when the FreeBSD client is writing and when the Ubuntu client is writing (even just with dd from /dev/zero). The FreeBSD client achieves neither the speeds the Ubuntu client does, nor the same consistency in sending data to all the chunk servers.

http://www.getsomewhere.net/nload_freebsd_client.png
http://www.getsomewhere.net/nload_ubuntu_client.png

-Joe

On Aug 26, 2015, at 8:56 AM, Aleksander Wieliczko <ale...@mo...> wrote:

Thank you for this information. We have two ideas to test in your environment:
1. Can you test mfsclient on a Linux OS (like Ubuntu 14/Debian 8) with FUSE >= 2.9.3?
2. Can you switch from an Atom to a Xeon CPU for the master during your tests?

We are waiting for your feedback.

Best regards
Aleksander Wieliczko
Technical Support Engineer
MooseFS.com

On 26.08.2015 15:23, Joseph Love wrote:

Hi,

Sure. All parts are running 3.0.39, on FreeBSD 10.2.
Chunk servers are dual Xeon X5560s, 24GB memory.
Master is an Atom C2758, 4GB memory.
Clients vary between an Atom C2758, 4GB memory (not the master), and a dual Xeon L5650, 24GB memory.

I’ve tried two different disk setups with the chunk servers:
- with a pair of 1TB WD RE3s, mirrored; and
- with a single Intel DC S3500 SSD.

Goal is set to 1.
NICs are Intel X520-DA2 (10GbE) with direct-attach SFP+ cables to a Broadcom 8000b switch.

Latency from client 1 (Atom C2758) - 30 packets:
to Master: round-trip min/avg/max/stddev = 0.069/0.086/0.096/0.007 ms
to chunk1: round-trip min/avg/max/stddev = 0.059/0.125/0.612/0.159 ms
to chunk2: round-trip min/avg/max/stddev = 0.049/0.087/0.611/0.098 ms
to chunk3: round-trip min/avg/max/stddev = 0.054/0.071/0.100/0.010 ms

Latency from client 2 (Xeon L5650) - 30 packets:
to Master: round-trip min/avg/max/stddev = 0.045/0.056/0.073/0.006 ms
to chunk1: round-trip min/avg/max/stddev = 0.029/0.036/0.043/0.003 ms
to chunk2: round-trip min/avg/max/stddev = 0.033/0.037/0.044/0.003 ms
to chunk3:

There’s no traffic shaping/QoS on this LAN, and the master & chunk servers are all on a network dedicated just to them.
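For reference, the latency figures above are plain ICMP round-trips, and the goal on the test directory can be double-checked from any client with the standard MooseFS tools; a minimal sketch, with the hostnames and mount path below standing in as placeholders:

  ping -c 30 mfsmaster          # repeat for chunk1, chunk2 and chunk3
  mfsgetgoal /mnt/mfs/testdir   # should report 1, matching the goal set above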
I’ve also tried this from client2, partially out of curiosity:

> dd if=/dev/zero of=test.zero.dd bs=1m count=10000 & dd if=/dev/zero of=test.zero-2.dd bs=1m count=10000 & dd if=/dev/zero of=test.zero-3.dd bs=1m count=10000 &

10000+0 records in
10000+0 records out
10485760000 bytes transferred in 101.117837 secs (103698421 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 101.644066 secs (103161556 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 101.900907 secs (102901537 bytes/sec)

Running 3 instances, I was able to write at 100MB/s per instance of dd, which suggests being able to write at least 300MB/s from a single client (just not in a single process/thread).

I tried the same thing with 7 instances. I think it actually hit the limits of the chunk servers’ disk speeds (approximately 220MB/s per SSD, iirc). 7 instances of dd gave me these results:

10485760000 bytes transferred in 124.690526 secs (84094280 bytes/sec)
10485760000 bytes transferred in 124.719954 secs (84074438 bytes/sec)
10485760000 bytes transferred in 126.579289 secs (82839460 bytes/sec)
10485760000 bytes transferred in 124.954106 secs (83916890 bytes/sec)
10485760000 bytes transferred in 126.181140 secs (83100850 bytes/sec)
10485760000 bytes transferred in 126.294929 secs (83025978 bytes/sec)
10485760000 bytes transferred in 103.810845 secs (101008329 bytes/sec)

Add that all up, and it’s about 600MB/s, which is about 200MB/s per chunk server - pretty close to the theoretical max for the SSDs. So, I guess a single client can reach those speeds, just not in a single process/thread.

-Joe

On Aug 26, 2015, at 1:17 AM, Aleksander Wieliczko <ale...@mo...> wrote:

Hi.
Can we get some more details about your configuration?
- MooseFS master version?
- MooseFS chunkserver version?
- MooseFS client version?
- Kernel version?
- CPU speed?
- RAM size?
- Number of disks per chunkserver?
- What GOAL did you set for the test folder?
- NIC interface type - copper or fiber?
- Network latency from client to master and chunkservers (ping mfsmaster)?
- Do you have some traffic shaping/QoS in your LAN?

Best regards
Aleksander Wieliczko
Technical Support Engineer
MooseFS.com

On 25.08.2015 18:29, Joseph Love wrote:

They’re all on 10GbE. I did turn on jumbo frames during my testing; it didn’t seem to make a really big difference.

I just tried with SSDs in the chunk servers (Intel DC S3500 200GB), and I’m still seeing about the same performance characteristic. I know from the middle of some other synthetic tests that I can break 200MB/s (sequential) read and 150MB/s write, but that’s a multithreaded test application. Actually, now that I say that, I suppose it might be a single-thread performance characteristic with FUSE on FreeBSD. Does anyone have statistics from a Linux system that show > 100MB/s per thread on a client?

-Joe

On Aug 25, 2015, at 11:12 AM, Ricardo J. Barberis <ric...@do...> wrote:

On Tuesday 25/08/2015, Joseph Love wrote:

Hi,

I’ve been doing some tests to get an idea as to what sort of speeds I can expect from MooseFS, and ran into something unexpected.

From multiple clients, I can sustain 80-100MB/s per client (only tested up to 3 clients) to my 3-node cluster (3 chunk servers). From a single client (while everything else is idle) I get the same result.

It occurred to me that the write speed to a disk in each chunk server is roughly 100MB/s, and I was curious whether this seems to be the likely culprit for the performance limitation of a single stream from a single client.
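One rough way to test that per-disk hypothesis is to write directly to a chunk server’s data directory, bypassing MooseFS entirely, and compare the result with the single-stream numbers above. A minimal sketch, assuming the path stands in for whichever directory is listed in mfshdd.cfg, and sized larger than the 24GB of RAM so the page cache doesn’t inflate the figure:

  # on a chunk server: raw sequential write to the data disk, no MooseFS involved
  dd if=/dev/zero of=/mnt/chunkdisk/ddtest bs=1m count=30000
  sync
  rm /mnt/chunkdisk/ddtest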
I’m about to try it again with SSDs, but I have a bit of time before that’s ready, and I figured I’d try to pose the question early.

Thoughts?

-Joe

How about the network? If your clients are connected at 1 Gbps, 100 MB/s is nearly saturating the network.

Also, using jumbo frames might give you a few extra MB/s.

Regards,
--
Ricardo J. Barberis
Senior SysAdmin / IT Architect
DonWeb
La Actitud Es Todo
www.DonWeb.com
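Enabling jumbo frames, as suggested a couple of times above, is a per-interface MTU change (the switch ports must also allow an MTU of at least 9000). A minimal sketch, with interface names as placeholders - on FreeBSD the Intel X520 typically appears via the ixgbe driver as ix0:

  # FreeBSD chunk server or client (interface name is a placeholder)
  ifconfig ix0 mtu 9000
  # Linux client (interface name is a placeholder)
  ip link set dev eth0 mtu 9000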