From: Wilson, S. M <st...@pu...> - 2017-09-13 18:25:28
Yes, most POSIX-based distributed filesystems like MooseFS will have trouble
dealing with millions of small files. I like MooseFS enough that I'm willing
to put up with the performance issue. I just thought that perhaps there were
some things I could do to boost our performance a little.

Steve

________________________________________
From: Davies Liu <dav...@gm...>
Sent: Wednesday, September 13, 2017 2:11 PM
To: Wilson, Steven M
Cc: Aleksander Wieliczko; moo...@li...
Subject: Re: [MooseFS-Users] Dealing with millions of files

In general, MooseFS is not good for many small files (both for scalability
and performance); KV stores (which pack small files into bigger chunks) will
work better than MFS.

On Wed, Sep 13, 2017 at 10:43 AM, Wilson, Steven M <st...@pu...> wrote:
> Hi!
>
> I used the timing of a tar/zip job just as an example of the poor
> performance I'm seeing with small files. I agree that there are faster ways
> to do that!
>
> The ping times between a typical client and the MooseFS chunkservers and
> master average around 0.2 ms.
>
> I ran the "big file" / "small files" copy test against the MooseFS
> filesystem with these results:
>
> Big file copy: 67.65 MB/s
> Small files copy (cp -r): 1.57 MB/s
>
> Just for comparison purposes, I did the same test on another MooseFS
> filesystem with the following results:
>
> Big file copy: 62.05 MB/s
> Small files copy (cp -r): 4.18 MB/s
>
> This filesystem is much less busy, only hosts ~12 million files, and neither
> of its two chunkservers is undergoing an internal rebalance. But the timing
> for the small files copy still seems quite slow.
>
> I'm curious if someone could run a similar test on their own MooseFS
> filesystem to see what kind of timings they get.
>
> A few weeks ago, I tried setting the CPU governor on the master server to
> "performance" instead of "powersave" but that showed no measurable
> improvement.
> Just for good measure, I again set it to "performance" and
> re-ran the copy tests and the result for the small files copy was 1.44 MB/s
> (i.e., no significant difference).
>
> Thanks for your help!
>
> Regards,
>
> Steve
>
> ________________________________
> From: Aleksander Wieliczko <ale...@mo...>
> Sent: Wednesday, September 13, 2017 3:34 AM
> To: Wilson, Steven M; Matt Welland; moo...@li...
> Subject: Re: [MooseFS-Users] Dealing with millions of files
>
> Hi.
> I would like to suggest doing what Davies Liu said, i.e. using some kind of
> parallel tool.
>
> What is the ping time between the MooseFS client and the other MooseFS
> components? For small file operations, the most crucial factor is the
> latency between all MooseFS components - this is how the TCP/IP protocol
> works.
>
> Also, please check the MooseFS master CPU power governor.
> Maybe your CPU is not working at full speed. A command like lscpu will show
> you the current CPU MHz value.
>
> By the way, a very simple TCP/IP exercise, for example with an NFS share:
> Please try to copy one big 1GiB file and 10486 small 100KiB files. What
> speed will you get?
>
> Best regards
> Alex.
>
> On 12.09.2017 21:42, Wilson, Steven M wrote:
>
> The chunk servers have at least 64GB of memory in each one. There is some
> swapping taking place but I set the LOCK_MEMORY option in mfschunkserver.cfg
> to avoid swapping out the chunk server process. A quick check shows that
> the most memory being used by any mfschunkserver is about 55% on one of the
> 64GB chunk servers.
>
> The ls and cat (both to /dev/null) seem fairly responsive:
>
> ls takes 0.169s on a directory of 57K files
> cat takes 0.363s on a 308KB file
>
> It looks like the write speed for small files may be what's killing me. I
> tried copying 900 small files (308KB) to another directory on the same
> MooseFS filesystem and it took 273 seconds or about 1.1MB/s. Ouch!
>
> Three of the four chunk servers are doing an internal rebalance of chunks.
> Perhaps that's having a larger impact on my overall I/O performance than I
> expected.
>
> Steve
>
> _________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users

--
 - Davies
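
For anyone who wants to repeat the "big file" / "small files" copy test from this thread on their own filesystem, here is a minimal sketch in Python. It is not a MooseFS tool and not what anyone in the thread actually ran (the figures above came from cp -r). The function names (make_files, copy_tree, mb_per_s, run_benchmark) are made up for illustration, the defaults follow the sizes Alex suggested (one 1GiB file, 10486 files of 100KiB), and the threaded variant is only one possible reading of the "parallel tool" advice. Throughput is reported in decimal MB/s, which is an assumption about the units used above.

```python
import os
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_files(directory, count, size):
    """Create `count` files of `size` bytes each in `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    block = os.urandom(min(size, 1 << 20))  # one random block, at most 1 MiB
    for i in range(count):
        with open(directory / f"f{i:06d}", "wb") as fh:
            remaining = size
            while remaining > 0:
                # The slice is the whole block until the final partial write.
                remaining -= fh.write(block[:remaining])
    return directory


def copy_tree(src, dst, workers=1):
    """Copy every file in `src` to `dst`; return (bytes_copied, seconds)."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.iterdir() if p.is_file())
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises the first error, if any.
        list(pool.map(lambda p: shutil.copyfile(p, dst / p.name), files))
    elapsed = time.perf_counter() - start
    return sum(p.stat().st_size for p in files), elapsed


def mb_per_s(nbytes, seconds):
    """Throughput in decimal MB/s (1 MB = 10**6 bytes, an assumption)."""
    return nbytes / 1e6 / seconds if seconds > 0 else float("inf")


def run_benchmark(root, small_count=10486, small_size=100 * 1024,
                  big_size=1 << 30, workers=8):
    """Run the three copies under `root` and return {label: MB/s}."""
    root = Path(root)
    big = make_files(root / "big_src", 1, big_size)
    small = make_files(root / "small_src", small_count, small_size)
    cases = [
        ("big file", big, 1),
        ("small files, serial", small, 1),
        (f"small files, {workers} threads", small, workers),
    ]
    results = {}
    for i, (label, src, n) in enumerate(cases):
        nbytes, secs = copy_tree(src, root / f"dst_{i}", n)
        results[label] = mb_per_s(nbytes, secs)
    return results
```

Calling run_benchmark() with a scratch directory on the mounted filesystem (the path is whatever your mount point is) and printing the returned dictionary gives three numbers to compare with the ones posted above. Treat them as rough: the source files have just been written, so reads may be served from the client cache, and the threaded case only helps if per-file latency, rather than raw bandwidth, is the limit.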