From: Michal B. <mic...@ge...> - 2011-02-08 08:59:31
Hi Jun and Thomas!

We are aware of the "non-persistent connections" issue and will improve this behavior soon. Unfortunately, persistent connections would not help much in this scenario. fdatasync() forces an immediate dispatch of all data from the cache to the external disks. This is a costly operation (even if the connection were sustained) and, generally speaking, it eliminates all the benefits of having the cache. We could add an option to mfsmount, something like "mfsignorefsync", which would make fdatasync() do nothing; data would then be sent at its own pace, over a single connection for the whole group of data.
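Roughly, the idea in mfsmount's FUSE fsync callback would look like this (an illustrative sketch only; the option name, flag, and helper are assumed for the example, not the actual mfsmount source):

  /* Sketch: with a hypothetical "mfsignorefsync" mount option set,
   * the FUSE fsync callback simply reports success without flushing. */
  #define FUSE_USE_VERSION 26
  #include <fuse.h>
  #include <stdint.h>

  static int ignore_fsync = 0;       /* 1 when "mfsignorefsync" is given */
  static int do_flush(uint64_t fh);  /* placeholder for the real flush logic */

  static int mfs_fsync(const char *path, int isdatasync,
                       struct fuse_file_info *fi)
  {
      (void)path; (void)isdatasync;
      if (ignore_fsync) {
          /* No-op: cached data keeps draining to the chunkservers
           * at its own pace over the existing connection. */
          return 0;
      }
      /* Normal path: push all cached data for this file out now
       * (the costly operation described above). */
      return do_flush(fi->fh);
  }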
If you are eager to experiment, let us know and we will prepare a special version with this option for tests.

Regards
-Michal

From: Thomas S Hatch [mailto:tha...@gm...]
Sent: Tuesday, February 08, 2011 12:32 AM
To: Jun Cheol Park
Cc: moosefs-users
Subject: Re: [Moosefs-users] Why too slow in calling fdatasync()? (or futex())

On Mon, Feb 7, 2011 at 4:25 PM, Jun Cheol Park <jun...@gm...> wrote:

Hi,

I found a more specific case of the performance slowness issue that I experienced before. One of the important commands when using KVM (Kernel-based Virtual Machine) is qemu-img, which generates a special file format (qcow2), as follows:

  # strace qemu-img convert -f qcow2 -O qcow2 ........
  read(4, "\f\0\0\0\0\0\0\0\0\0\0\0\260r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
  pwrite(5, "\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1"..., 512, 196608) = 512
  fdatasync(5) = 0
  futex(0x63eec4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x63eec0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
  select(5, [4], [], NULL, NULL) = 1 (in [4])
  read(4, "\f\0\0\0\0\0\0\0\0\0\0\0\260r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
  pwrite(5, "\200\0\0\0\0\27\0\0\200\0\0\0\0\30\0\0\200\0\0\0\0\31\0\0\200\0\0\0\0\32\0\0"..., 512, 263168) = 512
  fdatasync(5) = 0
  futex(0x63eec4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x63eec0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1

The problem with this command (unlike 'cp', which never calls fdatasync) is that fdatasync() and futex() are called for every pwrite() operation. As a result, MFS write performance drops significantly, from 60 MB/s (with 'cp') to 1 MB/s (with qemu-img). I also noticed via netstat that, while the qemu-img command is running, there are a lot of non-persistent TCP connections (many sockets in TIME_WAIT).
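The pattern is easy to reproduce outside of qemu-img with a minimal test program that calls fdatasync() after every small pwrite() (a sketch; the mount path and sizes are arbitrary):

  /* Sketch: reproduce the qemu-img I/O pattern - one fdatasync()
   * after every 512-byte pwrite(). Compare wall-clock time with
   * and without the fdatasync() call. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/types.h>
  #include <unistd.h>

  int main(void)
  {
      char buf[512];
      memset(buf, 0xAA, sizeof(buf));

      int fd = open("/mnt/mfs/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0) { perror("open"); return 1; }

      /* Write 2 MiB in 512-byte chunks, syncing after each one. */
      for (off_t off = 0; off < 4096L * 512; off += 512) {
          if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf)) {
              perror("pwrite"); return 1;
          }
          if (fdatasync(fd) != 0) {   /* comment out for 'cp'-like behavior */
              perror("fdatasync"); return 1;
          }
      }
      close(fd);
      return 0;
  }

Timing it with the fdatasync() line commented out should show something close to the 'cp' speed again.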
Is there any way to improve this situation?

Thanks,

-Jun

Thanks Jun, that explains a lot. I usually prepare my qcow images on a build machine with local disks, and then my datacenters running MooseFS pull from the build machine over the internet. I have never been able to get qemu-img convert to create a qcow image on a MooseFS mount, so this answers a lot of questions, and it raises my curiosity about the connection handling.

As I said before, I trust the MooseFS devs, but I wonder if I can ask, from an academic perspective: why has it been designed this way, and could this be an opportunity for performance improvement?

-Thomas S Hatch