From: Aleksander W. <ale...@mo...> - 2017-01-12 17:17:10
Hi,

Would you be so kind as to tell us something more about your "two node" cluster configuration? I mean: what kind of CPU is in the mfsmaster and the chunkserver? How much RAM does each server have? How many hard disks do you have in your chunkserver, and how fast are they? What is your LAN configuration: NIC devices, and ping times between the chunkserver, master and client?

By the way, I would like to add that the minimum configuration is:
1x MooseFS master server
3x MooseFS chunkserver

Best Regards
Aleksander

> On 12 Jan 2017, at 15:50, ma...@me... wrote:
>
> On 11.01.2017 at 17:54, ma...@me... wrote:
>> Hi!
>> I'm using moosefs-3.0.86 on ubuntu 16. This is two node setup. On second
>> node - node without mfsmaster, only with mfschunkserver, I'm starting
>> qemu VM (parameters for qemu: [...] -drive
>> file=/var/lib/one//datastores/101/42/disk.0,format=qcow2,if=none,id=drive-virtio-disk0,cache=none
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,logical_block_size=512,physical_block_size=4096
>> [...] )
>>
>> When high I/O starts inside VM then qcow file with vm image becomes
>> unreadable. OS inside qemu hangs. When I run md5sum on qcow file it also
>> stuck in D state after reading some part of file.
>> There is nothing suspicious in syslog. On mfsmaster I have:
>>> Jan 11 16:57:27 w2t-cl-a-00 mfsmount[15828]: high packet travel time between client and master (0.248819s) - ignoring in time sync
>>> Jan 11 17:00:00 w2t-cl-a-00 mfsmaster[2501]: no metaloggers connected !!!
>>> Jan 11 17:00:03 w2t-cl-a-00 mfsmaster[2501]: child finished
>>> Jan 11 17:00:03 w2t-cl-a-00 mfsmaster[2501]: store process has finished - store time: 3.750
>>
>> [here, about 17:25 vm stuck in D state]
>
> In dmesg I've got:
>> [ 3960.108053] INFO: task qemu-system-x86:25093 blocked for more than 120 seconds.
>> [ 3960.108169] Not tainted 4.4.0-59-generic #80-Ubuntu
>> [ 3960.108255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 3960.108366] qemu-system-x86 D ffff880053a2fd38 0 25093 1 0x00000000
>> [ 3960.108373] ffff880053a2fd38 ffffffff81101066 ffff88003562bfc0 ffff88007916bfc0
>> [ 3960.108377] ffff880053a30000 ffff880078cfb3ac ffff88007916bfc0 00000000ffffffff
>> [ 3960.108382] ffff880078cfb3b0 ffff880053a2fd50 ffffffff818343f5 ffff880078cfb3a8
>> [ 3960.108386] Call Trace:
>> [ 3960.108396] [<ffffffff81101066>] ? futex_wait+0x206/0x280
>> [ 3960.108402] [<ffffffff818343f5>] schedule+0x35/0x80
>> [ 3960.108405] [<ffffffff8183469e>] schedule_preempt_disabled+0xe/0x10
>> [ 3960.108409] [<ffffffff818362d9>] __mutex_lock_slowpath+0xb9/0x130
>> [ 3960.108412] [<ffffffff8183636f>] mutex_lock+0x1f/0x30
>> [ 3960.108416] [<ffffffff8132335a>] fuse_file_write_iter+0x7a/0x2e0
>> [ 3960.108421] [<ffffffff8120e11b>] new_sync_write+0x9b/0xe0
>> [ 3960.108425] [<ffffffff8120e186>] __vfs_write+0x26/0x40
>> [ 3960.108428] [<ffffffff8120eb09>] vfs_write+0xa9/0x1a0
>> [ 3960.108432] [<ffffffff8120f975>] SyS_pwrite64+0x95/0xb0
>> [ 3960.108436] [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
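
For anyone triaging a similar report, the quoted dmesg output can be summarized mechanically. The following is only a minimal Python sketch, not part of MooseFS or any kernel tooling: the function name `summarize_hung_tasks` and the regular expressions are assumptions, written against the 4.4-era log format shown in the trace (`INFO: task NAME:PID blocked for more than N seconds.` followed by `[<address>] symbol+0x...` frames). It extracts the blocked task, its PID, and whether any reliable stack frame is in the FUSE write path. Frames the kernel marks with `?` are skipped because they are unreliable.

```python
import re

# Hung-task warning line, e.g.
# "INFO: task qemu-system-x86:25093 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One call-trace frame, e.g. "[<ffffffff8132335a>] fuse_file_write_iter+0x7a/0x2e0".
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?P<maybe>\? )?(?P<sym>[\w.]+)\+0x")


def summarize_hung_tasks(dmesg_text):
    """Return one dict per hung-task report found in dmesg_text."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            # Start a new report; later frames are attached to it.
            current = {
                "comm": hung.group("comm"),
                "pid": int(hung.group("pid")),
                "seconds": int(hung.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        if current is None:
            continue
        frame = FRAME_RE.search(line)
        if frame and not frame.group("maybe"):
            current["frames"].append(frame.group("sym"))
    for report in reports:
        # True when the task is blocked somewhere under a fuse_* function.
        report["in_fuse"] = any(s.startswith("fuse_") for s in report["frames"])
    return reports
```

Feeding it the output of `dmesg` (quote prefixes such as `>>` are harmless, since the patterns search within each line) gives a quick way to check whether a blocked process is waiting inside the FUSE layer, which is where mfsmount sits, as opposed to somewhere else in the kernel.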