[Moosefs-users] writeworker: connection was timed out

Hello,
while deploying a new MooseFS instance (the latest 1.6.20, with MongoDB 
running on it) I have stumbled upon a situation where mfsmount hangs and all 
file operations halt.

Both the mfsmount debug output (-o debug) and the normal log contain an 
endless loop of:

Jul 11 16:43:21 proc78 mfsmount[2066]: file: 44, index: 0, chunk: 359, 
version: 1 - writeworker: connection with (D5AF4BB6:9422) was timed out 
(unfinished writes: 2; try counter: 1)
Jul 11 16:43:23 proc78 mfsmount[2066]: file: 44, index: 0, chunk: 359, 
version: 1 - writeworker: connection with (D5AF4BB6:9422) was timed out 
(unfinished writes: 2; try counter: 1)
Jul 11 16:43:25 proc78 mfsmount[2066]: file: 44, index: 0, chunk: 359, 
version: 1 - writeworker: connection with (D5AF4BB4:9422) was timed out 
(unfinished writes: 2; try counter: 1)
Jul 11 16:43:27 proc78 mfsmount[2066]: file: 44, index: 0, chunk: 359, 
version: 1 - writeworker: connection with (D5AF4BB6:9422) was timed out 
(unfinished writes: 2; try counter: 1)
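
To see which chunkservers these messages point at, the hex address in the parentheses can be decoded. A minimal sketch, assuming the eight hex digits are an IPv4 address in network byte order followed by the port (that is my reading of the log format, not something I have confirmed in the mfsmount source); `decode_chunkserver` is a hypothetical helper name:

```python
import ipaddress
import re

# Matches the "(HEXIP:PORT)" part of a writeworker log line,
# e.g. "(D5AF4BB6:9422)".
ADDR_RE = re.compile(r"\(([0-9A-Fa-f]{8}):(\d+)\)")


def decode_chunkserver(line):
    """Return (dotted_ip, port) from a log line, or None if no address is found.

    Assumption: the hex field is a big-endian (network order) IPv4 address.
    """
    match = ADDR_RE.search(line)
    if match is None:
        return None
    ip = ipaddress.IPv4Address(int(match.group(1), 16))
    return str(ip), int(match.group(2))


if __name__ == "__main__":
    sample = "writeworker: connection with (D5AF4BB6:9422) was timed out"
    print(decode_chunkserver(sample))
```

Under that assumption the two addresses in the log above are 213.175.75.182 and 213.175.75.180, both on port 9422.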

And the kernel log:

Jul 11 15:33:16 proc78 kernel: [428520.580157] INFO: task mongod:8847 
blocked for more than 120 seconds.
Jul 11 15:33:16 proc78 kernel: [428520.580159] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 15:33:16 proc78 kernel: [428520.580160] mongod          D 
0000000000000002     0  8847      1 0x00000000
Jul 11 15:33:16 proc78 kernel: [428520.580164]  ffff8802223ffe38 
0000000000000082 0000000000000007 ffffffff81074bd8
Jul 11 15:33:16 proc78 kernel: [428520.580168]  ffff8802223fffd8 
ffff8802223fffd8 ffff8802223fffd8 0000000000012b00
Jul 11 15:33:16 proc78 kernel: [428520.580172]  ffff88011207a2c0 
ffff880221fbe0c0 0000000000000246 ffff880221005c80
Jul 11 15:33:16 proc78 kernel: [428520.580176] Call Trace:
Jul 11 15:33:16 proc78 kernel: [428520.580183]  [<ffffffffa0239d95>] 
fuse_set_nowrite+0x95/0xd0 [fuse]
Jul 11 15:33:16 proc78 kernel: [428520.580200]  [<ffffffffa023d1fa>] 
fuse_fsync_common+0xca/0x1a0 [fuse]
Jul 11 15:33:16 proc78 kernel: [428520.580217]  [<ffffffff8116ee0a>] 
vfs_fsync_range+0x5a/0xa0
Jul 11 15:33:16 proc78 kernel: [428520.580222]  [<ffffffff8111a913>] 
sys_msync+0x153/0x1e0
Jul 11 15:33:16 proc78 kernel: [428520.580227]  [<ffffffff81512292>] 
system_call_fastpath+0x16/0x1b
Jul 11 15:33:16 proc78 kernel: [428520.580231]  [<00007f415f730a7d>] 
0x7f415f730a7c

The CGI interface/mfsmaster doesn't report any problems with any servers or 
chunks. Checking all of the chunkserver logs (5 chunkservers in total, with 
goal 2) doesn't reveal anything besides them happily testing their local 
chunks.

I have seen some past threads and also the FAQ entry saying that this is a 
harmless message, but the problem in my case is that the mountpoint really 
hangs, causing file system commands (like df / ls) to freeze, and the only 
way out is to forcibly kill the mfsmount process and remount.

Is there anything else I can do to solve the hang-up or any extra steps to 
identify/debug the cause?

wbr
rr
