[Moosefs-users] chunkserver goes down while mfsmount is already running

Hi all,
While running a write test against MFS, I stopped all the chunkservers.
After restarting them, I found "damaged" errors on the CGI page and many
"invalid copies" errors in the master's log.

We have 4 chunkservers and 5 other servers.
All 9 servers run mfsmount and write data to the mounted folder (I had
changed the goal from 1 to 3 about 2 hours earlier).
Before I stopped the chunkservers, the inflow rate of MFS was about
40 MBytes/sec.

1. Errors in logs
"chunk invalid copies" errors such as the following appear in the mfsmaster
log:
Mar 29 17:07:52 XXX-22 mfsmaster[7192]: chunk 00000000000EC3A8 has only
invalid copies (2) - please repair it manually
Mar 29 17:07:52 XXX-22 mfsmaster[7192]: chunk 00000000000EC3A8_00000002 -
invalid copy on (10.7.17.54 - ver:00000001)
Mar 29 17:07:52 XXX-22 mfsmaster[7192]: chunk 00000000000EC3A8_00000002 -
invalid copy on (10.7.17.86 - ver:00000000)
......
Mar 29 17:07:54 XXX-22 mfsmaster[7192]: chunk 00000000000EC3BF has only
invalid copies (1) - please repair it manually
Mar 29 17:07:54 XXX-22 mfsmaster[7192]: chunk 00000000000EC3BF_00000003 -
invalid copy on (10.7.17.54 - ver:00000001)
......

Besides that, we also found some other errors in the chunkserver and
mfsmount logs:
Mar 29 17:07:21 XXX-54 mfsmount[1883]: file: 170882, index: 31
-fs_writechunk returns status 8
...
Mar 29 17:07:43 XXX-85 mfschunkserver[26178]: write_block_to_chunk:
file:/data2/mfsdata/84/chunk_00000000000EC384_00000002.mfs - crc error
...
Mar 29 17:07:43 XXX-85 mfsmount[6604]: writeworker: write error: 29
......
Mar 29 17:07:44 XXX-85 mfsmount[6604]: writeworker: write error: 13
......
Mar 29 17:07:44 XXX-85 mfsmount[6604]: writeworker: write error: 28

The error numbers, as defined in the code:
 "#define ERROR_CHUNKLOST        8 // Chunk lost"
 "#define ERROR_NOCHUNK         13 // No such chunk"
 "#define ERROR_DISCONNECTED    28 // Disconnected"
 "#define ERROR_CRC             29 // CRC error"
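
To make the log lines easier to read, the numeric status can be translated
back to its name. The following is only a minimal sketch: the table contains
just the four defines quoted above, and the helper name decode_status and the
regular expression are my own assumptions, not part of MooseFS.

```python
import re

# Status codes copied from the four defines quoted above. Any other code is
# reported as unknown instead of being guessed.
MFS_STATUS = {
    8: "ERROR_CHUNKLOST (Chunk lost)",
    13: "ERROR_NOCHUNK (No such chunk)",
    28: "ERROR_DISCONNECTED (Disconnected)",
    29: "ERROR_CRC (CRC error)",
}

# Matches the two message shapes seen in the logs above:
#   "writeworker: write error: 29" and "fs_writechunk returns status 8"
STATUS_RE = re.compile(r"(?:write error:|returns status)\s*(\d+)")


def decode_status(line):
    """Return the status name found in a log line, or None if there is none."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return MFS_STATUS.get(code, "unknown status %d" % code)
```

Feeding the mfsmount lines above through decode_status gives ERROR_CRC,
ERROR_NOCHUNK and ERROR_DISCONNECTED for the three writeworker errors, and
ERROR_CHUNKLOST for the fs_writechunk line.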

We collected more information about these chunks from the chunkservers and
found that the failing chunks have more than one copy, but the copies do not
all have the same version.
E.g. chunk 00000000000EC3A8:
The mfsmaster log:
Mar 29 22:42:01 XXX-22 mfsmaster[7192]: chunk 00000000000EC3A8 has only
invalid copies (2) - please repair it manually
Mar 29 22:42:01 XXX-22 mfsmaster[7192]: chunk 00000000000EC3A8_00000003 -
invalid copy on (10.7.17.86 - ver:00000002)
Mar 29 22:42:01 XXX-22 mfsmaster[7192]: chunk 00000000000EC3A8_00000003 -
invalid copy on (10.7.17.54 - ver:00000001)

The chunkserver logs:
Mar 29 17:07:43 XXX-85 mfschunkserver[26178]: write_block_to_chunk:
file:/data8/mfsdata/A8/chunk_00000000000EC3A8_00000002.mfs - crc error
Mar 29 17:31:24 XXX-85 mfschunkserver[15680]: write_block_to_chunk:
file:/data8/mfsdata/A8/chunk_00000000000EC3A8_00000003.mfs - crc error
Mar 29 17:07:43 XXX-86 mfschunkserver[8547]: write_block_to_chunk:
file:/data9/mfsdata/A8/chunk_00000000000EC3A8_00000002.mfs - crc error

The files on the chunkservers (54, 85 and 86):
54: 41096192 Mar 29 17:07 chunk_00000000000EC3A8_00000001.mfs
85: 41096192 Mar 29 17:31 chunk_00000000000EC3A8_00000003.mfs
86: 41096192 Mar 29 17:07 chunk_00000000000EC3A8_00000002.mfs

MD5 values of the files:
7bd65382eb63db86d5b68395ae546f40
 /data3/mfsdata/A8/chunk_00000000000EC3A8_00000001.mfs
aa8f3bab55dfbf3f7a2dbd42993e4e51
 /data8/mfsdata/A8/chunk_00000000000EC3A8_00000003.mfs
9101e3feb0ecaea386afe0500df56941
 /data9/mfsdata/A8/chunk_00000000000EC3A8_00000002.mfs
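
The chunk id and version can be read straight from the file names, since the
names above follow the pattern chunk_<16 hex digits>_<8 hex digits>.mfs (for
example 00000000000EC3A8 is id 967592 in the mfsfileinfo output). Below is a
minimal sketch that compares the on-disk versions with the version the master
expects. The function names and the shape of the input are my own
assumptions for illustration.

```python
import re

# File name pattern as seen in the listings above:
# chunk_<16 hex digit id>_<8 hex digit version>.mfs
CHUNK_RE = re.compile(r"chunk_([0-9A-Fa-f]{16})_([0-9A-Fa-f]{8})\.mfs$")


def parse_chunk_name(path):
    """Return (chunk_id, version) as integers for a chunk file path."""
    match = CHUNK_RE.search(path)
    if match is None:
        raise ValueError("not a chunk file name: %r" % path)
    return int(match.group(1), 16), int(match.group(2), 16)


def versions_by_server(copies, expected_version):
    """Map each server to (version on disk, whether it equals expected_version).

    copies is a dict of server label -> chunk file path on that server.
    """
    result = {}
    for server, path in copies.items():
        _chunk_id, version = parse_chunk_name(path)
        result[server] = (version, version == expected_version)
    return result


# The three copies of chunk 00000000000EC3A8 listed above.
copies = {
    "54": "/data3/mfsdata/A8/chunk_00000000000EC3A8_00000001.mfs",
    "85": "/data8/mfsdata/A8/chunk_00000000000EC3A8_00000003.mfs",
    "86": "/data9/mfsdata/A8/chunk_00000000000EC3A8_00000002.mfs",
}
report = versions_by_server(copies, expected_version=3)
```

With the master expecting ver:3, only the file on 85 carries the expected
version number. This checks the file names only, so it says nothing about
whether the contents of that copy are intact.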

In fact, this chunk is part of the file
"/mnt/mfs/test/p/20120329/0000000c/0000027e" :
 /mnt/mfs/test/p/20120329/0000000c/0000027e:
        chunk 0: 00000000000EC181_00000001 / (id:967041 ver:1)
                copy 1: 10.7.17.54:9422
                copy 2: 10.7.17.85:9422
                copy 3: 10.7.17.86:9422
        chunk 1: 00000000000EC1F3_00000001 / (id:967155 ver:1)
                copy 1: 10.7.17.54:9422
                copy 2: 10.7.17.55:9422
                copy 3: 10.7.17.86:9422
        ......
        chunk 6: 00000000000EC3A8_00000003 / (id:967592 ver:3)
                no valid copies !!!

When we use the mfsfileinfo command, mfsmount sends a
"MATOCU_FUSE_READ_CHUNK" message to the master. If a chunk of the file is
not valid, the master's response does not contain the information we expect,
and "no valid copies !!!" is printed (as for chunk 6:
00000000000EC3A8_00000003).
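
To list such chunks for many files, the mfsfileinfo output can be saved and
scanned. This is only a sketch that assumes the output looks exactly like the
listing above; the function name and the regular expression are hypothetical
and not part of the MooseFS tools.

```python
import re

# Matches lines such as:
#   chunk 6: 00000000000EC3A8_00000003 / (id:967592 ver:3)
CHUNK_LINE = re.compile(
    r"chunk (\d+): ([0-9A-Fa-f]{16})_([0-9A-Fa-f]{8}) / \(id:(\d+) ver:(\d+)\)"
)


def chunks_without_valid_copies(text):
    """Return (index, chunk id in hex, version) for every chunk that is
    followed by a "no valid copies" line in mfsfileinfo output."""
    bad = []
    current = None
    for line in text.splitlines():
        match = CHUNK_LINE.search(line)
        if match:
            # Remember the most recent chunk header seen.
            current = (int(match.group(1)), match.group(2), int(match.group(5)))
        elif "no valid copies" in line and current is not None:
            bad.append(current)
    return bad
```

For the file shown above this would report chunk 6 (00000000000EC3A8,
ver 3) as the only chunk without a valid copy.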

2. Questions
 1).
 So far, I think the main cause of the "invalid copy" error is a
chunk-version conflict. Am I right?
 What I do not understand is when the chunk version changes. Thanks.
 From the logs, we find many files whose chunk version is not 1, but 2, 3
or even 7.
eg:     chunk 0: 00000000000D5394_00000003 / (id:873364 ver:3)
                copy 1: 10.7.17.54:9422
                copy 2: 10.7.17.85:9422
                copy 3: 10.7.17.86:9422
        chunk 1: 00000000000D5505_00000003 / (id:873733 ver:3)
                copy 1: 10.7.17.55:9422
                copy 2: 10.7.17.85:9422
                copy 3: 10.7.17.86:9422
        chunk 2: 00000000000D55F3_00000007 / (id:873971 ver:7)
                copy 1: 10.7.17.54:9422
                copy 2: 10.7.17.55:9422
                copy 3: 10.7.17.86:9422
    ......

 2).
What happens to files waiting to be saved when a chunkserver goes down
while mfsmount is already running? And when the chunkserver is restarted,
are the already saved files affected? (E.g. when all the chunkservers lose
power, possibly including the master.)
According to "http://www.moosefs.org/moosefs-faq.html#master-online", when
the master server goes down while mfsmount is already running, mfsmount
doesn't disconnect the mounted resource, and files waiting to be saved
stay in the queue for quite a long time while it tries to reconnect to the
master server.

 3).
As we know, if we want to stop one chunkserver or remove one HD from a
chunkserver, we have to follow "
http://www.moosefs.org/moosefs-faq.html#add_remove". It takes a long time
and many steps before we can remove the chunkserver or its disks. Is there
a better method?

We think we could add an "access-level" setting to the chunkserver: data
can be written to a chunkserver only when its access-level is "WRITE";
otherwise the chunkserver is READ-ONLY. Once this is implemented, we could
set the chunkserver's access-level to "READ-ONLY" when we want to stop it.
But so far we are not sure whether this method will work well, and we need
to do some more tests.

Do you have any thoughts on this, or a better solution? Thanks.

 4).
 When one disk of a chunkserver is marked "damaged" in the CGI monitor,
does it mean that this disk is read-only?
 And what causes the chunkserver to be marked "damaged"?

 5).
In fact, we can find the affected file from the logs, such as
"/mnt/mfs/test/p/20120329/0000000c/0000027e" above.
After I tried to repair this file with "mfsfilerepair", the version of
chunk "00000000000EC3A8" changed to 2, not 3. What is the difference
between 00000000000EC3A8_00000002 and 00000000000EC3A8_00000003?
According to the MD5 values of these two files, their contents are not the
same, so was any data lost in this mfsfilerepair operation?

# mfsfileinfo /mnt/mfs/test/p/20120329/0000000c/0000027e
/mnt/mfs/test/p/20120329/0000000c/0000027e:
        chunk 0: 00000000000EC181_00000001 / (id:967041 ver:1)
                copy 1: 10.7.17.54:9422
                copy 2: 10.7.17.85:9422
                copy 3: 10.7.17.86:9422
        ......
        chunk 6: 00000000000EC3A8_00000002 / (id:967592 ver:2)
                copy 1: 10.7.17.86:9422

Thanks,

Best Wishes,
Wenhua
