Hi,
I was testing mfs with 16 GB of RAM on the master. After 16,000,000 files had
been written, I stopped one chunkserver with “mfschunkserver stop” for some
time, and trouble followed. The master server's log filled with messages like
the following:
---------------------------------------------------------------------------------------------------------------------
* currently unavailable file 158921: photos/47/74/82/r_163042.jpg
currently unavailable chunk 00000000004465CA (inode: 4353226 ; index: 0)
---------------------------------------------------------------------------------------------------------------------
The master stays in the loop inside “fs_test_files()” and does not return to
serving clients. The goal is 1, of course.
I attempted to reconnect the chunkserver to the master, but it failed. The
master could not serve clients or accept packets from the chunkserver, and it
was not even possible to get information from mfscgiserv. The whole mfs became
unusable. From the code, the master decreases the value of “allvalidcopies”
and then, after entering “fs_test_files()”, prints log messages like those
above in a loop. Because the number of files is so large, this lasts for a
very long time (most of the time is probably spent in “printf”).
I think it is quite common for a chunkserver to disconnect temporarily and
then connect again, so maybe fs_test_files() could run in a separate
thread.
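
To illustrate the kind of change I mean, here is a minimal sketch in C of an
alternative to a separate thread: check only a bounded number of files per
call and cap the number of log messages per pass, so the main loop can return
to serving clients between calls. This is not taken from the mfs source. The
names file_entry, scan_state, test_files_step, budget and log_limit are all
hypothetical, and the array of entries only stands in for the real metadata
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a file entry; not the real mfs node structure. */
typedef struct {
    unsigned inode;
    int valid_copies;   /* 0 means no copy of the data is reachable */
} file_entry;

/* Hypothetical scan state that survives between main-loop iterations. */
typedef struct {
    size_t next;        /* index of the next entry to check */
    size_t unavailable; /* unavailable files found so far in this pass */
    size_t logged;      /* messages actually written in this pass */
} scan_state;

/*
 * Check at most `budget` entries, then return so the caller can serve
 * clients again. At most `log_limit` messages are written per pass; any
 * further unavailable files are only counted.
 * Returns 1 when the whole pass is finished, 0 otherwise.
 */
int test_files_step(const file_entry *files, size_t count, scan_state *st,
                    size_t budget, size_t log_limit, FILE *log)
{
    assert(budget > 0);

    size_t remaining = count - st->next;
    if (budget > remaining) {
        budget = remaining;
    }

    for (size_t i = 0; i < budget; i++, st->next++) {
        const file_entry *f = &files[st->next];
        if (f->valid_copies > 0) {
            continue;
        }
        st->unavailable++;
        if (st->logged < log_limit) {
            fprintf(log, "currently unavailable file (inode: %u)\n", f->inode);
            st->logged++;
        }
    }

    if (st->next < count) {
        return 0;   /* more entries left; continue on the next call */
    }

    /* One summary line replaces the messages that were suppressed. */
    fprintf(log, "unavailable files: %zu (%zu logged, %zu suppressed)\n",
            st->unavailable, st->logged, st->unavailable - st->logged);
    return 1;
}
```

The caller would zero a scan_state, call test_files_step() once per main-loop
iteration until it returns 1, and then zero the state again before the next
pass. With 16,000,000 files, the budget decides how long each call can block
the master, and the log limit keeps “printf” from dominating the run time.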
What do you think about this problem?
--
Yours sincerely, Guowen Shen
<guo...@gm...>