Thread: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

[Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: marco lu <mar...@gm...> - 2010-06-21 04:03:55

Hi everyone,

We intend to use MooseFS in our production environment as the storage for
our online photo service.

We'll store about 400 million photo files, so the master server's memory
is a big problem.

I've built one master server (64 GB of memory), one metalogger server, and
three chunk servers (10 x 1 TB SATA). When I copy photo files to the MooseFS
system, everything is fine at first. But once the master server exhausts
its memory, I get many errors in syslog from the master server:

Jun 21 11:48:58 mfs-master[4166]: currently unavailable chunk 00000000018140FF (inode: 26710547 ; index: 0)
Jun 21 11:48:58 mfs-master[4166]: * currently unavailable file 26710547: img.xxx.com/003/810/560/b.jpg
Jun 21 11:48:58 mfs-master[4166]: currently unavailable chunk 000000000144B907 (inode: 22516243 ; index: 0)
Jun 21 11:48:58 mfs-master[4166]: * currently unavailable file 22516243: img.xxx.com/051/383/419/a.jpg

and some error messages like this:

Jun 21 11:49:31 mfs-master[4166]: chunkserver disconnected - ip: 10.10.10.11, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
Jun 21 11:50:03 mfs-master[4166]: CS(10.25.40.111) packet too long (226064141/50000000)
Jun 21 11:50:03 mfs-master[4166]: chunkserver disconnected - ip: 10.10.10.12, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
Jun 21 11:50:34 mfs-master[4166]: CS(10.25.40.113) packet too long (217185941/50000000)
Jun 21 11:50:34 mfs-master[4166]: chunkserver disconnected - ip: 10.10.10.13, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)

Is it a memory problem or a kernel tuning problem? Can anyone give me
some information?

Thanks, all.


Mumonitor

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Roast <zha...@gm...> - 2010-06-21 10:58:03

Master server clustering, or the ability to store metadata on disk, would
be a great feature for us.




-- 
The time you enjoy wasting is not wasted time!

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-06-21 11:51:08

Here are some quick patches you can apply to the master server to improve
its performance with that number of files.

In matocsserv.c in mfsmaster, change this line:

#define MaxPacketSize 50000000

to this:

#define MaxPacketSize 500000000

Also, we suggest a change in filesystem.c in mfsmaster, in the
"fs_test_files" function. Change this line:

if ((uint32_t)(main_time())<=starttime+150) {

to:

if ((uint32_t)(main_time())<=starttime+900) {

And also change this line:

        for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {

to this:

        for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {
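
To get a feel for this last change: the loop bound decides how many hash buckets fs_test_files walks per call, so the divisor sets how many calls one full pass over the table takes. The sketch below is only arithmetic on that bound. It assumes the function runs about once per second, and the table size of 1<<22 used in the note is an example value; neither is stated in this thread. The helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Buckets examined per call: the loop bound k < hash_size/divisor,
 * using integer division as in the patched line. */
uint32_t buckets_per_call(uint32_t hash_size, uint32_t divisor) {
    assert(divisor > 0);
    return hash_size / divisor;
}

/* Calls needed to walk the whole table once, rounded up. */
uint32_t calls_per_full_sweep(uint32_t hash_size, uint32_t divisor) {
    uint32_t step = buckets_per_call(hash_size, divisor);
    assert(step > 0);
    return hash_size / step + (hash_size % step != 0);
}
```

With an assumed table of 1<<22 buckets, the original divisor gives 1165 buckets per call and 3601 calls per sweep, while the patched one gives 291 and 14414. At one call per second that is roughly one hour versus four hours per sweep, with a quarter of the work in each call.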

 

 

 

You need to recompile the master server and start it again. The above
changes should make the master server more stable with a large number of
files.
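
As an illustration of the first patch: the log line "packet too long (226064141/50000000)" reads as "declared length / allowed maximum", so the check presumably looks something like the sketch below. The 8-byte header layout (32-bit type, then 32-bit length, both big-endian) and the names read_be32 and packet_length_ok are assumptions for illustration, not code taken from matocsserv.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 32-bit integer from four bytes. */
uint32_t read_be32(const uint8_t *p) {
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* ASSUMED header layout: bytes 0-3 packet type, bytes 4-7 payload length.
 * Returns 1 if the declared length is within max_size, 0 otherwise. */
int packet_length_ok(const uint8_t header[8], uint32_t max_size) {
    uint32_t length = read_be32(header + 4);
    return length <= max_size;
}
```

With the length from the log above (226064141, which is bytes 0D 79 77 0D), this check fails against a limit of 50000000 and passes against 500000000.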

 

 

Another suggestion would be to create two MooseFS instances (e.g. 2 x 200
million files). Each master server could also act as the metalogger for the
other system.

Kind regards

Michał

 


Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Fabien G. <fab...@gm...> - 2010-06-21 14:41:19

Hello,

We had exactly the same issue as Marco this morning (while we were copying
lots of files, it suddenly stopped working with the same error messages).
The three source code modifications provided by Michal, plus recompiling
the mfsmaster binary, solved the problem. It's back to life :-)

Note that we "only" have 11'480'000 chunks (whereas Gemius seems to run a
26'000'000-chunk MFS cluster). Do you have any clue why this can happen
when our current cluster is quite small?
Our configuration: one master server (8 GB of RAM), one master backup
server, and 5 chunk servers (1 GB of RAM and 2 x 4 TB HDD on each
chunkserver, with about 2'200'000 chunks on each HDD, which means about
4'500'000 chunks stored on each chunk server).

Regards,
Fabien




Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-06-23 09:09:34

Hi Fabien!

The difference in the number of chunks per chunkserver is probably what matters. We have about 800,000 chunks per chunkserver (60 million chunks on 75 machines).
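
A rough back-of-the-envelope check of why chunks per chunkserver would matter for the "packet too long" error. The sketch assumes the oversized packet is the chunk list a chunkserver sends to the master, at about 12 bytes per chunk (an 8-byte chunk id plus a 4-byte version). That per-chunk size is an assumption, not something stated in this thread, and any fixed header is ignored. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Average number of chunks held by each chunkserver. */
uint64_t chunks_per_server(uint64_t total_chunks, uint64_t servers) {
    assert(servers > 0);
    return total_chunks / servers;
}

/* Rough size of one chunkserver's chunk list, ignoring any fixed header.
 * A bytes_per_chunk of 12 is an ASSUMPTION (8-byte id + 4-byte version). */
uint64_t estimated_list_bytes(uint64_t chunks, uint64_t bytes_per_chunk) {
    return chunks * bytes_per_chunk;
}
```

Under that assumption, 800,000 chunks come to about 9.6 MB, well under the 50,000,000-byte limit, while the 4,500,000 chunks per chunkserver that Fabien reports come to about 54 MB, just over it.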

 

How many files do you have? What is the average file size? What goal have you set?

Regards

Michał

 

 


Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: marco lu <mar...@gm...> - 2010-06-22 08:17:18

Thanks, Michał Borychowski!

The problem is resolved, and the MFS system is restored too.

Another question:
after I recompiled mfsmaster as you described, the mfscgiserv process no
longer works normally. The process disappears when I visit its URL, and
there is no message (in syslog or dmesg) to help debug the problem.

Thanks again.

Mumonitor


Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-06-22 10:47:35

mfscgiserv was not touched by the patches. We tested with exactly these
patches and it worked properly.

You can also try running mfscgiserv with the options -f and -v:

/usr/local/sbin/mfscgiserv -f -v

This way mfscgiserv runs in the foreground and logs the requests it
serves, like this:

# /usr/local/sbin/mfscgiserv -f -v
starting simple cgi server (host: any , port: 9425 , rootpath: /usr/local/share/mfscgi)
Asynchronous HTTP server running on port 9425
localhost - - [22/Jun/2010 11:14:11] "GET / HTTP/1.1" 301
localhost - - [22/Jun/2010 11:14:11] "GET /index.html HTTP/1.1" 200
localhost - - [22/Jun/2010 11:14:12] "GET /mfs.cgi HTTP/1.1" 200
localhost - - [22/Jun/2010 11:14:12] "GET /mfs.css HTTP/1.1" 200
localhost - - [22/Jun/2010 11:14:12] "GET /logomini.png HTTP/1.1" 200

This should give us more useful information.

We were also wondering whether you could test your environment of 400
million files with the master's swap file on an SSD drive?

Regards

Michał

 


Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Fabien G. <fab...@gm...> - 2010-06-23 15:10:48

Hi Michal and moosefs-users@,

2010/6/23 Michał Borychowski <mic...@ge...>

> Probably important is the difference in the amount of chunks per one
> chunkserver. We have about 800,000 chunks per chunkserver (60 million chunks
> on 75 machines).
>

Thanks for your quick answer. Yes, you're right, that's what I thought too:
4.7M chunks per chunkserver is far more than 800K.

Just a question: how much RAM do you use on the master, and on the slaves?
In our case (11'800'000 chunks on 5 chunkservers):
* 'mfsmaster' process on the master: 4.9 GB (64-bit recompilation required:
the 32-bit version of mfsmaster crashed without a message when it reached
4 GB)
* 'mfschunkserver' process on the chunkservers: 580 MB


> How many files do you have? What is the average size of a file? What goal
> do you have set?
>

Our current test cluster is used for backup storage. We have lots of files
of all sizes (rsync of /etc, /home/, ... from several servers), and
also a lot of big archive files (several GB each). We currently have 14.7
million inodes in use.

Maybe later, we'd like to use it for webhosting. But since LOCK is not
supported, it's not yet possible.


Fabien

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-06-29 07:23:59

Hi!

> Just a question: how much RAM do you use on the master, and on the slaves? In our case (11'800'000 chunks on 5 chunkservers):
> * 'mfsmaster' process on the master: 4.9 GB (64-bit recompilation required: the 32-bit version of mfsmaster crashed without a message when it reached 4 GB)
> * 'mfschunkserver' process on the chunkservers: 580 MB

[MB] 32-bit machines are not capable of addressing more than 4 GB, so that was quite normal behaviour.
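
The 4 GB figure is just the pointer width at work: 32-bit pointers can name at most 2^32 distinct byte addresses. A tiny sketch of that arithmetic follows, with hypothetical helper names. In practice the operating system usually leaves a 32-bit process with less than the full 4 GB.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with pointers of the given width (1 to 63 bits). */
uint64_t addressable_bytes(unsigned pointer_bits) {
    assert(pointer_bits >= 1 && pointer_bits <= 63);
    return (uint64_t)1 << pointer_bits;
}

/* Returns 1 if `needed` bytes cannot fit in an address space of that width. */
int exceeds_address_space(uint64_t needed, unsigned pointer_bits) {
    return needed > addressable_bytes(pointer_bits);
}
```

For 32 bits this gives 4,294,967,296 bytes, so a process that needs about 4.9 GB cannot fit no matter how much RAM the machine has.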

Regarding memory, please have a look at this FAQ entry:

http://www.moosefs.org/moosefs-faq.html#cpu

It has information about CPU load and RAM usage. Keep in mind that RAM usage depends on the total number of files and folders (not on their size), and CPU load in mfsmaster depends on the number of operations taking place in the filesystem.

[...]

> Maybe later, we'd like to use it for webhosting. But since LOCK is not supported, it's not yet possible.

[MB] What do you need LOCK for in webhosting?

Kind regards

Michal

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Fabien G. <fab...@gm...> - 2010-07-08 23:41:09

Hi all,

2010/6/29 Michał Borychowski <mic...@ge...>

>   * 'mfsmaster' process on master : 4.9 GB (64 bits recompilation required
> : the 32 bits version of mfsmaster crashed without a message when it came to
> 4 GB)
> * 'mfschunkserver' process on chunkservers : 580 MB
>
> [MB] 32bit machines are not capable of addressing more than 4GB, so that
> was quite a normal behaviour.
>
> Regarding memory please have a look at this FAQ entry:
>
> http://www.moosefs.org/moosefs-faq.html#cpu – there is information about
> the cpu loads and ram usage. Keep in mind that RAM depends on the total
> number of files and folders (not on their size) and CPU load in mfsmaster
> depends on amount of operations which take place in the filesystem.
>

Yes, I read that page (actually I read every page of moosefs.org to really
understand how it works!). Compiling it on a 32-bit platform was just a
mistake on my part.
But maybe you could tell the dev team that in case of memory allocation
failure, mfsmaster crashes without a message... well, if we consider that
"segmentation fault" is not a real error message :-)



> Maybe later, we'd like to use it for webhosting. But since LOCK is not
> supported, it's not yet possible.
>
> [MB] What for do you need LOCK for webhosting?
>

Dynamic websites, writing information to files. Several web servers using
the same MooseFS could try to write to the same file at the same time.


Fabien

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-07-09 08:26:02

 

 

> Yes I read that page (actually I read every page of moosefs.org to really understand how it works!).

[MB] Perfect :)

> It's just a mistake I made to compile it on a 32-bit platform.
> But maybe you could tell the dev team that in case of memory allocation failure, mfsmaster crashes without a message... well, if we consider that "segmentation fault" is not a real error message :-)

[MB] It’s on our to-do list (but to be honest, with low priority; one cannot expect a 32-bit machine to work with more than 4 GB of RAM :))

> Maybe later, we'd like to use it for webhosting. But since LOCK is not supported, it's not yet possible.
>
> [MB] What for do you need LOCK for webhosting?
>
> Dynamic websites, writing information to files. Several web servers using the same MooseFS could try to write to the same file at the same time.

[MB] It should not be a problem for you.

There is a mechanism for locking chunks for writing, but writing would be slow. There is no mechanism for informing a client waiting to write that the lock has been released (we will probably implement one at some point). So for now, a client that could not start writing tries again every second. In theory this can lead to starvation, but in practice it shouldn’t.
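
The retry behaviour described above can be pictured as a simple polling loop. This is a sketch only: the callback types, acquire_with_retry and the demo_lock stand-in are hypothetical, and the real client may not cap the number of attempts at all. The cap is added here so that the starvation case shows up as a return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types: try_lock returns 1 once the lock is held;
 * wait pauses between attempts (for example a thin wrapper around sleep()). */
typedef int (*try_lock_fn)(void *ctx);
typedef void (*wait_fn)(unsigned seconds);

/* Try, wait, try again. Returns the 1-based attempt that succeeded,
 * or 0 if all max_attempts tries failed (the "starvation" case). */
unsigned acquire_with_retry(try_lock_fn try_lock, void *ctx, wait_fn wait,
                            unsigned interval, unsigned max_attempts) {
    assert(try_lock != NULL);
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        if (try_lock(ctx)) {
            return attempt + 1;
        }
        if (wait != NULL && attempt + 1 < max_attempts) {
            wait(interval);
        }
    }
    return 0;
}

/* Stand-in for a real lock request: stays busy for a fixed number of polls. */
struct demo_lock { unsigned busy_polls; };

int demo_try_lock(void *ctx) {
    struct demo_lock *lock = ctx;
    if (lock->busy_polls > 0) {
        lock->busy_polls--;
        return 0;
    }
    return 1;
}
```

Because nothing wakes the waiting client, a writer can lose the race on every poll if other clients keep grabbing the lock in between, which is the theoretical starvation mentioned above.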

 

So this is a safe operation, but it is still not recommended. It is better to have different processes on different machines write to different files, and later have some other system combine the data from those files into one target file (similar to “map-reduce” processing).
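
The "different files, combined later" approach might be sketched like this: each writer gets its own file, and a separate step concatenates them into the target. The function name and the file names in the note are made up for illustration, and a real combiner would likely need ordering or de-duplication rules that this thread does not discuss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the contents of each part file, in order, into target_path.
 * Returns 0 on success, -1 on any I/O error. */
int combine_parts(const char *target_path, const char *const parts[],
                  size_t nparts) {
    assert(target_path != NULL && parts != NULL);
    FILE *out = fopen(target_path, "wb");
    if (out == NULL) {
        return -1;
    }
    char buf[4096];
    int status = 0;
    for (size_t i = 0; i < nparts && status == 0; i++) {
        FILE *in = fopen(parts[i], "rb");
        if (in == NULL) {
            status = -1;
            break;
        }
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                status = -1;
                break;
            }
        }
        if (ferror(in)) {
            status = -1;
        }
        fclose(in);
    }
    if (fclose(out) != 0) {
        status = -1;
    }
    return status;
}
```

For example, two web servers could each append to their own file (say web1.log and web2.log), and a periodic job would call combine_parts("all.log", parts, 2) so that only one process ever writes the target.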

 

The only problem would be simultaneous appending (writing at the end) to the same file by two clients.

Please also read the thread “Append and seek while writing functionality” in the group archive:

http://sourceforge.net/mailarchive/forum.php?forum_name=moosefs-users&max_rows=25&style=ultimate&viewmonth=201006

Regards

Michał

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Fabien G. <fab...@gm...> - 2010-07-09 08:48:01

Hi,

2010/7/9 Michał Borychowski <mic...@ge...>

>  It's just a mistake I made to compile it on a 32 bits platform.
>
> But maybe you could tell the dev team that in case of memory allocation
> failure, mfsmaster crashes without a message... well, if we consider that
> "segmentation fault" is not a real error message :-)
>
> *[MB] It’s on our todo list (but to be honest, with low priority – one
> cannot expect a 32bit machine to work with more than 4GB RAM :))*
>

Sure! But the problem is more general than 32-bit machines: just catching
the "no more memory available" error would be great, since it can happen on
both 32-bit and 64-bit machines. For example, on our 64-bit machine with a
64-bit compiled mfsmaster binary, the metadata has become so big that
mfsmaster crashed, and we can't even restore it, since the restore takes too
much memory and ends with a segmentation fault:

[root@mfsmaster ~]# mfsmetarestore -a -d /data/MFS/
loading objects (files,directories,etc.) ... ok
loading names ... ok
loading deletion timestamps ... ok
checking filesystem consistency ... ok
loading chunks data ... Segmentation fault
[root@mfsmaster ~]#

[root@mfsmaster ~]# strace mfsmetarestore -a -d /data/MFS/
  [...]
read(3, "\0\0\0\0\36\347\314\0\0\0\1\0\0\0\0\0\0\0\0\0\35\347\314\0\0\0\1\0\0\0\0\0"..., 4096) = 4096
mmap2(NULL, 561152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0xb08e8000)                         = 0xffffffffb0844000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
[root@mfsmaster ~]#
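
As a minimal sketch of what catching the error could look like: the helper name `checked_malloc` is hypothetical and is not taken from the MooseFS sources. The idea is only to report the failed allocation and its size, instead of letting a later NULL dereference end in a bare segmentation fault.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of MooseFS: allocate `size` bytes and,
 * on failure, print a readable message naming what was being allocated.
 * Returns NULL on failure so the caller can shut down cleanly. */
static void *checked_malloc(size_t size, const char *what)
{
    void *ptr = malloc(size);

    if (ptr == NULL && size != 0) {
        fprintf(stderr,
                "out of memory: cannot allocate %zu bytes for %s (%s)\n",
                size, what, strerror(ENOMEM));
        return NULL; /* the caller decides whether to exit or retry */
    }
    return ptr;
}
```

A loader such as the "loading chunks data" step could then stop with this message and a non-zero exit code when the call returns NULL.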


> [MB] What for do you need LOCK for webhosting?
>
>
> Dynamic websites, writing information to files. Several web servers using
> the same MooseFS could try to write to the same file in the same time.
>
> *[MB] It should not be a problem for you.
> *
>
> *There is a mechanism of chunk locking for write, but the writing process
> would be slow. There is no mechanism of informing the client waiting to
> write that the lock had been released (probably we’ll implement it one
> time). So now client which couldn’t start writing process will try again
> every second. This solution can in theory lead to starvation. But
> practically it shouldn’t.*
>

Oh, ok ! I missed that part of the documentation, shame on me. Thank you for
the information.



>  * So this is a safe operation but still is not recommended. It is better
> when different process on different machines write to different files and
> later some other system combine this data from many files into one target
> file (something like in “map-reduce” processing).*
>

I totally agree with you Michal, but we provide webhosting for thousands of
customers and most of them don't even know what a cluster is ;-)


Fabien

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-07-09 08:59:43

From: Fabien Germain [mailto:fab...@gm...] 
Sent: Friday, July 09, 2010 10:48 AM
To: Michał Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

Hi,

2010/7/9 Michał Borychowski <mic...@ge...>

It's just a mistake I made to compile it on a 32 bits platform.

But maybe you could tell the dev team that in case of memory allocation failure, mfsmaster crashes without a message... well, if we consider that "segmentation fault" is not a real error message :-)

[MB] It’s on our todo list (but to be honest, with low priority – one cannot expect a 32bit machine to work with more than 4GB RAM :))

Sure! But the problem is more general than 32-bit machines: just catching the "no more memory available" error would be great, since it can happen on both 32-bit and 64-bit machines. For example, on our 64-bit machine with a 64-bit compiled mfsmaster binary, the metadata has become so big that mfsmaster crashed, and we can't even restore it, since the restore takes too much memory and ends with a segmentation fault:

[root@mfsmaster ~]# mfsmetarestore -a -d /data/MFS/
loading objects (files,directories,etc.) ... ok 
loading names ... ok
loading deletion timestamps ... ok
checking filesystem consistency ... ok
loading chunks data ... Segmentation fault
[root@mfsmaster ~]#

[root@mfsmaster ~]# strace mfsmetarestore -a -d /data/MFS/
  [...]
read(3, "\0\0\0\0\36\347\314\0\0\0\1\0\0\0\0\0\0\0\0\0\35\347\314\0\0\0\1\0\0\0\0\0"..., 4096) = 4096
mmap2(NULL, 561152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0xb08e8000)                         = 0xffffffffb0844000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
[root@mfsmaster ~]# 

[MB] We’ll look into it

[...]

 So this is a safe operation but still is not recommended. It is better when different process on different machines write to different files and later some other system combine this data from many files into one target file (something like in “map-reduce” processing).

I totally agree with you Michal, but we provide webhosting for thousands of customers and most of them don't even know what a cluster is ;-)

[MB] So what kind of simultaneous writing happens there mainly? Could you give us some examples?

Michal

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Stas O. <sta...@gm...> - 2010-07-09 12:43:02

Hi.

> Sure ! But the problem is more general than 32bit machines : just catching
> the "no more memory available" error would be great, since it can happen on
> both 32bit and 64bit machines. For example, on our 64 bit machine with a 64
> bit compiled mfsmaster binary, metadata has became such big that mfsmaster
> crashed, and we can't even restore it since it takes too much memory, and
> ends with a segmentation fault :
>

Can you tell us the total number of files stored and the total amount of
space used at the point where you hit this issue?

Also, how much space does the metadata take, and how much memory do you have?

Regards.

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Stas O. <sta...@gm...> - 2010-07-09 12:47:42

Hi.

2010/6/21 Michał Borychowski <mic...@ge...>

>  We give you here some quick patches you can implement to the master
> server to improve its performance for that amount of files:
>
>
>
> In matocsserv.c in mfsmaster you need to change this line:
>
> #define MaxPacketSize 50000000
>
>
>
> into this:
>
> #define MaxPacketSize 500000000
>
>
>
>
>
>
>
> Also we suggest a change in filesystem.c in mfsmaster in "fs_test_files"
> function. Change this line:
>
> if ((uint32_t)(main_time())<=starttime+150) {
>
>
>
> into:
>
> if ((uint32_t)(main_time())<=starttime+900) {
>
>
>
>
>
> And also changing this line:
>
>         for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {
>
>
>
> into this:
>
>         for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {
>
>
>
>
>
>
>
> You need to recompile the master server and start it again. The above
> changes should make the master server work more stable with large amount of
> files.
>
>
>

Can these changes be added to the next MFS release?
Or do they affect performance in any way for smaller numbers of files?

Regards.

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-07-12 07:33:19

Yes, these patches will probably be applied in the new version, or we will implement a better solution for registering large numbers of files.

 

 

Regards

Michal 

 

From: Stas Oskin [mailto:sta...@gm...] 
Sent: Friday, July 09, 2010 2:47 PM
To: Michał Borychowski
Cc: moo...@li...; marco lu
Subject: Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

 

Hi.

2010/6/21 Michał Borychowski <mic...@ge...>

We give you here some quick patches you can implement to the master server to improve its performance for that amount of files:

In matocsserv.c in mfsmaster you need to change this line:

#define MaxPacketSize 50000000

into this:

#define MaxPacketSize 500000000

Also we suggest a change in filesystem.c in mfsmaster in "fs_test_files" function. Change this line:

if ((uint32_t)(main_time())<=starttime+150) {

into:

if ((uint32_t)(main_time())<=starttime+900) {

And also changing this line:

for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {

into this:

for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {

You need to recompile the master server and start it again. The above changes should make the master server work more stable with large amount of files.
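
To illustrate why the `MaxPacketSize` limit produces the "packet too long (226064141/50000000)" message, here is a rough sketch of a length check on an incoming packet header. The 8-byte header layout (4 bytes type, 4 bytes big-endian length) and the function names are assumptions made for this example, not a copy of the code in matocsserv.c.

```c
#include <stdint.h>
#include <stdio.h>

#define MaxPacketSize 500000000u /* patched value; the original was 50000000 */

/* Read a 32-bit big-endian integer from a byte buffer. */
static uint32_t get32be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed header layout (illustration only): 4 bytes type, 4 bytes length.
 * Returns 1 if the announced payload length fits under `limit`, else 0. */
static int packet_length_ok(const uint8_t header[8], uint32_t limit)
{
    uint32_t length = get32be(header + 4);

    if (length > limit) {
        fprintf(stderr, "packet too long (%u/%u)\n",
                (unsigned)length, (unsigned)limit);
        return 0; /* a server would typically drop the connection here */
    }
    return 1;
}
```

With the packet size from the log (226064141 bytes), the check fails against the old limit of 50000000 and passes against the patched limit of 500000000.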

 


Can these changes be added to the next MFS release?
Or do they affect performance in any way for smaller numbers of files?

Regards.

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Stas O. <sta...@gm...> - 2010-07-09 12:57:10

>
>  Sure ! But the problem is more general than 32bit machines : just catching
> the "no more memory available" error would be great, since it can happen on
> both 32bit and 64bit machines. For example, on our 64 bit machine with a 64
> bit compiled mfsmaster binary, metadata has became such big that mfsmaster
> crashed, and we can't even restore it since it takes too much memory, and
> ends with a segmentation fault :
>
> [root@mfsmaster ~]# mfsmetarestore -a -d /data/MFS/
> loading objects (files,directories,etc.) ... ok
> loading names ... ok
> loading deletion timestamps ... ok
> checking filesystem consistency ... ok
> loading chunks data ... Segmentation fault
> [root@mfsmaster ~]#
>
> [root@mfsmaster ~]# strace mfsmetarestore -a -d /data/MFS/
>   [...]
> read(3,
> "\0\0\0\0\36\347\314\0\0\0\1\0\0\0\0\0\0\0\0\0\35\347\314\0\0\0\1\0\0\0\0\0"...,
> 4096) = 4096
> mmap2(NULL, 561152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = -1 ENOMEM (Cannot allocate memory)
> brk(0xb08e8000)                         = 0xffffffffb0844000
> mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
> [root@mfsmaster ~]#
>
> *[MB] We’ll look into it*
>
>
>
Another suggestion:

Perhaps it's possible to measure the total memory available to the MFS master /
logger, and show in a chart how much is left?
Similar to how disk space is measured today per chunk server.

That would allow us to plan memory expansion in advance, rather than being
pressed to locate and add more memory modules after the MFS master / logger
has crashed (or even stopped normally, once this is added) due to insufficient
memory.

Regards.

Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

From: Michał B. <mic...@ge...> - 2010-07-15 08:48:25

From: Stas Oskin [mailto:sta...@gm...] 
Sent: Friday, July 09, 2010 2:57 PM
To: Michał Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfs-master[4166]: CS(10.10.10.10) packet too long (226064141/50000000)

Another suggestion:

Perhaps it's possible to measure the total memory available to the MFS master /
logger, and show in a chart how much is left?
Similar to how disk space is measured today per chunk server.

That would allow us to plan memory expansion in advance, rather than being
pressed to locate and add more memory modules after the MFS master / logger
has crashed (or even stopped normally, once this is added) due to insufficient
memory.

[MB] We could probably check quite easily how much memory a given process
occupies. We have to see how each of the supported operating systems returns
this value. But it would be much more difficult to check how much memory or
swap is still left. So yes, we can add "RAM usage" information for the master
server to the CGI Monitor, but the admin would still have to judge whether it
is a lot or not.
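
One portable way to read a process's own memory figure is the POSIX `getrusage()` call. The sketch below is only an illustration (the function name `peak_rss_raw` is hypothetical), and it reports the peak resident set size rather than the current one. It also shows the portability point: the unit of `ru_maxrss` differs between systems (kilobytes on Linux, bytes on macOS).

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: return the peak resident set size of the calling
 * process as reported by getrusage(), or -1 on error. The unit is
 * platform-dependent (kilobytes on Linux, bytes on macOS), so the caller
 * has to normalise it per operating system. */
static long peak_rss_raw(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```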

Regards

Michał

[Moosefs-users] mfs-master[10546]: CS(192.168.0.124) packet too long (115289537/50000000)

From: TianYuchuan(田玉川) <ti...@fo...> - 2010-08-07 18:23:40

Hello, everyone!
I have a big question; please help me. Thank you very much.
We intend to use MooseFS in our production environment as the storage for our online photo service.
We'll store about 200 million photo files.
I've built one master server (48G mem), one metalogger server, and eight chunk servers (8*1T SATA). When I started copying photo files to the MooseFS system, everything was fine at first. But after I had copied 57 million files, the master machine's CPU usage reached 100%.
I stopped the master with “/user/local/mfs/sbin/mfsmasterserver -s” and then started it again. But there is a big problem: the master cannot read my files. These files are important to me and I am very anxious. Please help me recover them. Thanks.

I got many error messages in syslog from the master server:

Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 41991323: 2668/2526212449954462668/176s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000043CD358 (inode: 50379931 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 50379931: 2926/4294909215566102926/163b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000002966C3 (inode: 48284 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 48284: bookdata/178/8533354296639220178/180b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000594726 (inode: 4242588 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 4242588: bookdata/6631/4300989258725036631/85s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000993541 (inode: 8436892 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 8436892: bookdata/7534/3147352338521267534/122b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000D906E6 (inode: 12631196 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 12631196: bookdata/8691/11879047433161548691/164s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 000000000118DC1E (inode: 16825500 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 16825500: bookdata/1232/17850056326363351232/166b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001681BC7 (inode: 21019804 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 21019804: bookdata/26/12779298489336140026/246s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001A804E1 (inode: 25214108 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 25214108: bookdata/3886/8729781571075193886/30s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001E7E826 (inode: 29408412 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 29408412: bookdata/4757/142868991575144757/316b.jpg

Aug  7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet too long (115289537/50000000)
Aug  7 23:56:36 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.124, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
Aug  8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet too long (104113889/50000000)
Aug  8 00:08:14 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.127, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
Aug  8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet too long (117046565/50000000)
Aug  8 00:21:03 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.120, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)

When I visited mfscgi, the error was “Can't connect to MFS master (IP:127.0.0.1 ; PORT:9421)”.

Thanks all!

[Moosefs-users] mfs-master[10546]: CS(192.168.0.125) packet too long (115289537/50000000)

From: TianYuchuan(田玉川) <ti...@fo...> - 2010-08-08 14:52:21

 
Hello, everyone!
I have a big question; please help me. Thank you very much.
We intend to use MooseFS in our production environment as the storage for our online photo service.
We'll store about 200 million photo files.
I've built one master server (48G mem), one metalogger server, and eight chunk servers (8*1T SATA). When I started copying photo files to the MooseFS system, everything was fine at first. But after I had copied 57 million files, the master machine's CPU usage reached 100%.
I stopped the master with “/user/local/mfs/sbin/mfsmasterserver -s” and then started it again. But there is a big problem: the master cannot read my files. These files are important to me and I am very anxious. Please help me recover them. Thanks.

I got many error messages in syslog from the master server:

Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 41991323: 2668/2526212449954462668/176s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000043CD358 (inode: 50379931 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 50379931: 2926/4294909215566102926/163b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000002966C3 (inode: 48284 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 48284: bookdata/178/8533354296639220178/180b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000594726 (inode: 4242588 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 4242588: bookdata/6631/4300989258725036631/85s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000993541 (inode: 8436892 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 8436892: bookdata/7534/3147352338521267534/122b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000D906E6 (inode: 12631196 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 12631196: bookdata/8691/11879047433161548691/164s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 000000000118DC1E (inode: 16825500 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 16825500: bookdata/1232/17850056326363351232/166b.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001681BC7 (inode: 21019804 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 21019804: bookdata/26/12779298489336140026/246s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001A804E1 (inode: 25214108 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 25214108: bookdata/3886/8729781571075193886/30s.jpg
Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001E7E826 (inode: 29408412 ; index: 0)
Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 29408412: bookdata/4757/142868991575144757/316b.jpg

Aug  7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet too long (115289537/50000000)
Aug  7 23:56:36 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.124, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
Aug  8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet too long (104113889/50000000)
Aug  8 00:08:14 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.127, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
Aug  8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet too long (117046565/50000000)
Aug  8 00:21:03 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.120, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)

When I visited mfscgi, the error was “Can't connect to MFS master (IP:127.0.0.1 ; PORT:9421)”.

Thanks all!

Re: [Moosefs-users] mfs-master[10546]: CS(192.168.0.125) packet too long (115289537/50000000)

From: Shen G. <sh...@ui...> - 2010-08-09 02:57:23

Don't worry!
This happens because some of your chunk servers are currently unreachable.
The master server notices this and modifies the metadata of the files on
those chunk servers, setting "allvalidcopies" to 0 in "struct chunk". When
the master rescans the files (fs_test_files() in filesystem.c), it finds
that the number of valid copies is 0 and prints the information to syslog,
as listed below. However, the printing is quite time-consuming, especially
when the number of files is large. During this period the master ignores
the chunk servers' connections (because it is in a big loop testing files,
and it does this in a single thread; maybe this is a pitfall). So even if
you make sure the chunk servers are working correctly, it does not help
(you can see the reconnection messages in the chunk servers' syslog).
You could let the master finish printing; it will then reconnect to the
chunk servers, notice that the files are there, and set "allvalidcopies"
to the correct value. After that it works normally.
Or you can recompile the program with line 5512 and line 5482 of
filesystem.c (mfs-1.6.15) commented out. It will then skip the print
messages and, of course, reduce the fs test time.
Below is from Michal:
-----------------------------------------------------------------------
We give you here some quick patches you can implement to the master
server to improve its performance for that amount of files:

In matocsserv.c in mfsmaster you need to change this line:

#define MaxPacketSize 50000000
into this:
#define MaxPacketSize 500000000

Also we suggest a change in filesystem.c in mfsmaster in "fs_test_files"
function. Change this line:

if ((uint32_t)(main_time())<=starttime+150) {
into:
if ((uint32_t)(main_time())<=starttime+900) {

And also changing this line:
        for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {
into this:
        for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {

You need to recompile the master server and start it again. The above
changes should make the master server work more stable with large amount
of files.
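
To see what the change of the loop bound from `NODEHASHSIZE/3600` to `NODEHASHSIZE/14400` does, the following sketch counts how many calls are needed to walk the whole hash table when each call handles one slice. The value of `NODEHASHSIZE` and the function name `calls_for_full_scan` are assumptions for illustration only; the real constant is defined in the MooseFS sources.

```c
#include <stdint.h>

#define NODEHASHSIZE (1u << 22) /* assumed value, for illustration only */

/* Count the calls needed to walk the whole hash table when each call
 * visits at most NODEHASHSIZE/divisor buckets, using the same loop shape
 * as the patched line. `divisor` must not exceed NODEHASHSIZE. */
static uint32_t calls_for_full_scan(uint32_t divisor)
{
    uint32_t per_call = NODEHASHSIZE / divisor;
    uint32_t calls = 0;
    uint32_t i = 0;

    while (i < NODEHASHSIZE) {
        uint32_t k;
        for (k = 0; k < per_call && i < NODEHASHSIZE; k++, i++) {
            /* the files hashed into bucket i would be tested here */
        }
        calls++;
    }
    return calls;
}
```

Under these assumptions a full pass takes about 3600 calls with the original divisor and about 14400 with the patched one, so each call does roughly a quarter of the work and the master gets back to serving chunk server connections sooner. If the function is invoked once per second (also an assumption), that is about one hour versus about four hours per full pass.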

Another suggestion would be to create two MooseFS instances (eg. 2 x 200
million files). One master server could also be metalogger for the
another system and vice versa.


Kind regards

Michał 
-----------------------------------------------------------------------------

--
Guowen Shen

On Sun, 2010-08-08 at 22:51 +0800, TianYuchuan(田玉川) wrote:
> 
>  
> hello,everyone!
> I have a big quertion,please help me,thank you very much.
> We intend to use moosefs at our product environment as the storage of
> our online photo service. 
> We'll store for about 200 million photo files.  
> I've built one master server(48G mem), one metalogger server, eight
> chunk servers(8*1T SATA). When I copy photo files to the moosefs
> system. At start everything is good. But  I had copyed files 57
> million ，the master machines'CPU were used 100% 
> I sthoped the master when used “/user/local/mfs/sbin/mfsmasterserver
> -s”，that I started the master。but there was a big  problem ，the
> master had not read my files。 These documents are important to me，I
> am very anxious，please help me recover these files，tihanks。
>  
>  I got many error syslog from master server:
> 
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 41991323: 2668/2526212449954462668/176s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 00000000043CD358 (inode: 50379931 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 50379931: 2926/4294909215566102926/163b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 00000000002966C3 (inode: 48284 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 48284: bookdata/178/8533354296639220178/180b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000594726 (inode: 4242588 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 4242588: bookdata/6631/4300989258725036631/85s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000993541 (inode: 8436892 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 8436892: bookdata/7534/3147352338521267534/122b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000D906E6 (inode: 12631196 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 12631196: bookdata/8691/11879047433161548691/164s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 000000000118DC1E (inode: 16825500 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 16825500: bookdata/1232/17850056326363351232/166b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001681BC7 (inode: 21019804 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 21019804: bookdata/26/12779298489336140026/246s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001A804E1 (inode: 25214108 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 25214108: bookdata/3886/8729781571075193886/30s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001E7E826 (inode: 29408412 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 29408412: bookdata/4757/142868991575144757/316b.jpg
> 
> 
> Aug  7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet
> too long (115289537/50000000)
> Aug  7 23:56:36 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.124, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> Aug  8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet
> too long (104113889/50000000)
> Aug  8 00:08:14 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.127, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> Aug  8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet
> too long (117046565/50000000)
> Aug  8 00:21:03 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.120, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> 
> when I visited the mfscgi，the error  was“Can't connect to MFS master
> (IP:127.0.0.1 ; PORT:9421)”
> 。
> 
> Thanks all！ 

Re: [Moosefs-users] mfs-master[10546]: CS(192.168.0.125) packet too long (115289537/50000000)

From: Michał B. <mic...@ge...> - 2010-08-09 13:15:50

Shen, thanks for the reply :)

Tian, these limits were changed in 1.6.16, and the latest stable release is now 1.6.17, so we recommend that you simply update the master server to 1.6.17.


If you need any further assistance please let us know.

Kind regards
Michał Borychowski 
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01


-----Original Message-----
From: Shen Guowen [mailto:sh...@ui...] 
Sent: Monday, August 09, 2010 4:42 AM
To: TianYuchuan(田玉川)
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfs-master[10546]: CS(192.168.0.125) packet too long (115289537/50000000)

Don't worry!
This happens because some of your chunk servers are currently unreachable.
The master server notices this and modifies the metadata of the files on
those chunk servers, setting "allvalidcopies" to 0 in "struct chunk". When
the master rescans the files (fs_test_files() in filesystem.c), it finds
that the number of valid copies is 0 and prints the information to syslog,
as listed below. However, the printing is quite time-consuming, especially
when the number of files is large. During this period the master ignores
the chunk servers' connections (because it is in a big loop testing files,
and it does this in a single thread; maybe this is a pitfall). So even if
you make sure the chunk servers are working correctly, it does not help
(you can see the reconnection messages in the chunk servers' syslog).
You could let the master finish printing; it will then reconnect to the
chunk servers, notice that the files are there, and set "allvalidcopies"
to the correct value. After that it works normally.
Or you can recompile the program with line 5512 and line 5482 of
filesystem.c (mfs-1.6.15) commented out. It will then skip the print
messages and, of course, reduce the fs test time.
Below is from Michal:
-----------------------------------------------------------------------
We give you here some quick patches you can implement to the master
server to improve its performance for that amount of files:

In matocsserv.c in mfsmaster you need to change this line:

#define MaxPacketSize 50000000
into this:
#define MaxPacketSize 500000000

Also we suggest a change in filesystem.c in mfsmaster in "fs_test_files"
function. Change this line:

if ((uint32_t)(main_time())<=starttime+150) {
into:
if ((uint32_t)(main_time())<=starttime+900) {

And also changing this line:
        for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {
into this:
        for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {

You need to recompile the master server and start it again. The above
changes should make the master server work more stable with large amount
of files.

Another suggestion would be to create two MooseFS instances (eg. 2 x 200
million files). One master server could also be metalogger for the
another system and vice versa.


Kind regards

Michał 
-----------------------------------------------------------------------------

--
Guowen Shen

On Sun, 2010-08-08 at 22:51 +0800, TianYuchuan(田玉川) wrote:
> 
>  
> Hello, everyone!
> I have a big question; please help me. Thank you very much.
> We intend to use MooseFS in our production environment as the storage
> for our online photo service.
> We'll store about 200 million photo files.
> I've built one master server (48 GB of memory), one metalogger server,
> and eight chunk servers (8 x 1 TB SATA). When I copied photo files to
> the MooseFS system, everything was fine at first. But after I had
> copied 57 million files, the master machine's CPU usage reached 100%.
> I stopped the master with "/user/local/mfs/sbin/mfsmasterserver
> -s" and then started it again, but there was a big problem: the
> master did not read my files. These documents are important to me and
> I am very anxious. Please help me recover these files. Thanks.
>
> I got many error messages in syslog from the master server:
> 
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 41991323: 2668/2526212449954462668/176s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 00000000043CD358 (inode: 50379931 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 50379931: 2926/4294909215566102926/163b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 00000000002966C3 (inode: 48284 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 48284: bookdata/178/8533354296639220178/180b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000594726 (inode: 4242588 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 4242588: bookdata/6631/4300989258725036631/85s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000993541 (inode: 8436892 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 8436892: bookdata/7534/3147352338521267534/122b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000D906E6 (inode: 12631196 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 12631196: bookdata/8691/11879047433161548691/164s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 000000000118DC1E (inode: 16825500 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 16825500: bookdata/1232/17850056326363351232/166b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001681BC7 (inode: 21019804 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 21019804: bookdata/26/12779298489336140026/246s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001A804E1 (inode: 25214108 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 25214108: bookdata/3886/8729781571075193886/30s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001E7E826 (inode: 29408412 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 29408412: bookdata/4757/142868991575144757/316b.jpg
> 
> 
> Aug  7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet
> too long (115289537/50000000)
> Aug  7 23:56:36 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.124, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> Aug  8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet
> too long (104113889/50000000)
> Aug  8 00:08:14 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.127, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> Aug  8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet
> too long (117046565/50000000)
> Aug  8 00:21:03 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.120, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> 
> When I visited the mfscgi, the error was "Can't connect to MFS master
> (IP:127.0.0.1 ; PORT:9421)".
>
> Thanks all!



------------------------------------------------------------------------------
This SF.net email is sponsored by 

Make an app they can't live without
Enter the BlackBerry Developer Challenge
http://p.sf.net/sfu/RIM-dev2dev 
_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users

[Moosefs-users] To access data was very slowly，nearly 2 minute。oh my god！

From: TianYuchuan(田玉川) <ti...@fo...> - 2011-03-17 09:55:51

Hello,

Access to my MooseFS system is very slow and I have no idea why. Please help me! Thanks!
Number of files: 104964618; number of chunks: 104963962.
The master load is not high, but at the top of every hour the data cannot be accessed, and this lasts for several minutes. In general, even with few concurrent clients, accessing data is delayed by a few seconds.

My MooseFS system has nine chunk servers:

Chunk server status (the last four columns appear to be the "marked for removal" chunks, used, total and % used):
#  host       ip             port  version  chunks    used     total    % used  chunks  used  total  % used
1  localhost  192.168.0.118  9422  1.6.19   23387618  3.6 TiB  4.5 TiB  79.72   0       0 B   0 B    -
2  localhost  192.168.0.119  9422  1.6.19   23246974  3.6 TiB  4.5 TiB  79.72   0       0 B   0 B    -
3  localhost  192.168.0.120  9422  1.6.19   23360333  3.6 TiB  4.5 TiB  79.72   0       0 B   0 B    -
4  localhost  192.168.0.121  9422  1.6.19   23192013  3.6 TiB  4.5 TiB  79.69   0       0 B   0 B    -
5  localhost  192.168.0.122  9422  1.6.19   23483418  3.6 TiB  4.5 TiB  79.70   0       0 B   0 B    -
6  localhost  192.168.0.123  9422  1.6.19   23308366  3.6 TiB  4.5 TiB  79.70   0       0 B   0 B    -
7  localhost  192.168.0.124  9422  1.6.19   23361992  3.6 TiB  4.5 TiB  79.69   0       0 B   0 B    -
8  localhost  192.168.0.125  9422  1.6.19   23300478  3.6 TiB  4.5 TiB  79.70   0       0 B   0 B    -
9  localhost  192.168.0.127  9422  1.6.19   23284897  3.5 TiB  4.5 TiB  78.72   0       0 B   0 B    -


--------------------------------------------------------------------------------------------------------------------------------------------------
[root@localhost mfs]# free -m
             total       used       free     shared    buffers     cached
Mem:         48295      46127       2168          0         38       8204
-/+ buffers/cache:      37884      10411
Swap:            0          0          0

CPU usage is 95%; the peak was 150%.




Re: [Moosefs-users] To access data was very slowly，nearly 2 minute。oh my god！

From: Michal B. <mic...@ge...> - 2011-03-24 08:34:34

Hi!

Almost all of your RAM is consumed. As you have 100 million files in the system, we suggest adding some extra RAM to the master server. It would also be advisable to install an SSD in the master server so that the hourly metadata dump takes less time.
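
As a rough cross-check of the memory figures, the numbers from the "free -m" output in the original message can be worked through in a short sketch. The function names are hypothetical, and the per-file figure rests on a loud assumption: that all application memory on the machine belongs to the master's metadata. It is therefore an upper bound, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Memory used by applications, in MiB: "used" minus buffers and page
 * cache, i.e. roughly the "-/+ buffers/cache" used figure of free -m. */
uint64_t app_used_mib(uint64_t used, uint64_t buffers, uint64_t cached)
{
    assert(used >= buffers + cached);
    return used - buffers - cached;
}

/* Average bytes of RAM per file, ASSUMING all application memory is
 * master metadata (an upper bound under that assumption). */
uint64_t bytes_per_file(uint64_t app_mib, uint64_t files)
{
    assert(files > 0);
    return app_mib * 1024u * 1024u / files;
}
```

With the reported values, app_used_mib(46127, 38, 8204) gives 37885 MiB (free itself printed 37884, a rounding difference), and bytes_per_file(37884, 104964618) gives 378 bytes per file. On that assumption, 200 million files would need on the order of 70 GiB, more than the 48 GB installed.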


Kind regards
Michał Borychowski 
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01






[Moosefs-users] Reply: To access data was very slowly，nearly 2 minute。oh my god！

From: TianYuchuan(田玉川) <ti...@fo...> - 2011-03-25 06:10:01

Hi!
Thanks!

I installed a 15K RPM SAS disk in the master server.

The problem is now solved!
I also upgraded MooseFS; the version is now mfs-1.6.20-2.
After the upgrade, CPU usage is 16%.

Another question, about the MooseFS upgrade: I installed mfs-1.6.20-2 on a new server and started the master, chunkserver and client there.
The old master server was not stopped. It had no chunkservers and no clients connected, yet its master process used 80% CPU. After I restarted the master service, CPU utilization dropped to 5%.

Why does the master not release the CPU?




-----邮件原件-----
发件人: Michal Borychowski [mailto:mic...@ge...] 
发送时间: 2011年3月24日 16:34
收件人: 'TianYuchuan(田玉川)'; 'Shen Guowen'
抄送: moo...@li...
主题: RE: [Moosefs-users] To access data was very slowly，nearly 2 minute。oh my god！

Hi!

You have almost all RAM consumed. As you have 100 million files in the system we suggest putting some extra RAM to the master server. Also it would be advisable to insert SSD disk into the master server so that the hourly metadata dump takes less time.


Kind regards
Michał Borychowski 
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01



-----Original Message-----
From: TianYuchuan(田玉川) [mailto:ti...@fo...] 
Sent: Thursday, March 17, 2011 10:03 AM
To: Shen Guowen
Cc: moo...@li...
Subject: [Moosefs-users] To access data was very slowly，nearly 2 minute。oh my god！


Hello 

My moosefs system was accessd  very slowly，I nave no idea，please help me！Thanks！！！
files number 104964618 ，chunks number 104963962。
master load is not high，but When the hour every to data cannot accessed，continued for several minutes。General，visit concurrent small， to access data delay was needed a few seconds。

My moosefs system have nine chunks，

The chunk station
1 localhost 192.168.0.118 9422 1.6.19 23387618 3.6 TiB 4.5 TiB 79.72 0 0 B 0 B - 
2 localhost 192.168.0.119 9422 1.6.19 23246974 3.6 TiB 4.5 TiB 79.72 0 0 B 0 B - 
3 localhost 192.168.0.120 9422 1.6.19 23360333 3.6 TiB 4.5 TiB 79.72 0 0 B 0 B - 
4 localhost 192.168.0.121 9422 1.6.19 23192013 3.6 TiB 4.5 TiB 79.69 0 0 B 0 B - 
5 localhost 192.168.0.122 9422 1.6.19 23483418 3.6 TiB 4.5 TiB 79.70 0 0 B 0 B - 
6 localhost 192.168.0.123 9422 1.6.19 23308366 3.6 TiB 4.5 TiB 79.70 0 0 B 0 B - 
7 localhost 192.168.0.124 9422 1.6.19 23361992 3.6 TiB 4.5 TiB 79.69 0 0 B 0 B - 
8 localhost 192.168.0.125 9422 1.6.19 23300478 3.6 TiB 4.5 TiB 79.70 0 0 B 0 B - 
9 localhost 192.168.0.127 9422 1.6.19 23284897 3.5 TiB 4.5 TiB 78.72 0 0 B 0 B -


--------------------------------------------------------------------------------------------------------------------------------------------------
[root@localhost mfs]# free -m
             total       used       free     shared    buffers     cached
Mem:         48295      46127       2168          0         38       8204
-/+ buffers/cache:      37884      10411
Swap:            0          0          0

The CPU using 95%，the highest was by 150%。



-----邮件原件-----
发件人: Shen Guowen [mailto:sh...@ui...] 
发送时间: 2010年8月9日 10:42
收件人: TianYuchuan(田玉川)
抄送: moo...@li...
主题: Re: [Moosefs-users] mfs-master[10546]: CS(192.168.0.125) packet too long (115289537/50000000)

Don't worry!
This is because some of your chunk servers are currently unreachable,
and the master server notices it, then modifies the meta data of files
in those chunk servers to set the "allvalidcopies" to 0 in "struct
chunk". When the master is rescanning the files (fs_test_files() in
filesystem.c), it finds out the valid copy is 0, then print information
into syslog file, just as listed below. However, printing process is
quite time-consuming, especially the mount of files is large. During
this period, the master ignores the chunk server's connection (because
it is in a big loop of test files, and it is a single thread to do this,
maybe this is a pitfall). So although you make sure the chunk server
working correctly, it is useless (you can notice the reconnecting
information in chunk server's syslog file).
You could let the master finish printing, then it will reconnect with
chunk servers, and will notice the files is there, then set the
"allvalidcopies" to a correct value. Then works normally.
Or you can re-compile the program with commenting the line 5512 and line
5482 in filesystem.c(mfs-1.6.15). It will ignore the print messages and
of cause, reduce the fs test time.
Below is from Michal:
-----------------------------------------------------------------------
We give you here some quick patches you can implement to the master
server to improve its performance for that amount of files:

In matocsserv.c in mfsmaster you need to change this line:

#define MaxPacketSize 50000000
into this:
#define MaxPacketSize 500000000

Also we suggest a change in filesystem.c in mfsmaster in "fs_test_files"
function. Change this line:

if ((uint32_t)(main_time())<=starttime+150) {
into:
if ((uint32_t)(main_time())<=starttime+900) {

And also changing this line:
        for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {
into this:
        for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {

You need to recompile the master server and start it again. The above
changes should make the master server work more stable with large amount
of files.

Another suggestion would be to create two MooseFS instances (eg. 2 x 200
million files). One master server could also be metalogger for the
another system and vice versa.


Kind regards

Michał 
-----------------------------------------------------------------------------

--
Guowen Shen

On Sun, 2010-08-08 at 22:51 +0800, TianYuchuan(田玉川) wrote:
> 
>  
> hello,everyone!
> I have a big quertion,please help me,thank you very much.
> We intend to use moosefs at our product environment as the storage of
> our online photo service. 
> We'll store for about 200 million photo files.  
> I've built one master server(48G mem), one metalogger server, eight
> chunk servers(8*1T SATA). When I copy photo files to the moosefs
> system. At start everything is good. But  I had copyed files 57
> million ，the master machines'CPU were used 100% 
> I sthoped the master when used “/user/local/mfs/sbin/mfsmasterserver
> -s”，that I started the master。but there was a big  problem ，the
> master had not read my files。 These documents are important to me，I
> am very anxious，please help me recover these files，tihanks。
>  
>  I got many error syslog from master server:
> 
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 41991323: 2668/2526212449954462668/176s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 00000000043CD358 (inode: 50379931 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 50379931: 2926/4294909215566102926/163b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 00000000002966C3 (inode: 48284 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 48284: bookdata/178/8533354296639220178/180b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000594726 (inode: 4242588 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 4242588: bookdata/6631/4300989258725036631/85s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000993541 (inode: 8436892 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 8436892: bookdata/7534/3147352338521267534/122b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000000D906E6 (inode: 12631196 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 12631196: bookdata/8691/11879047433161548691/164s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 000000000118DC1E (inode: 16825500 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 16825500: bookdata/1232/17850056326363351232/166b.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001681BC7 (inode: 21019804 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 21019804: bookdata/26/12779298489336140026/246s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001A804E1 (inode: 25214108 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 25214108: bookdata/3886/8729781571075193886/30s.jpg
> Aug  6 00:57:01 localhost mfsmaster[10546]: currently unavailable
> chunk 0000000001E7E826 (inode: 29408412 ; index: 0)
> Aug  6 00:57:01 localhost mfsmaster[10546]: * currently unavailable
> file 29408412: bookdata/4757/142868991575144757/316b.jpg
> 
> 
> Aug  7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet
> too long (115289537/50000000)
> Aug  7 23:56:36 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.124, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> Aug  8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet
> too long (104113889/50000000)
> Aug  8 00:08:14 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.127, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> Aug  8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet
> too long (117046565/50000000)
> Aug  8 00:21:03 localhost mfsmaster[10546]: chunkserver disconnected -
> ip: 192.168.0.120, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0
> (0.00 GiB)
> 
> When I visited the mfscgi, the error was "Can't connect to MFS master
> (IP:127.0.0.1 ; PORT:9421)".
> 
> Thanks all!
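
For anyone comparing these numbers across chunkservers, here is a minimal Python sketch (not part of MooseFS; the name `oversized_packets` is made up for illustration) that pulls the reported size and limit out of the "packet too long" lines quoted above. It assumes the second number in the parentheses is the master's maximum packet size, and that the mail client wrapped each syslog line, so whitespace is collapsed before matching.

```python
import re

# Pattern for the master log lines quoted above:
#   CS(<ip>) packet too long (<size>/<limit>)
PACKET_RE = re.compile(
    r"CS\((?P<ip>[\d.]+)\)\s+packet\s+too\s+long\s+\((?P<size>\d+)/(?P<limit>\d+)\)"
)

def oversized_packets(log_text):
    """Return (ip, size, limit) tuples for every 'packet too long' entry."""
    # The quoted log was line-wrapped by e-mail, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (m["ip"], int(m["size"]), int(m["limit"]))
        for m in PACKET_RE.finditer(flat)
    ]

# Sample copied from the log excerpt in this message.
SAMPLE = """
Aug  7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet
too long (115289537/50000000)
Aug  8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet
too long (104113889/50000000)
Aug  8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet
too long (117046565/50000000)
"""

if __name__ == "__main__":
    for ip, size, limit in oversized_packets(SAMPLE):
        print(f"{ip}: {size} bytes, {size / limit:.2f}x the {limit}-byte limit")
```

On the three entries above, every chunkserver sends a packet more than twice the stated limit, and each is disconnected right afterwards.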

