From: Michał B. <mic...@ge...> - 2010-08-09 13:15:50
Shen, thanks for the reply :)

Tian, these limits were changed in 1.6.16, and the latest stable release is now 1.6.17, so we recommend that you simply update the master server to 1.6.17.

If you need any further assistance, please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

-----Original Message-----
From: Shen Guowen [mailto:sh...@ui...]
Sent: Monday, August 09, 2010 4:42 AM
To: TianYuchuan(田玉川)
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfs-master[10546]: CS(192.168.0.125) packet too long (115289537/50000000)

Don't worry!
This happens because some of your chunk servers are currently unreachable. The master server notices this and modifies the metadata of the files stored on those chunk servers, setting "allvalidcopies" to 0 in "struct chunk". When the master rescans the files (fs_test_files() in filesystem.c), it finds that the number of valid copies is 0 and prints messages to the syslog file, like the ones listed below. However, this printing is quite time-consuming, especially when the number of files is large. During this period the master ignores the chunk servers' connections, because it is inside a big loop testing files and does this in a single thread (maybe this is a pitfall). So even if you make sure the chunk servers are working correctly, it does not help (you can see the reconnection messages in the chunk servers' syslog files).

You could let the master finish printing. It will then reconnect with the chunk servers, notice that the files are there, and set "allvalidcopies" to the correct value. After that it works normally.

Or you can recompile the program after commenting out line 5512 and line 5482 in filesystem.c (mfs-1.6.15). It will then skip the print messages and, of course, reduce the fs test time.
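
To illustrate the effect described above, here is a minimal sketch in C. It is not MooseFS source code: `demo_file`, `count_unavailable` and the `verbose` switch are hypothetical names made up for this example, and only the "allvalidcopies" field name comes from the message. It shows a single-threaded pass over all files that either prints one line per unavailable file or only counts them, which is roughly what commenting out the print lines would achieve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a file entry; NOT the real MooseFS
 * "struct chunk". Only the field name allvalidcopies is taken
 * from the message above. */
typedef struct {
    unsigned inode;
    unsigned allvalidcopies;
} demo_file;

/* Walk all entries once, in a single thread, and count those with
 * no valid copy.
 * verbose != 0: print one line per unavailable file, which becomes
 *               slow when millions of files are affected.
 * verbose == 0: skip the per-file output and only count. */
size_t count_unavailable(const demo_file *files, size_t n, int verbose) {
    size_t unavailable = 0;
    assert(files != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (files[i].allvalidcopies == 0) {
            unavailable++;
            if (verbose) {
                fprintf(stderr, "currently unavailable file %u\n",
                        files[i].inode);
            }
        }
    }
    return unavailable;
}
```

Calling `count_unavailable(files, n, 0)` returns the same count as the verbose variant while writing nothing, so the loop finishes sooner and control can return to whatever services the chunk server connections.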
Below is from Michal:
-----------------------------------------------------------------------
We give you here some quick patches you can apply to the master server to improve its performance for that number of files:

In matocsserv.c in mfsmaster you need to change this line:
#define MaxPacketSize 50000000
into this:
#define MaxPacketSize 500000000

Also we suggest a change in filesystem.c in mfsmaster, in the "fs_test_files" function. Change this line:
if ((uint32_t)(main_time())<=starttime+150) {
into:
if ((uint32_t)(main_time())<=starttime+900) {

And also change this line:
for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {
into this:
for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {

You need to recompile the master server and start it again. The above changes should make the master server more stable with a large number of files.

Another suggestion would be to create two MooseFS instances (e.g. 2 x 200 million files). One master server could also be the metalogger for the other system and vice versa.

Kind regards
Michał
-----------------------------------------------------------------------------

--
Guowen Shen

On Sun, 2010-08-08 at 22:51 +0800, TianYuchuan(田玉川) wrote:
>
> Hello, everyone!
> I have a big question, please help me, thank you very much.
> We intend to use moosefs in our production environment as the storage
> for our online photo service.
> We'll store about 200 million photo files.
> I've built one master server (48G mem), one metalogger server, and eight
> chunk servers (8*1T SATA). When I copied photo files to the moosefs
> system, everything was good at first.
> But after I had copied 57 million files, the master machine's CPU usage
> was at 100%.
> I stopped the master using "/user/local/mfs/sbin/mfsmasterserver -s",
> then I started the master again. But there was a big problem: the
> master had not read my files. These documents are important to me, I
> am very anxious, please help me recover these files, thanks.
>
> I got many error syslog entries from the master server:
>
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 41991323: 2668/2526212449954462668/176s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000043CD358 (inode: 50379931 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 50379931: 2926/4294909215566102926/163b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000002966C3 (inode: 48284 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 48284: bookdata/178/8533354296639220178/180b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000594726 (inode: 4242588 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 4242588: bookdata/6631/4300989258725036631/85s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000993541 (inode: 8436892 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 8436892: bookdata/7534/3147352338521267534/122b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000D906E6 (inode: 12631196 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 12631196: bookdata/8691/11879047433161548691/164s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 000000000118DC1E (inode: 16825500 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 16825500: bookdata/1232/17850056326363351232/166b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001681BC7 (inode: 21019804 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 21019804: bookdata/26/12779298489336140026/246s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001A804E1 (inode: 25214108 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 25214108: bookdata/3886/8729781571075193886/30s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001E7E826 (inode: 29408412 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 29408412: bookdata/4757/142868991575144757/316b.jpg
>
>
> Aug 7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet too long (115289537/50000000)
> Aug 7 23:56:36 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.124, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
> Aug 8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet too long (104113889/50000000)
> Aug 8 00:08:14 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.127, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
> Aug 8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet too long (117046565/50000000)
> Aug 8 00:21:03 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.120, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
>
> When I visited the mfscgi, the error was "Can't connect to MFS master
> (IP:127.0.0.1 ; PORT:9421)".
>
> Thanks all!
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by
>
> Make an app they can't live without
> Enter the BlackBerry Developer Challenge
> http://p.sf.net/sfu/RIM-dev2dev
> _______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users
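
As a quick check of the numbers in this thread, the sketch below compares the packet sizes reported in the "packet too long" syslog entries with the original MaxPacketSize (50000000) and the patched value (500000000). It is only an illustration: `packet_accepted` is a hypothetical helper, not a function from matocsserv.c, and it assumes the master rejects a packet only when its size is strictly greater than the limit.

```c
#include <assert.h>
#include <stdint.h>

/* Limits quoted in the thread: the original MaxPacketSize and the
 * value suggested in the patch. */
#define OLD_MAX_PACKET_SIZE 50000000u
#define NEW_MAX_PACKET_SIZE 500000000u

/* Hypothetical helper: returns 1 if a packet of `size` bytes would be
 * accepted under `limit`, 0 otherwise.
 * Assumption: a packet is rejected only when size > limit. */
int packet_accepted(uint32_t size, uint32_t limit) {
    assert(limit > 0);
    return size <= limit;
}
```

Under that assumption, the three sizes from the log (115289537, 104113889 and 117046565 bytes) are all rejected with the old limit and all accepted with the new one, which fits the advice to raise the limit or upgrade to a release where it has been changed.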