From: Ioannis A. <ias...@fl...> - 2010-11-09 17:15:45
|
Hello,

I have some pretty important questions regarding chunking.

The first question is with respect to the default chunk size, and whether it can easily be modified, perhaps in a configuration file.

The second question is how exactly does the chunking work with small files.

In our case we have four types of files:
- several hundred thousand small files of less than 10KB
- several million medium files of around 10MB
- several tens of thousand large files of around 200MB
- several thousand extra large files, larger than 500MB

What would be a good chunk size in this case to prevent space loss?

Regards.

--
Ioannis Aslanidis
System and Network Administrator
Flumotion Services, S.A.
E-Mail: iaslanidis at flumotion dot com
Office Phone: +34 93 508 63 59
Mobile Phone: +34 672 20 45 75
From: Laurent W. <lw...@hy...> - 2010-11-10 09:10:38
|
On Tue, 9 Nov 2010 17:13:34 +0100 Ioannis Aslanidis <ias...@fl...> wrote:

> Hello,
>
> I have some pretty important questions regarding chunking.
>
> The first question is with respect to the default chunk size, and
> whether it can easily be modified, perhaps in a configuration file.

Chunk size is hardcoded, for performance reasons. Having it modified is possible, though you'll have to crawl through the code and modify the config file parser. You may have to change Chunk header size too, and some other things. So it's not really trivial.

> The second question is how exactly does the chunking work with small files.
>
> In our case we have four types of files:
> - several hundred thousand small files of less than 10KB

A file is stored in a 64KB block. So here you'll lose quite a lot of space.

> - several million medium files of around 10MB

I'm wondering if the file would use 64KB blocks or a complete 64MB chunk in that case. Probably 64KB blocks, but I'm really unsure. Michal?

> - several tens of thousand large files of around 200MB
> - several thousand extra large files, larger than 500MB

Here the answer is quite clear.

> What would be a good chunk size in this case to prevent space loss?

Maybe 16 MB. But I'm afraid of the performance hit. Keep in mind MooseFS wasn't designed to store small files. Is lost space so important given your volume compared to data security?

HTH,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
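To put rough numbers on the "you'll lose quite a lot of space" remark, here is a minimal sketch of the padding per file. It assumes that every file is rounded up to a whole number of 64KB blocks, which the thread only suggests and does not confirm for medium-sized files; the function names are made up for this illustration and are not part of MooseFS.

```python
KIB = 1024
BLOCK = 64 * KIB  # 64KB block size mentioned in the thread


def padded_size(file_size: int, block: int = BLOCK) -> int:
    """Round file_size up to a whole number of blocks (assumed allocation model)."""
    blocks = -(-file_size // block)  # ceiling division; 0 bytes -> 0 blocks
    return blocks * block


def padding_waste(file_size: int, block: int = BLOCK) -> int:
    """Bytes allocated but not used by file data under the assumed model."""
    return padded_size(file_size, block) - file_size


if __name__ == "__main__":
    # Representative sizes for the four file classes in the question.
    for label, size in [
        ("10KB", 10 * KIB),
        ("10MB", 10 * KIB * KIB),
        ("200MB", 200 * KIB * KIB),
        ("500MB", 500 * KIB * KIB),
    ]:
        print(label, "wastes", padding_waste(size) // KIB, "KB")
```

Under this assumption the padding is at most one block per file, so it only matters for the small files: a 10KB file leaves 54KB of its block unused, while sizes that are exact multiples of 64KB leave nothing unused.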
From: Ioannis A. <ias...@fl...> - 2010-11-10 10:12:17
|
Hello,

From what you say, by default we have blocks of 64KB and chunks of 64MB. Correct?

This means that small files use 64KB blocks while big files use 64MB chunks. Is this the case?

So in the end, the difference is that many small files can fill in a chunk, while big files take in the whole chunk.

Are there any performance indicators that show how much space gets lost?

Regards.

--
Ioannis Aslanidis
From: Michał B. <mic...@ge...> - 2010-11-15 07:32:32
|
Hi!

Please have a look at these FAQ entries:

http://www.moosefs.org/moosefs-faq.html#source_code (here are some examples on how to calculate the unused space)
http://www.moosefs.org/moosefs-faq.html#modify_chunk (nope, it's hardcoded)

If you need any further assistance please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01
From: Laurent W. <lw...@hy...> - 2010-11-10 10:22:43
|
On Wed, 10 Nov 2010 10:43:36 +0100 Ioannis Aslanidis <ias...@fl...> wrote:

> Hello,
>
> From what you say, by default we have blocks of 64KB and chunks of
> 64MB. Correct?

right.

> This means that small files use 64KB blocks while big files use 64MB
> chunks. Is this the case?

I guess so.

> So in the end, the difference is that many small files can fill in a
> chunk, while big files take in the whole chunk.

agreed.

> Are there any performance indicators that show how much space gets lost?

Not that I know of. See http://www.moosefs.org/moosefs-faq.html#source_code for details.

HTH,
--
Laurent Wandrebeck
From: Fabien G. <fab...@gm...> - 2010-11-10 12:24:57
|
Hi,

On Wed, Nov 10, 2010 at 11:22 AM, Laurent Wandrebeck <lw...@hy...> wrote:

> > This means that small files use 64KB blocks while big files use 64MB
> I guess so.

Yes, that's right:

    [root@mfsmaster]# dd if=/dev/zero of=10k_file bs=10k count=1
    1+0 records in
    1+0 records out
    [root@mfsmaster]# mfsfileinfo 10k_file
    10k_file:
        chunk 0: 00000000001F0EBE_00000001 / (id:2035390 ver:1)
            copy 1: 192.168.200.2:9422

On the chunkserver:

    [root@mfschunk2]# ls -lh BE/chunk_00000000001F0EBE_00000001.mfs
    -rw-r----- 1 nobody nobody 69K nov 10 13:17 chunk_00000000001F0EBE_00000001.mfs

Fabien
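The 69K listing above can be checked against a simple model, sketched here under stated assumptions. The 5KB per-chunk header is not stated anywhere in this thread; it is inferred from 69K minus one 64KB block, and `ls -lh` rounds its output, so treat it as approximate. The model also assumes that a chunk file only grows to the number of 64KB blocks actually written and that a file is split into 64MB chunks. `chunk_file_sizes` is a hypothetical helper, not a MooseFS tool.

```python
KIB = 1024
BLOCK = 64 * KIB          # 64KB block, per the thread
CHUNK = 64 * KIB * KIB    # 64MB chunk, per the thread
HEADER = 5 * KIB          # ASSUMPTION: inferred from the 69K listing (69K - 64K)


def chunk_file_sizes(file_size: int) -> list[int]:
    """Estimate the on-disk size of each chunk file holding one copy of a file."""
    sizes = []
    remaining = file_size
    while remaining > 0:
        data = min(remaining, CHUNK)      # bytes of file data in this chunk
        blocks = -(-data // BLOCK)        # ceiling division to whole blocks
        sizes.append(HEADER + blocks * BLOCK)
        remaining -= data
    return sizes


if __name__ == "__main__":
    # The 10KB test file from the dd example: one block plus the assumed header.
    print([s // KIB for s in chunk_file_sizes(10 * KIB)])
    # A 200MB file: three full chunks and one partial chunk under this model.
    print([s // KIB for s in chunk_file_sizes(200 * KIB * KIB)])
```

For the 10KB file this gives a single chunk file of 69KB, which matches the listing. If the model holds, the overhead for large files is small: the assumed header plus at most one partly filled block per chunk.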