From: Metin A. <met...@gm...> - 2010-05-25 16:03:50
Hello, I am looking for a solution for a project I am working on. We are developing a website where people can upload files, share them, and let other people download them (similar to the rapidshare.com model). The problem is that some files are in much higher demand than others, and we can't replicate every file onto other nodes; the cost would grow far too quickly.

The scenario is like this: Simon uploads his birthday video to project.com and shares it with all of his friends. It gets stored on one server in the cluster, which has a 100 Mbit connection. Once all of Simon's friends try to download the file at the same time, they can't, because the bottleneck is that 100 Mbit link, which is 12.5 MB per second; with 1000 friends downloading the video file at once, each of them gets only about 12.5 KB per second, which is very, very bad. (I am not even taking the disk overhead into account.) So I need a way to replicate only the in-demand files, to scale the network and serve the files without problems (at least 200 KB/sec per user).

My network infrastructure is as follows: I will have client nodes and storage nodes. The client nodes will be servers with 1 Gbit bandwidth and enough RAM and CPU. Each client will be connected to 4 storage nodes, each of which has a 100 Mbit connection. A 1 Gbit client server can handle the traffic of 1000 users if the storage nodes can stream more than 15 MB per second to it, and visitors will stream directly from the client server instead of from the storage nodes. I can do that by replicating a file onto 2 nodes, but I don't want to replicate every file uploaded to my network, since that costs much more. I am sure somebody has hit this same problem in the past and developed a solution to it.
So I need a cloud-based system which will automatically push files onto replica nodes when demand for those files is high, and when demand is low will delete the extra copies so the file stays on only 1 node. I have looked at GlusterFS and asked about this problem in their IRC channel, and the answer I got from the guys there was that Gluster can't do such a thing: it can only replicate all of the files or none of them (I would have to define which files get replicated), but I need the cluster software to do it automatically.

I would use 1 Gbit client servers and 100 Mbit storage servers. All the servers will be in the same DC; I rent the servers and don't own my own DC. The reason I am choosing a 1 Gbit server as the client is that I won't have many 1 Gbit servers but I will have many storage nodes, and a 1 Gbit server is very expensive while a 100 Mbit one is not. I am sure that after some time I will run into trouble with the client servers and will have to load balance them, but that is the next step, which I don't mind right now.

I would be happy to use open source solutions, and I have looked at GlusterFS, GFS, the Google File System, DRBD, Parascale, and CloudStore, but I really couldn't figure out which is the best way for me. I thought it best to listen to other people's experiences. If you can help I will be happy (as long as you don't recommend Amazon S3 :) ). Thanks.
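Metin's bandwidth arithmetic can be checked with a quick sketch. The numbers (100 Mbit link, 1000 concurrent downloaders, 200 KB/s target) come from his description; the helper functions themselves are just an illustration and ignore disk and protocol overhead, as he does:

```python
import math

def per_client_kbps(link_mbit: float, clients: int) -> float:
    """KB/s each client gets when one link is split evenly."""
    link_kbytes = link_mbit * 1000 / 8   # 100 Mbit ~= 12500 KB/s
    return link_kbytes / clients

def replicas_needed(link_mbit: float, clients: int, target_kbps: float) -> int:
    """How many storage nodes (each with link_mbit of bandwidth) must hold
    a copy so every downloader can sustain target_kbps."""
    total_needed = clients * target_kbps   # aggregate KB/s demanded
    per_node = link_mbit * 1000 / 8        # KB/s one node can serve
    return math.ceil(total_needed / per_node)

print(per_client_kbps(100, 1000))        # 12.5 KB/s: the bottleneck Metin describes
print(replicas_needed(100, 1000, 200))   # 16 copies to hit 200 KB/s each
```

Note that 16 copies on 100 Mbit nodes amounts to 200 MB/s aggregate, which also exceeds a single 1 Gbit (125 MB/s) frontend; this is why the later replies push toward frontend load balancing and caching rather than replication alone.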
From: Ricardo J. B. <ric...@da...> - 2010-05-26 15:27:45
On Tuesday 25 May 2010, Metin Akyalı wrote:
> Hello,

Hi!

> I am looking for a solution for a project i am working on.
>
> We are developing a website where people can upload their files and
> where they can share those files and other people can download them.
> (similar to rapidshare.com model)

[ massive snippage ]

> So i need a cloud based system, which will push the files into
> replicated nodes automatically when demand for those files is high,
> and when the demand is low, they will delete from other nodes and it
> will stay in only 1 node.
>
> I have looked to glusterfs and asked in their irc channel about that
> problem, and got an answer from the guys that gluster cant do such a
> thing. It is only able to replicate all the files or none of the
> files. (i have to define which files to be replicated) But i need
> the cluster software to do it automatically.

OK, I don't think MooseFS (mfs for short) will help you either, at least
not automagically: in mfs you can specify that some files have to have
more copies than others, but you have to do it by hand.

However, I would really advise you to store at least 2 copies to avoid
data loss in case one of the storage nodes goes kaput.

The only distributed/replicated filesystem I'm aware of that is capable
of automatic load balancing is Ceph (http://ceph.newdream.net/), but
it's currently in alpha status and not recommended for production sites.

Nonetheless, maybe you can reach your goals with mfs, so keep reading :)

> I am sure that after some time i will have some trouble using client
> servers, which i will have to load balance later, but that is the
> next step which i dont mind right now.

Well, I'm actually going to suggest something along those lines.

Consider this: no matter how many storage nodes you have, the bottleneck
will then be the frontend server's (the mfs client's) bandwidth, so you
_will_ have to do frontend load balancing. You might as well do it from
the start and avoid having to do it while in production.

With that in mind, and to avoid manually increasing file copies on the
backend (storage) nodes, you could cache the files on the frontends
(assuming the files won't change often once uploaded) to speed up
serving them.

In short: I think your best option is a combination of a load
balancer/proxy cache and MooseFS with goal >= 2.

Best regards,
--
Ricardo J. Barberis
Senior SysAdmin - I+D
Dattatec.com :: Soluciones de Web Hosting
Su Hosting hecho Simple..!

------------------------------------------

Confidentiality Note: This message and any attachments (the message) are
confidential and intended solely for the addressees. Any unauthorised
use or dissemination is prohibited by Dattatec.com. Dattatec.com shall
not be liable for the message if altered or falsified. If you are not
the intended addressee of this message, please cancel it immediately and
inform the sender.
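Ricardo's frontend-cache idea can be sketched in a few lines. This is only an illustration of the flow, not code from any real project: the cache directory, the stand-in for the mfs mount point, and the copy-on-first-request logic are all assumptions, and it relies on Ricardo's premise that uploaded files never change.

```python
import os
import shutil
import tempfile

CACHE_DIR = tempfile.mkdtemp()   # hypothetical local cache on the frontend
MFS_MOUNT = tempfile.mkdtemp()   # stands in for a mounted MooseFS tree

def serve(path: str) -> str:
    """Return a local path for `path`, copying it out of the (slow, shared)
    mfs mount into the frontend's local cache on the first request.
    Later requests never touch the storage nodes again."""
    cached = os.path.join(CACHE_DIR, path.replace("/", "_"))
    if not os.path.exists(cached):                      # cache miss
        shutil.copy(os.path.join(MFS_MOUNT, path), cached)
    return cached                                       # cache hit

# demo: put a fake file on the "mfs mount" and serve it twice
with open(os.path.join(MFS_MOUNT, "video.avi"), "w") as f:
    f.write("frames")
print(serve("video.avi") == serve("video.avi"))   # True: second hit is local
```

In practice Ricardo is suggesting that this role be played by an off-the-shelf reverse proxy cache (Squid, nginx, Varnish or similar) sitting in front of the mfs-mounted frontends, rather than hand-written code.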
From: Michał B. <mic...@ge...> - 2010-05-27 09:24:07
Ricardo made very accurate observations about MooseFS and what you would
like to achieve.

There is a setting called "goal" which tells MooseFS in how many copies
a file should be stored. MooseFS won't adjust this parameter for you
automatically, but you can prepare a script which examines the
popularity of a file and raises its goal. When the file is no longer
popular, the goal can be reverted to 2 (recommended) or to 1.

But the most important thing is the performance and throughput of the
client machines. What should be interesting for you: we plan to
introduce a cache mechanism in the near future, so that once a file has
been downloaded by a client machine from a chunk server it won't be
downloaded again unless it has been modified. This would eliminate the
problem of network speed between chunk servers and clients, and it would
no longer be necessary to store a file in more than 2 copies (extra
copies won't make the system work more quickly).

If you need any further assistance please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

> -----Original Message-----
> From: Ricardo J. Barberis [mailto:ric...@da...]
> Sent: Wednesday, May 26, 2010 5:05 PM
> To: moo...@li...
> Subject: Re: [Moosefs-users] Fwd: how to install a cloud storage
> network :which is best smart automatic file replication solution for
> cloud storage based systems.
>
> [ full quoted message trimmed ]
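Michał's suggested popularity script could look something like the sketch below. The download-count thresholds are made up for illustration, and the `mfssetgoal` call (the MooseFS command-line tool for changing a file's goal) is kept behind a dry-run flag so the sketch runs anywhere; only the threshold logic is pinned down here.

```python
import subprocess

def goal_for(downloads_last_hour: int) -> int:
    """Map a file's recent popularity to a MooseFS goal (copy count).
    Thresholds are illustrative; 2 is the recommended floor."""
    if downloads_last_hour > 500:
        return 4
    if downloads_last_hour > 100:
        return 3
    return 2

def apply_goal(path: str, downloads_last_hour: int, dry_run: bool = True):
    """Set the goal for `path` based on demand; with dry_run just
    return the command that would be executed."""
    goal = goal_for(downloads_last_hour)
    cmd = ["mfssetgoal", str(goal), path]
    if dry_run:
        return cmd
    subprocess.check_call(cmd)

print(apply_goal("/mnt/mfs/videos/birthday.avi", 800))
# ['mfssetgoal', '4', '/mnt/mfs/videos/birthday.avi']
```

A cron job running this over recent download statistics, plus the reverse step when demand falls off, is essentially the hand-rolled version of the automation Metin asked Gluster for.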
From: Metin A. <met...@gm...> - 2010-05-27 18:31:53
Hello everyone,

Even if you add a caching mechanism, I won't be able to cache files,
since my average file size will be around 500 MB :) But I am sure it
will be very useful for other people.

I get really excited when I see people working on projects like these.
I am going to become a contributor to one of these file systems in the
next 2-3 months, and MooseFS is at the top of my list :)

Btw, are there any other recommended commercial or open source
solutions out there?

2010/5/27 Michał Borychowski <mic...@ge...>:
> Ricardo gave very accurate observations about MooseFS and what you
> would like to achieve.
>
> [ full quoted message trimmed ]
From: Michał B. <mic...@ge...> - 2010-06-01 07:37:15
Hi!

The caching mechanism would be helpful for you even with large files,
provided you have enough RAM on your client machine (e.g. 8 or 16 GB).
We will shortly release a beta version with the possibility to enable
the cache.

Regards
Michał

> -----Original Message-----
> From: Metin Akyalı [mailto:met...@gm...]
> Sent: Thursday, May 27, 2010 8:32 PM
> To: moo...@li...
> Subject: Re: [Moosefs-users] Fwd: how to install a cloud storage
> network :which is best smart automatic file replication solution for
> cloud storage based systems.
>
> [ full quoted message trimmed ]

------------------------------------------------------------------------------
_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users
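As back-of-the-envelope arithmetic only: with Metin's 500 MB average file size, the RAM sizes Michał mentions hold only a few dozen whole files, so the cache pays off exactly when demand is skewed toward a handful of hot files, which is Metin's scenario.

```python
def files_in_cache(ram_gb: float, avg_file_mb: float) -> int:
    """Whole files of avg_file_mb that fit in a RAM cache of ram_gb."""
    return int(ram_gb * 1024 // avg_file_mb)

print(files_in_cache(8, 500))    # 16 files in 8 GB
print(files_in_cache(16, 500))   # 32 files in 16 GB
```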