From: <ro...@mm...> - 2012-04-24 22:54:23
Wang Jian wrote:
> It's not that simple.

My response is going to be off-topic for this list, so this is a one-off
post (kind of). I'm sure your team has gone over different architectures;
I'm just offering a (hopefully) insightful suggestion.

As I understand your problem, you are trying to build something like this:

    Application Servers              Storage Area
    A B C D .... Z      -------->    BBKAfs (100's of TB)

where BBKAfs = Big-Bad-Kick-Ass-filesystem.

Since the problem seems to lie in the BBKAfs, I am suggesting that you do
without it altogether by employing sharding. Devise a way to partition your
users (an algorithmic hash perhaps?) and assign them to different machines.
Each machine could hold a subset of the users and the relevant data. So
instead of the above situation, you end up with something like the
following:

    Application Servers w/Local Storage
    A (how about 4 to 8TB)
    B (how about 4 to 8TB)
    C (how about 4 to 8TB)
    ...
    Z

This way you have no need for a central repository where everything is
stored, and no need for special filesystems. Each individual server is
smaller and so easier to manage (double them for fault tolerance; you are
already doing something along these lines anyway). There is no single
point of failure, and you can adapt the scripts you must already have for
handling the Application Servers in the original approach to handle the
second case. Of course, this suggestion leaves a lot of issues to be
resolved, but it is enough to sketch the idea.

> In your scenario, one FS image can only be mounted once, which means a
> single server, thus a single point of failure. And more seriously, loss
> of one chunk of the FS image (I mean a real one; this happens when two
> or more hard disks fail at the same time) leads to corruption of the
> whole image, and then you lose all data in the FS image.
>
> Another problem is metadata overhead. Your mounting system will do
> filesystem journaling, and below it, MooseFS will do filesystem
> journaling.

You could use ext2, which isn't a journaling fs.
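The "algorithmic hash" partitioning suggested above could be sketched like
this; the shard names (shard0..shard3) and the count of 4 are made-up
placeholders, not anything from the thread:

```shell
#!/bin/sh
# Sketch: map a user id to one of NSHARDS application servers by
# hashing it. Shard names and count are illustrative placeholders.

NSHARDS=4

shard_for_user() {
    # Take the first 8 hex digits of the md5 of the user id and
    # reduce modulo the number of shards. Same user -> same shard.
    h=$(printf '%s' "$1" | md5sum | cut -c1-8)
    echo "shard$(( 0x$h % NSHARDS ))"
}

shard_for_user "alice@example.com"
shard_for_user "bob@example.com"
```

The mapping is stable as long as NSHARDS stays fixed; growing the farm
later means rehashing (or moving to consistent hashing), which is one of
the "issues left to be resolved" mentioned above.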
> DB? I am curious whether some company really did and is doing that, at
> really large scale (tens and hundreds of 12TB servers)

I remember reading that Facebook does this (google "mysql sharding").

Also: Apache Cassandra 1.1
http://www.h-online.com/open/news/item/Apache-Cassandra-1-1-release-comes-to-pass-1557837.html
"...According to the ASF, the largest known production cluster of
Cassandra carried over 300 terabytes of data spread over 400 machines...."

-Stathis

> On 2012/4/24 17:06, ro...@mm... wrote:
>> Ken wrote:
>>> This way is too Linux. ha. How about growing?
>>
>> It should work on any decent *nix system.
>>
>> Growing?
>>
>> Option-1
>> How about using resize2fs.
>>
>> Option-2
>> [sci-fi-ON]
>> You could create a PV group out of filesystems-in-a-file.
>> Every time you need more space, you just add a new
>> filesystem-in-a-file to the PV group.
>> [sci-fi-OFF]
>>
>> Option-3
>> Use sparse files.
>> The file containing the file system could be a sparse file. This way,
>> you can specify, for example, a 2TB sparse file that will really only
>> consume the actual data as storage space.
>> I don't know how MooseFS handles sparse files though...
>>
>> Option-4
>> Forget about filesystems (including MooseFS) and use a database for
>> storing the photos.
>> Use sharding or partitioning to keep the database file down to a
>> reasonable (whatever that may be) size and improve performance.
>> There are plenty of options for on-line replication of database data
>> (so no single point of failure).
>>
>> -Stathis
>>
>>> The business may be much the same as Flickr, Facebook photos,
>>> DropBox, etc...
>>>
>>> If small-file storage is solved, moosefs will be used in more
>>> production application scenarios.
>>>
>>> Thanks
>>>
>>> -Ken
>>>
>>> On Tue, Apr 24, 2012 at 4:10 PM, <ro...@mm...> wrote:
>>>> If you're on linux, I'd try a filesystem-in-a-file which would be
>>>> used to store the photos, with this file itself stored in a MooseFS
>>>> volume.
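Option-3 can be tried out directly; the sketch below (the file name
fs.img and the 2GB size are just for illustration) shows that a
seek-created file only consumes the blocks actually written:

```shell
#!/bin/sh
# Create a 2GB *sparse* file: dd seeks to the last 1KB block and writes
# only that block, so the apparent size is 2GB but almost nothing is
# allocated on disk until real data is written into it.
dd if=/dev/zero of=fs.img bs=1024 count=1 seek=$((2*1024*1024-1)) 2>/dev/null

ls -l fs.img   # apparent size: 2147483648 bytes (2GB)
du -k fs.img   # actual allocation: typically just a few KB

# Untested sketch of growing it later (Option-1 + Option-3 combined):
# extend the file, then the filesystem inside it while unmounted, e.g.
#   truncate -s +1T fs.img && e2fsck -f fs.img && resize2fs fs.img
```

Whether the sparseness survives being stored on a MooseFS volume is
exactly the open question raised above.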
>>>> For example, something along these lines:
>>>>
>>>>   dd if=/dev/zero of=FILENAME bs=1024 count=1 seek=$((2*1024*1024-1))
>>>>   mkfs -t ext3 FILENAME
>>>>   mount -t ext3 -o loop FILENAME MOUNTPOINT
>>>>
>>>> where MOUNTPOINT is a MooseFS volume.
>>>>
>>>> -Stathis
>>>>
>>>> Wang Jian wrote:
>>>>> TFS has a strict and random URL scheme, so it's difficult to do
>>>>> URL-based tuning.
>>>>>
>>>>> On 2012/4/24 15:11, Davies Liu wrote:
>>>>>> On Tue, Apr 24, 2012 at 2:29 PM, Ken <ken...@gm...> wrote:
>>>>>> > On Tue, Apr 24, 2012 at 2:22 PM, Davies Liu <dav...@gm...> wrote:
>>>>>> >> Maybe leveldb + MooseFS is better.
>>>>>> > Is this what you mean?
>>>>>> >   leveldb stores the url
>>>>>> >   moosefs stores the photo
>>>>>>
>>>>>> No, store both url and photo in leveldb, which uses MooseFS as its
>>>>>> disks. The performance may not be perfect, but it should be enough.
>>>>>> If not, TFS from taobao.com may be the better choice.
>>>>>>
>>>>>> > On Tue, Apr 24, 2012 at 1:59 PM, Ken <ken...@gm...> wrote:
>>>>>> >> We need to store tons of small files (photos). As noted in the
>>>>>> >> FAQ, file count is limited in moosefs, so I think bundling small
>>>>>> >> files into a huge file may work.
>>>>>> >>
>>>>>> >> The write procedure looks like:
>>>>>> >>   allocate a huge file
>>>>>> >>   write a head and the photo content
>>>>>> >>   return (huge file, position, size)
>>>>>> >>   write another head and another photo
>>>>>> >>   return (huge file, position, size)
>>>>>> >>   ...
>>>>>> >>
>>>>>> >> Before reading a photo, we should have enough information: the
>>>>>> >> huge file, position and length; then the read proceeds normally.
>>>>>> >> To read a photo, we provide a URL like
>>>>>> >> 'http://xxx.com/prefix/huge file/offset/size.jpg'
>>>>>> >>
>>>>>> >> And to be useful on the web, build a fastcgi program for
>>>>>> >> read/write access to the huge file.
>>>>>> >>
>>>>>> >> ps:
>>>>>> >> * The mapping between photo and url should be stored outside
>>>>>> >>   of moosefs.
>>>>>> >> * Huge file size is limited to 2GB.
>>>>>> >> * A huge file may cause race conditions.
>>>>>> >>
>>>>>> >> Is anyone interested in this? Or is there a better solution?
>>>>>> >>
>>>>>> >> -Ken
>>>>>> >>
>>>>>> >> ------------------------------------------------------------------------------
>>>>>> >> Live Security Virtual Conference
>>>>>> >> Exclusive live event will cover all the ways today's security
>>>>>> >> and threat landscape has changed and how IT managers can
>>>>>> >> respond. Discussions will include endpoint security, mobile
>>>>>> >> security and the latest in malware threats.
>>>>>> >> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>> >> _______________________________________________
>>>>>> >> moosefs-users mailing list
>>>>>> >> moo...@li...
>>>>>> >> https://lists.sourceforge.net/lists/listinfo/moosefs-users
>>>>>> >
>>>>>> > --
>>>>>> > - Davies
>>>>>>
>>>>>> --
>>>>>> - Davies
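As an aside, Ken's append-and-read-by-offset scheme quoted above can be
sketched with plain shell tools. The bundle file name, the helper names
and the tiny placeholder "photos" are all illustrative; a real service
would keep the (offset, size) index outside of moosefs, as his "ps"
notes say, and would need locking to avoid the race conditions he
mentions:

```shell
#!/bin/sh
# Sketch: append each "photo" to a bundle file, record (offset, size),
# and read it back later with dd skip/count. Placeholder strings stand
# in for real JPEG bytes.

BUNDLE=bundle.dat
: > "$BUNDLE"                    # start with an empty bundle

append_photo() {                 # append_photo <file> -> "offset size"
    off=$(stat -c %s "$BUNDLE")  # current end of bundle = photo offset
    cat "$1" >> "$BUNDLE"
    echo "$off $(stat -c %s "$1")"
}

read_photo() {                   # read_photo <offset> <size>
    dd if="$BUNDLE" bs=1 skip="$1" count="$2" 2>/dev/null
}

printf 'JPEG-A'  > a.jpg
printf 'JPEG-BB' > b.jpg

loc_a=$(append_photo a.jpg)      # "0 6"
loc_b=$(append_photo b.jpg)      # "6 7"

read_photo $loc_b                # prints the bytes of b.jpg
```

The (offset, size) pair is exactly what the proposed
'http://xxx.com/prefix/huge file/offset/size.jpg' URL would encode.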