Re: [Kosmosfs-users] KosmosFS vs HDFS
From: Sriram R. <sri...@gm...> - 2008-02-29 05:54:37
Dave,

I got some preliminary performance #'s. The setup:
- I use kfs-0.1.2 to start with; binaries compiled in release mode
- I used hadoop-trunk checked out as of 2/27/08
- I use KfsPerfWriter/KfsPerfReader to write/read data from KFS
- I wrote simple Writer/Reader classes for HDFS which use the OutputStream/InputStream APIs to get at data
- I have a cluster of 5 chunkserver/datanodes and 1 metaserver/namenode
- All nodes are connected via a single GigE switch
- All nodes are running Linux (FC5) with an ext2 FS; the partition had over 200G free space

My #'s are from an experiment of writing/reading 500MB of data from the DFS.
- Chunks are sized at 64MB
- For writes, there is 3-way replication of each chunk.
- The node on which the client program (reader/writer) is run is the same as the one on which the namenode/metaserver runs. There are no datanodes/chunkservers on the node on which the client is run. This helps determine how the two systems deal with network I/O.

Perf. #'s:
- Write: HDFS: 166Mbps; KFS-0.1.2: 125Mbps
- Read: HDFS: 275Mbps; KFS-0.1.2: 402Mbps

The reason why HDFS is faster for writes:
- In HDFS, when a client pushes data to the 3 datanodes, the push is chained. That is, the client pushes to the first datanode, which in turn forwards to the next, and so on. Please see the Hadoop site for details on data forwarding.
- In KFS-0.1.2, the client itself pushes the data to each of the 3 chunkservers. This constrains write performance, since the bulk of the I/O is in pushing data (where each chunk is 64MB).

I am looking into chaining the data pushes, similar to HDFS/GFS style. This should significantly improve performance and make full use of the network bandwidth. These changes will be available in a future release.

Sriram

On Mon, Feb 25, 2008 at 1:52 PM, David Carter <da...@ca...> wrote:
> Sriram,
>
> Thanks for the quick response. I had seen the KFS-Hadoop integration,
> and am trying to determine which FS to run Hadoop-core on.
>
> It will be interesting to see the results of the performance work.
>
> Regards,
> David
>
> On Mon, Feb 25, 2008 at 4:48 PM, Sriram Rao <sri...@gm...> wrote:
> > While both are functionally equivalent, there are some differences
> > between KFS and HDFS (HDFS as of today):
> > - in HDFS, files are visible only on close; in KFS this isn't the case
> > -- impact: for DB commit logs, HDFS makes it hard. You write a
> > commit log record and then close the file; this has the downside of
> > creating lots of small files in the system.
> > - HDFS doesn't have a sync() call; KFS does
> > - Performance differences: I am currently trying to do some
> > benchmarking; will get back to you in a week or so.
> >
> > Aside...just as HDFS works with Hadoop for map-reduce, so too does
> > KFS; we have integrated KFS as a layer in the Hadoop filesystem
> > interfaces.
> >
> > Sriram
> >
> > On Mon, Feb 25, 2008 at 1:43 PM, David Carter <da...@ca...> wrote:
> > > Can someone provide some guidelines on when someone would choose to
> > > use KosmosFS vs Hadoop's HDFS?
> > >
> > > Are there significant functional differences that would cause someone
> > > to choose one over the other for a specific application?
> > >
> > > Are there significant performance differences? My gut reaction is that
> > > C++ is better suited than Java for developing a distributed file
> > > system, but I'm wondering if there are any performance measurements to
> > > back this up, preferably benchmarks run on the same cluster?
> > >
> > > --
> > > David Carter
> > > da...@ca...
> > >
> > > -------------------------------------------------------------------------
> > > This SF.net email is sponsored by: Microsoft
> > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > _______________________________________________
> > > Kosmosfs-users mailing list
> > > Kos...@li...
> > > https://lists.sourceforge.net/lists/listinfo/kosmosfs-users
>
> --
> David Carter
> da...@ca...
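
A back-of-the-envelope sketch of the write-path difference described in Sriram's reply (chained push in HDFS vs. the client pushing to every replica in KFS-0.1.2). It is an idealized model, not a measurement: it assumes the client's own GigE uplink is the only bottleneck, that links are full duplex, and that disk speed and protocol overhead can be ignored. The function name and the 1000 Mbps nominal line rate are assumptions made for illustration.

```python
def client_write_ceiling_mbps(link_mbps: float, replicas: int, chained: bool) -> float:
    """Upper bound on client write throughput imposed by the client's uplink.

    Idealized assumption: the client NIC is the only bottleneck; disks,
    protocol overhead, and switch contention are ignored.
    """
    if link_mbps <= 0 or replicas < 1:
        raise ValueError("link_mbps must be positive and replicas >= 1")
    # Chained push: the client sends each byte once; the first server
    # forwards it to the second, which forwards it to the third.
    # Direct push: the client sends each byte once per replica.
    copies_sent_by_client = 1 if chained else replicas
    return link_mbps / copies_sent_by_client


GIGE_MBPS = 1000.0  # nominal GigE line rate (assumed, not measured)
REPLICAS = 3        # 3-way replication, as in the experiment

direct_ceiling = client_write_ceiling_mbps(GIGE_MBPS, REPLICAS, chained=False)
chained_ceiling = client_write_ceiling_mbps(GIGE_MBPS, REPLICAS, chained=True)

print(f"direct push ceiling:  {direct_ceiling:.0f} Mbps")
print(f"chained push ceiling: {chained_ceiling:.0f} Mbps")
```

Under these assumptions the client-side ceiling for a direct push is one third of the link rate, while a chained push can in principle use the full link. Both measured write rates in the thread (125Mbps and 166Mbps) are well below either ceiling, so the model only indicates the direction of the effect, not its size.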