Thread: [Cppcms-users] high performance database
Brought to you by:
artyom-beilis
From: Aris S. <ari...@gm...> - 2010-06-19 04:38:02
Hi,

I agree with you in choosing BerkeleyDB in CppCMS. BerkeleyDB has been used in several databases and distributed storage systems, like Amazon Dynamo, Voldemort Project, and HypergraphDB. Have you heard about VoltDB? VoltDB (a relational & key-value DB) uses ONLY memory to perform ACID transactional operations, and scales to handle tens of thousands of transactions per second. It provides in-memory scalability that other in-memory databases don't. They report 50 times the performance of a well-known commercial database. It is released under GPL v3. What do you think?

regards,
-Aris
From: Artyom <art...@ya...> - 2010-06-19 08:26:29
Hello,

First of all, CppCMS does not limit its users to any specific DB. Use whatever you feel comfortable with.

> I agree with you in choosing BerkeleyDB in CppCMS.
> BerkeleyDB has been used in several databases and distributed storage
> systems, like Amazon Dynamo, Voldemort Project, and HypergraphDB.
> Have you heard about VoltDB?

At the beginning of CppCMS I wanted to use BDB as my primary DB backend. In fact, the first version of the CppCMS blog ran on BDB. But I dropped the idea of BerkeleyDB for several reasons:

1. BDB was really good for key-value storage, but in more complex cases with multiple indexes I found that a traditional SQL DB sometimes matches or even outperforms it. For example, when I was looking for a session-storage DB, I found Sqlite3 with two indexes faster than BDB.
2. BDB has scalability issues. It is not a client-server solution, so if you need to write to the same DB from several nodes, it becomes very complicated.
3. BDB is very hard to maintain, and updating its schema is much harder than in SQL.
4. BDB's API is a total disaster for a normal programmer: hard to understand, use, and maintain.

Finally I decided to drop it, as it didn't give me much advantage (if any) and had lots of disadvantages. Of course, if the data storage you are looking for is key-value only, then it may be a good idea. I think it is worth taking a look at memcachedb for this purpose.

So at this point my own applications (blog and wikipp) run on top of MySQL, which is quite fast.

> VoltDB (a relational & key-value DB) uses ONLY memory to perform
> ACID transactional operations, and scales to handle tens of
> thousands of transactions per second. It provides in-memory
> scalability that other in-memory databases don't. They report
> 50 times the performance of a well-known commercial database.
> It is released under GPL v3.

Sounds interesting. But from what I see:

- It works in memory... How does it scale when the DB size grows to hundreds of GB?
- Its D requires running in a hot cluster, so each transaction must be committed to another node to provide the "D". So you always need at least 2 nodes. What happens if one crashes? How does it provide D then?

Is it really faster than committing data to high-quality storage?

To be honest, it sounds really interesting. But as I said, CppCMS allows you to use any storage you want.

Artyom
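Artyom's first point, that a pure key-value store gets awkward once data needs more than one index, is easier to see in a sketch. Below, a session store is keyed by session id, with a second index ordered by expiry time so expired sessions can be purged. An SQL table would just declare two indexes; over a plain key-value map the application must keep the second index consistent on every write and delete. The session layout (id, expiry, payload) and the `SessionStore` name are hypothetical, not CppCMS's or BerkeleyDB's API, and standard containers stand in for the key-value store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical session record (not CppCMS's real session format).
struct Session {
    std::int64_t expires;  // absolute expiry time, e.g. seconds since epoch
    std::string data;      // serialized session payload
};

// Hypothetical store: primary index by id, secondary index by expiry,
// both maintained by hand as a key-value-only backend would require.
class SessionStore {
public:
    void save(const std::string& id, Session s) {
        auto it = by_id_.find(id);
        if (it != by_id_.end()) {
            // Updating a session must also move it in the secondary index.
            erase_expiry_entry(it->second.expires, id);
        }
        by_expiry_.emplace(s.expires, id);
        by_id_[id] = std::move(s);
    }

    const Session* load(const std::string& id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &it->second;
    }

    // Remove every session with expires <= now; returns how many were removed.
    std::size_t purge_expired(std::int64_t now) {
        std::size_t removed = 0;
        auto end = by_expiry_.upper_bound(now);
        auto it = by_expiry_.begin();
        while (it != end) {
            by_id_.erase(it->second);    // delete from the primary index
            it = by_expiry_.erase(it);   // and from the secondary index
            ++removed;
        }
        return removed;
    }

    std::size_t size() const {
        assert(by_id_.size() == by_expiry_.size());  // indexes must agree
        return by_id_.size();
    }

private:
    void erase_expiry_entry(std::int64_t expires, const std::string& id) {
        auto range = by_expiry_.equal_range(expires);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == id) { by_expiry_.erase(it); return; }
        }
    }

    std::unordered_map<std::string, Session> by_id_;      // primary index
    std::multimap<std::int64_t, std::string> by_expiry_;  // secondary index
};
```

For example, saving sessions "a" (expiry 100) and "b" (expiry 200), then re-saving "a" with expiry 300, leaves both indexes with two entries, and `purge_expired(250)` removes only "b". Every mutation touches both containers; that bookkeeping is what an SQL engine does for you when a table has two indexes.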
From: Aris S. <ari...@gm...> - 2010-06-19 14:32:57
> So at this point my own applications (blog and wikipp) run on top
> of MySQL, which is quite fast.

Yes, it is fast, I'm impressed. A few days ago I was still unsure which is faster, C++ or Erlang (as I found it is possible to develop for the web with Erlang), and I found something interesting: C++ won, being 19 times faster and using less memory in a game benchmark. I read about it in http://www.scribd.com/doc/29113347/Numerical-Comparison-Between-Erlang-and-C and http://shootout.alioth.debian.org/u32q/which-programming-languages-are-fastest.php?calc=calculate&gpp=on&gcc=on&java=on&javaxint=on&jruby=on

What do you think about Erlang?

...
...

> Sounds interesting. But from what I see:
>
> - It works in memory... How does it scale when the DB size grows
> to hundreds of GB?

VoltDB runs in a 64-bit environment, so we can use much more memory, and VoltDB partitions and distributes our data across the cluster. How VoltDB distributes our data depends on how many replicas per partition we configure for our DB; this is called k-safety. So if our data grows, we just add a new node to the cluster (adding more memory).

> - Its D requires running in a hot cluster, so each transaction must
> be committed to another node to provide the "D". So you always need
> at least 2 nodes.

Yes, you're right. VoltDB also provides periodic snapshots to disk to ensure data persistence, but they are not required for ACID transactions.

> What happens if one crashes? How does it provide D then?

With k-safety, we can choose the availability level we want. k=1 means VoltDB keeps a copy of each partition on 2 nodes, k=2 on 3 nodes, and so on. So as long as we have at least 2 replicas, it will be durable.

> Is it really faster than committing data to high-quality
> storage?

It will be faster because, besides VoltDB not suffering from disk logging, they removed locking, buffer management, and latching. They removed them because those are needed only in disk-based databases.
With this technique, VoltDB states that 97% of the CPU's work is left for "true data processing". They replace DB locking with simple parallel transaction queues (one queue per partition) to benefit from multi-core CPUs. If we have a 4-core CPU, then we can split our data into 4 partitions on a node. You can try it with the voter sample application they provide, and you will find that 1000 transactions per second counts as slow :) I've copied their benchmark below for you:

> Hardware/Software used for the benchmark:
> Servers: (6) Dell R610 (each with 2 x Xeon 5530 CPU, 48GB DDR3 @ 1333MHz, single 1Gb NIC, CentOS 5.4 64-bit)
> Client: (1) Dell R610: same specification as Servers
> Switch: (1) Dell PowerConnect 6248, 48 1Gb ports
> VoltDB: (6) partitions per server
>
> Benchmark Plan:
> 1. Determine how much capacity this configuration can support without k-safety (run the client without rate limiting), measure throughput and latency.
> 2. Rate limit the client ~10% below the throughput of #1, measure throughput and latency.
> 3. Finally, change the cluster to be k-safe (k=1) and rerun the client at 50% of the throughput from #2, measure throughput and latency.
>
> Results:
> 1. k=0, no rate limit on client, measured 531,685 transactions per second (TPS) @ 350.11ms latency
> 2. k=0, client rate limited to 500,000 TPS, measured 490,674 TPS @ 9.43ms latency
> 3. k=1, client rate limited to 250,000 TPS, measured 249,162 TPS @ 7.63ms latency
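The "one queue per partition instead of locks" idea Aris describes can be illustrated with a small sketch: keys are hashed to a fixed number of partitions (for example one per core), each partition owns its own data and its own transaction queue, and one thread per partition drains its queue serially, so the data itself needs no locks. This is a simplified illustration assuming every transaction touches a single key and all transactions are queued before execution starts; it is not VoltDB's implementation, and `Txn` and `PartitionedStore` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical single-key transaction: add `delta` to the counter at `key`.
struct Txn {
    std::string key;
    long delta;
};

class PartitionedStore {
public:
    explicit PartitionedStore(std::size_t partitions)
        : queues_(partitions), data_(partitions) {
        assert(partitions > 0);
    }

    // Which partition owns a key (simple hash partitioning).
    std::size_t partition_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % queues_.size();
    }

    // Route a transaction to the queue of the partition that owns its key.
    void submit(Txn t) { queues_[partition_of(t.key)].push_back(std::move(t)); }

    // One worker thread per partition. Each thread touches only its own
    // queue and its own map, so the data needs no locks; transactions on
    // the same key run in submission order within their partition.
    void run() {
        std::vector<std::thread> workers;
        for (std::size_t p = 0; p < queues_.size(); ++p) {
            workers.emplace_back([this, p] {
                for (const Txn& t : queues_[p]) data_[p][t.key] += t.delta;
                queues_[p].clear();
            });
        }
        for (auto& w : workers) w.join();
    }

    long get(const std::string& key) const {
        const auto& part = data_[partition_of(key)];
        auto it = part.find(key);
        return it == part.end() ? 0 : it->second;
    }

private:
    std::vector<std::vector<Txn>> queues_;                     // one queue per partition
    std::vector<std::unordered_map<std::string, long>> data_;  // one data set per partition
};
```

With `PartitionedStore store(4)`, submitting +5 and -2 for "alice" and +3 for "bob" and then calling `run()` leaves both counters at 3. The design choice the sketch ignores is the hard part in practice: transactions that span several partitions need coordination that a single serial queue cannot give.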
From: Artyom <art...@ya...> - 2010-06-19 15:29:47
> > So at this point my own applications (blog and wikipp) run
> > on top of MySQL, which is quite fast.
>
> Yes, it is fast, I'm impressed. A few days ago I was still unsure
> which is faster, C++ or Erlang (as I found it is possible to develop
> for the web with Erlang), and I found something interesting: C++ won,
> being 19 times faster and using less memory in a game benchmark.

Never believe synthetic benchmarks. They are quite useless for real-world applications. Compare real-world applications that do the same job.

> What do you think about Erlang?

I'm not familiar with it. But as it is a dynamically typed language, it will always be significantly slower than any statically typed one.

> > Results:
> > 1. k=0, no rate limit on client, measured 531,685 transactions per second (TPS) @ 350.11ms latency
> > 2. k=0, client rate limited to 500,000 TPS, measured 490,674 TPS @ 9.43ms latency
> > 3. k=1, client rate limited to 250,000 TPS, measured 249,162 TPS @ 7.63ms latency

What kind of transactions? What kind of schema, etc.? I've read plenty of good benchmarks for BDB that were... not so realistic in the real world.

Same as before: compare real-world applications.

P.S.: If k=0, this is not very interesting, as it does not provide the D of ACID.

In any case... the choice of DB is specific to every single case.

Artyom
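Aris's description of k-safety (k=1 keeps each partition on 2 nodes, k=2 on 3) and Artyom's remark that k=0 gives no "D" can be made concrete with a sketch: every partition gets k+1 copies on distinct nodes, so the data survives any k node failures, and the usable capacity is roughly the total memory divided by k+1. The round-robin placement and the function names (`place`, `survives`, `usable_gb`) are assumptions for illustration; VoltDB's actual placement logic may differ, and the capacity figure ignores any per-node overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// placement[p] lists the nodes that hold a copy of partition p.
using Placement = std::vector<std::vector<std::size_t>>;

// Hypothetical round-robin placement: k+1 copies of each partition,
// each on a different node. Needs at least k+1 nodes.
Placement place(std::size_t partitions, std::size_t nodes, std::size_t k) {
    assert(nodes >= k + 1);
    Placement placement(partitions);
    for (std::size_t p = 0; p < partitions; ++p)
        for (std::size_t r = 0; r <= k; ++r)
            placement[p].push_back((p + r) % nodes);  // distinct since k+1 <= nodes
    return placement;
}

// The data survives if every partition still has a copy on a live node.
bool survives(const Placement& placement, const std::set<std::size_t>& failed) {
    for (const auto& copies : placement) {
        bool alive = false;
        for (std::size_t n : copies)
            if (failed.count(n) == 0) { alive = true; break; }
        if (!alive) return false;
    }
    return true;
}

// With k+1 copies of everything, usable capacity is total memory / (k+1).
double usable_gb(std::size_t nodes, double gb_per_node, std::size_t k) {
    return nodes * gb_per_node / static_cast<double>(k + 1);
}
```

For example, 6 partitions on 3 nodes with k=0 are lost as soon as node 0 fails; with k=1 the same cluster survives losing node 0 but not nodes 0 and 1 together. For the six 48GB servers in the quoted benchmark, `usable_gb(6, 48.0, 1)` gives 144GB, half the raw 288GB, which is the price of the "D" Artyom is asking about.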
From: Aris S. <ari...@gm...> - 2010-06-19 17:12:12
> What kind of transactions? What kind of schema, etc.?

There is a detailed description at https://community.voltdb.com/node/48. They also have a TPC-like benchmark (compared to MySQL and a RAID DBMS) at https://community.voltdb.com/node/134.

> I've read plenty of good benchmarks for BDB that were... not so realistic
> in the real world.
>
> Same as before: compare real-world applications.
>
> P.S.: If k=0, this is not very interesting, as it does not provide the D of
> ACID.

Maybe that's to show that their performance is predictable: with k=1, the latency changes by only about 2ms (7.63ms vs. 9.43ms).

> In any case... the choice of DB is specific to every single case.

Yes, you're right. VoltDB is a good match for systems with a high insert/update load, and not a match for long-running transactions.

Back to CppCMS: what about adding a regression benchmark to CppCMS, like the one used in Drizzle? https://lists.launchpad.net/drizzle-benchmark/ and http://drizzle.org/performance/ It is not a real-world benchmark, but I think it is useful for measuring the value of our refactoring in CppCMS, and for detecting mistakes we made in some area while refactoring.

P.S.: Drizzle is a refactoring of MySQL. Sorry for my bad English.

Aris
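Aris's suggestion of a Drizzle-style regression benchmark boils down to timing a fixed workload on each build, comparing it with a recorded baseline, and flagging the change if it got slower by more than some tolerance. A minimal sketch of that comparison follows; the workload, the 10% tolerance, and the names `best_time_ms` and `is_regression` are hypothetical, and nothing here is part of CppCMS or of Drizzle's benchmark tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Run `workload` `runs` times and return the best (minimum) wall time in
// milliseconds; the minimum is less sensitive to background noise.
double best_time_ms(const std::function<void()>& workload, int runs) {
    assert(runs > 0);
    std::vector<double> times;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        workload();
        auto stop = std::chrono::steady_clock::now();
        times.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return *std::min_element(times.begin(), times.end());
}

// A change counts as a regression if it is slower than the baseline by
// more than `tolerance` (0.10 means 10%).
bool is_regression(double baseline_ms, double current_ms, double tolerance) {
    return current_ms > baseline_ms * (1.0 + tolerance);
}
```

With a 100ms baseline and a 10% tolerance, a 105ms run passes and a 115ms run is flagged. A build job could record the baseline from the last release and fail when `is_regression` returns true; as Artyom notes, such numbers say little about real-world speed, but they can catch a refactoring that made things slower.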