Thread: [Pointrel-discuss] Slashdot | Is the Relational Database Doomed?
From: Paul D. F. <pdf...@ku...> - 2009-02-14 13:06:49
http://developers.slashdot.org/article.pl?sid=09/02/13/2026227

"There's an article over on Read Write Web about what the future of relational databases looks like when faced with new challenges to its dominance from key/value stores, such as SimpleDB, CouchDB, Project Voldemort and BigTable. The conclusion suggests that relational databases and key value stores aren't really mutually exclusive and instead are different tools for different requirements."

http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php
From: Paul D. F. <pdf...@ku...> - 2009-02-14 13:16:17
Links and summaries:

Amazon SimpleDB:
http://aws.amazon.com/simpledb/
"""
Amazon SimpleDB provides a simple web services interface to create and store multiple data sets, query your data easily, and return the results.

You organize your structured data into domains and can run queries across all of the data stored in a particular domain. Domains are comprised of items, and items are described by attribute-value pairs. To understand these elements, consider the metaphor of data stored in a spreadsheet table. An Amazon SimpleDB domain is like a worksheet, items are like rows of data, attributes are like column headers, and values are the data entered in each of the cells.

However unlike a spreadsheet, Amazon SimpleDB allows for multiple values to be associated with each “cell” (e.g., for item “123,” the attribute “color” can have both value “blue” and value “red”). Additionally, in Amazon SimpleDB, each item can have its own unique set of associated attributes (e.g., item “123” might have attributes “description” and “color” whereas item “789” has attributes “description,” “color” and “material”). Amazon SimpleDB automatically indexes your data, making it easy to quickly find the information that you need. There is no need to pre-define a schema or change a schema if new data is added later.
"""

Project Voldemort:
http://project-voldemort.com/
"""
Voldemort is not a relational database, it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper like active-record or hibernate this will provide horizontal scalability and much higher availability but at great loss of convenience.

For large applications under internet-type scalability pressure, a system may likely consist of a number of functionally partitioned services or apis, which may manage storage resources across multiple data centers using storage systems which may themselves be horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible since all the data is not available in any single database. A typical pattern is to introduce a caching layer which will require hashtable semantics anyway. For these applications Voldemort offers a number of advantages:

* Voldemort combines in memory caching with the storage system so that a separate caching tier is not required (instead the storage system itself is just fast).
* Unlike MySQL replication, both reads and writes scale horizontally
* Data partitioning is transparent, and allows for cluster expansion without rebalancing all data
* Data replication and placement is decided by a simple API to be able to accommodate a wide range of application specific strategies
* The storage layer is completely mockable so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster (or even a real storage system) for simple testing

The source code is available under the Apache 2.0 license. We are actively looking for contributors so if you have ideas, code, bug reports, or fixes you would like to contribute please do so.
"""

Apache CouchDB:
http://couchdb.apache.org/
"""
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.

CouchDB is written in Erlang, but can be easily accessed from any environment that provides means to make HTTP requests. There are a multitude of third-party client libraries that make this even easier for a variety of programming languages and environments.
"""

Google Bigtable:
http://labs.google.com/papers/bigtable.html
"Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable."

And there is Soprano (used by Nepomuk) or other RDF triple engines:
http://soprano.sourceforge.net/
"Soprano (formerly known as QRDF) is a library which provides a highly usable object-oriented C++/Qt4 framework for RDF data. It uses different RDF storage solutions as backends through a simple plugin system. Soprano is targeted at desktop applications that need a RDF data storage solution. It has been optimized for easy usage and simplicity."

I don't see an obvious mention of the latest Pointrel system feature of reifying transactions.

--Paul Fernhout

Paul D. Fernhout wrote:
> http://developers.slashdot.org/article.pl?sid=09/02/13/2026227
>
> "There's an article over on Read Write Web about what the future of
> relational databases looks like when faced with new challenges to its
> dominance from key/value stores, such as SimpleDB, CouchDB, Project
> Voldemort and BigTable. The conclusion suggests that relational databases
> and key value stores aren't really mutually exclusive and instead are
> different tools for different requirements."
>
> http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php
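
As a rough illustration of the SimpleDB description quoted above (a domain holds items, items are described by attribute-value pairs, an attribute can hold several values, and each item can have its own set of attributes), here is a minimal in-memory sketch in Python. The class name `Domain` and its methods are made up for illustration; this is not the Amazon SimpleDB API, just the data model as described in the quoted text.

```python
from collections import defaultdict


class Domain:
    """Hypothetical in-memory stand-in for a SimpleDB-style domain (not the AWS API)."""

    def __init__(self):
        # item name -> attribute name -> set of values (multi-valued "cells")
        self._items = defaultdict(lambda: defaultdict(set))

    def put(self, item, attribute, value):
        """Add one value to an attribute of an item; no schema is declared up front."""
        self._items[item][attribute].add(value)

    def get(self, item, attribute):
        """Return a copy of all values stored for one attribute of one item."""
        return set(self._items.get(item, {}).get(attribute, set()))

    def attributes(self, item):
        """Return the attribute names this particular item happens to have."""
        return set(self._items.get(item, {}))

    def query(self, attribute, value):
        """Return the sorted names of items whose attribute contains the value."""
        return sorted(
            name
            for name, attrs in self._items.items()
            if value in attrs.get(attribute, ())
        )


# The item names, attributes, and values below mirror the quoted example
# where they can; the remaining values are invented.
domain = Domain()
domain.put("123", "description", "sweater")
domain.put("123", "color", "blue")
domain.put("123", "color", "red")
domain.put("789", "description", "chair")
domain.put("789", "color", "red")
domain.put("789", "material", "oak")

print(domain.get("123", "color"))
print(domain.attributes("789"))
print(domain.query("color", "red"))
```

The linear scan in `query` is only for brevity; the quoted text says SimpleDB indexes the data automatically, which this sketch does not attempt.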
From: Paul D. F. <pdf...@ku...> - 2009-02-14 13:22:28
From the Bigtable document there:
"""
A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

(row:string, column:string, time:int64) → string
...
Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp. Bigtable timestamps are 64-bit integers.
"""

So, basically objects (rows), with attributes (columns), with versions, similar in some ways to the earliest Pointrel versions from 25 years ago. Of course, they have a distributed fancy implementation. :-)

--Paul Fernhout

Paul D. Fernhout wrote:
> Google Bigtable:
> http://labs.google.com/papers/bigtable.html
> "Bigtable is a distributed storage system for managing structured data that
> is designed to scale to a very large size: petabytes of data across
> thousands of commodity servers. Many projects at Google store data in
> Bigtable, including web indexing, Google Earth, and Google Finance. These
> applications place very different demands on Bigtable, both in terms of data
> size (from URLs to web pages to satellite imagery) and latency requirements
> (from backend bulk processing to real-time data serving). Despite these
> varied demands, Bigtable has successfully provided a flexible,
> high-performance solution for all of these Google products. In this paper we
> describe the simple data model provided by Bigtable, which gives clients
> dynamic control over data layout and format, and we describe the design and
> implementation of Bigtable. "
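
The map quoted above, (row:string, column:string, time:int64) → string, can be sketched as a small in-memory Python class. This is only a toy model of the data shape described in the quote: the class name `VersionedTable`, its methods, and the sample row and column keys are assumptions made for illustration, not Bigtable's API, and the sketch is neither distributed nor kept in sorted row order.

```python
class VersionedTable:
    """Toy model of a (row, column, time) -> bytes map; not Bigtable's API."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; missing cells take no space (sparse)
        self._cells = {}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell under an integer timestamp."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, at=None):
        """Return the newest version, or the newest one not later than `at`."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if at is None or t <= at]
        if not candidates:
            return None
        return versions[max(candidates)]

    def versions(self, row, column):
        """Return all (timestamp, value) pairs of a cell, newest first."""
        return sorted(self._cells.get((row, column), {}).items(), reverse=True)


# Hypothetical sample data: one row, one column, two versions.
table = VersionedTable()
table.put("com.example.www", "contents:", 1, b"<html>v1</html>")
table.put("com.example.www", "contents:", 3, b"<html>v3</html>")

print(table.get("com.example.www", "contents:"))
print(table.get("com.example.www", "contents:", at=2))
print(table.versions("com.example.www", "contents:"))
```

Keeping every version under its timestamp, instead of overwriting, is what makes the "objects with attributes with versions" reading of the model work: a read at an earlier time simply picks an older entry.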
From: Paul D. F. <pdf...@ku...> - 2009-02-14 14:08:17
One comment by someone posting to slashdot "photon317 (208409)":
http://developers.slashdot.org/comments.pl?sid=1127539&cid=26850223
"Yes, these newer simple key/value databases like BigTable and CouchDB are effectively a subset of RDBMS functionality, so of course the same thing can be implemented relationally by just not using features. The reason these projects have taken off is that the relational features being skipped comprise most of the complexity of an RDBMS. Without them, it's relatively trivial to write new database engines from scratch instead of re-using MySQL, PostgreSQL, and so-on. These new feature-poor rewrites can take on many challenges that are harder for the big relational guys, like stellar performance on huge datasets, and being truly distributed in nature."

Although I've never found that to be quite true; at least, years ago there were obvious limits you needed to set on the field lengths of strings if you wanted to index on them in the available relational databases.

Another comment by "thammoud (193905)":
http://developers.slashdot.org/comments.pl?sid=1127539&cid=26849587
"Leave us RDBMS dinosaurs alone. String Name/Value pairs, that is a great innovation. In other news, Sun will be dropping all types from the Java object system and rely on the VOID type. Idiots."

That last reason is why the current version of the Pointrel System has the equivalent of namespaces or types to specify how to interpret the string. Although, that could also be coded by convention into the first part of all strings.

Still, an RDBMS can do a good job of documenting intent in the table structure. So there are tradeoffs: flexibility risks "flabbiness", and sometimes you need to understand how the code that uses the data works. On the other hand, any complex database, like the one MediaWiki uses, has several tables and can only be understood in terms of how the tables relate to each other, and the use of a table can be non-obvious.

--Paul Fernhout

Paul D. Fernhout wrote:
> http://developers.slashdot.org/article.pl?sid=09/02/13/2026227
>
> "There's an article over on Read Write Web about what the future of
> relational databases looks like when faced with new challenges to its
> dominance from key/value stores, such as SimpleDB, CouchDB, Project
> Voldemort and BigTable. The conclusion suggests that relational databases
> and key value stores aren't really mutually exclusive and instead are
> different tools for different requirements."
>
> http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php
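
The idea mentioned above of coding the type by convention into the first part of every string could look something like the following Python sketch. The `type:payload` layout, the type names, and the functions `encode` and `decode` are all assumptions for illustration; this is not the format the Pointrel System actually uses.

```python
# Hypothetical convention: every stored string is "<type>:<payload>".
DECODERS = {
    "text": str,
    "int": int,
    "float": float,
}


def encode(type_name, value):
    """Prefix a value with its type name so a plain string store keeps the intent."""
    if type_name not in DECODERS:
        raise ValueError(f"unknown type: {type_name!r}")
    return f"{type_name}:{value}"


def decode(stored):
    """Split off the type prefix and interpret the payload accordingly."""
    type_name, separator, payload = stored.partition(":")
    if not separator or type_name not in DECODERS:
        raise ValueError(f"missing or unknown type prefix in {stored!r}")
    return DECODERS[type_name](payload)


print(decode(encode("int", 42)))
print(decode(encode("text", "blue")))
print(decode("float:1.5"))
```

Because only the first colon is treated as the separator, a payload may itself contain colons. The cost of the convention is the one raised in the thread: nothing but the reading code enforces it, whereas a relational schema states the types in the table structure.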