Thread: [Pointrel-discuss] Slashdot | Is the Relational Database Doomed?

Brought to you by: paulfernhout

[Pointrel-discuss] Slashdot | Is the Relational Database Doomed?

From: Paul D. F. <pdf...@ku...> - 2009-02-14 13:06:49

http://developers.slashdot.org/article.pl?sid=09/02/13/2026227

"There's an article over on Read Write Web about what the future of 
relational databases looks like when faced with new challenges to its 
dominance from key/value stores, such as SimpleDB, CouchDB, Project 
Voldemort and BigTable. The conclusion suggests that relational databases 
and key value stores aren't really mutually exclusive and instead are 
different tools for different requirements."

http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

Re: [Pointrel-discuss] Slashdot | Is the Relational Database Doomed?

From: Paul D. F. <pdf...@ku...> - 2009-02-14 13:16:17

Links and summaries:

Amazon SimpleDB:
  http://aws.amazon.com/simpledb/
"""
Amazon SimpleDB provides a simple web services interface to create and store 
multiple data sets, query your data easily, and return the results. You 
organize your structured data into domains and can run queries across all of 
the data stored in a particular domain. Domains are comprised of items, and 
items are described by attribute-value pairs. To understand these elements, 
consider the metaphor of data stored in a spreadsheet table. An Amazon 
SimpleDB domain is like a worksheet, items are like rows of data, attributes 
are like column headers, and values are the data entered in each of the 
cells. However unlike a spreadsheet, Amazon SimpleDB allows for multiple 
values to be associated with each “cell” (e.g., for item “123,” the 
attribute “color” can have both value “blue” and value “red”). Additionally, 
in Amazon SimpleDB, each item can have its own unique set of associated 
attributes (e.g., item “123” might have attributes “description” and “color” 
whereas item “789” has attributes “description,” “color” and “material”). 
Amazon SimpleDB automatically indexes your data, making it easy to quickly 
find the information that you need. There is no need to pre-define a schema 
or change a schema if new data is added later.
"""

Project Voldemort:
   http://project-voldemort.com/
"""
Voldemort is not a relational database, it does not attempt to satisfy 
arbitrary relations while satisfying ACID properties. Nor is it an object 
database that attempts to transparently map object reference graphs. Nor 
does it introduce a new abstraction such as document-orientation. It is 
basically just a big, distributed, persistent, fault-tolerant hash table. 
For applications that can use an O/R mapper like active-record or hibernate 
this will provide horizontal scalability and much higher availability but at 
great loss of convenience. For large applications under internet-type 
scalability pressure, a system may consist of a number of 
functionally partitioned services or APIs, which may manage storage 
resources across multiple data centers using storage systems which may 
themselves be horizontally partitioned. For applications in this space, 
arbitrary in-database joins are already impossible since all the data is not 
available in any single database. A typical pattern is to introduce a 
caching layer which will require hashtable semantics anyway. For these 
applications Voldemort offers a number of advantages:
     * Voldemort combines in memory caching with the storage system so that 
a separate caching tier is not required (instead the storage system itself 
is just fast).
     * Unlike MySQL replication, both reads and writes scale horizontally
     * Data partitioning is transparent, and allows for cluster expansion 
without rebalancing all data
     * Data replication and placement is decided by a simple API to be able 
to accommodate a wide range of application-specific strategies
     * The storage layer is completely mockable so development and unit 
testing can be done against a throw-away in-memory storage system without 
needing a real cluster (or even a real storage system) for simple testing
The source code is available under the Apache 2.0 license. We are actively 
looking for contributors so if you have ideas, code, bug reports, or fixes 
you would like to contribute please do so.
"""

Apache CouchDB:
   http://couchdb.apache.org/
"""
Apache CouchDB is a distributed, fault-tolerant and schema-free 
document-oriented database accessible via a RESTful HTTP/JSON API. Among 
other features, it provides robust, incremental replication with 
bi-directional conflict detection and resolution, and is queryable and 
indexable using a table-oriented view engine with JavaScript acting as the 
default view definition language. CouchDB is written in Erlang, but can be 
easily accessed from any environment that provides means to make HTTP 
requests. There are a multitude of third-party client libraries that make 
this even easier for a variety of programming languages and environments.
"""

Google Bigtable:
   http://labs.google.com/papers/bigtable.html
"Bigtable is a distributed storage system for managing structured data that 
is designed to scale to a very large size: petabytes of data across 
thousands of commodity servers. Many projects at Google store data in 
Bigtable, including web indexing, Google Earth, and Google Finance. These 
applications place very different demands on Bigtable, both in terms of data 
size (from URLs to web pages to satellite imagery) and latency requirements 
(from backend bulk processing to real-time data serving). Despite these 
varied demands, Bigtable has successfully provided a flexible, 
high-performance solution for all of these Google products. In this paper we 
describe the simple data model provided by Bigtable, which gives clients 
dynamic control over data layout and format, and we describe the design and 
implementation of Bigtable. "

And there is Soprano (used by Nepomuk) or other RDF triple engines:
   http://soprano.sourceforge.net/
"Soprano (formerly known as QRDF) is a library which provides a highly 
usable object-oriented C++/Qt4 framework for RDF data. It uses different RDF 
storage solutions as backends through a simple plugin system. Soprano is 
targeted at desktop applications that need an RDF data storage solution. It 
has been optimized for easy usage and simplicity."
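RDF data reduces to (subject, predicate, object) triples, and the basic query is a pattern with some positions left open. A minimal Python sketch, assuming plain strings in all three positions; it is not Soprano's C++/Qt API, and TripleStore and match are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> List[Triple]:
        # None acts as a wildcard for that position of the pattern.
        return sorted(
            t
            for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )
```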

I don't see an obvious mention in any of these of the latest Pointrel 
System feature of reifying transactions.
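One way to read "reifying transactions" is that each transaction becomes a node in the data, described with the same kind of triples as everything else, and every stored triple records the transaction that added it. The sketch below is a hypothetical illustration of that general idea only, not the Pointrel System's actual design or API; ReifiedStore, commit, describe and provenance are made-up names, and the "author" and "timestamp" predicates are assumptions.

```python
import itertools
from typing import Iterable, List, Optional, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, transaction)


class ReifiedStore:
    """Hypothetical store in which each transaction is itself described as data."""

    def __init__(self) -> None:
        self.quads: List[Quad] = []
        self._ids = itertools.count(1)

    def commit(self, statements: Iterable[Tuple[str, str, str]],
               author: str, timestamp: int) -> str:
        txn = f"txn:{next(self._ids)}"
        # The transaction is described with ordinary triples about its own id.
        meta = [(txn, "author", author), (txn, "timestamp", str(timestamp))]
        for s, p, o in list(statements) + meta:
            self.quads.append((s, p, o, txn))
        return txn

    def describe(self, txn: str) -> dict:
        # Everything recorded about a transaction, as predicate -> object.
        return {p: o for s, p, o, _ in self.quads if s == txn}

    def provenance(self, s: str, p: str, o: str) -> List[Tuple[str, Optional[str]]]:
        """Which transactions asserted (s, p, o), and who committed each one."""
        return [
            (txn, self.describe(txn).get("author"))
            for qs, qp, qo, txn in self.quads
            if (qs, qp, qo) == (s, p, o)
        ]
```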

--Paul Fernhout

Re: [Pointrel-discuss] Slashdot | Is the Relational Database Doomed?

From: Paul D. F. <pdf...@ku...> - 2009-02-14 13:22:28

From the Bigtable document there:
"""
A Bigtable is a sparse, distributed, persistent multi-dimensional sorted 
map. The map is indexed by a row key, column key, and a timestamp; each 
value in the map is an uninterpreted array of bytes. (row:string, 
column:string, time:int64) → string
... Each cell in a Bigtable can contain multiple versions of
the same data; these versions are indexed by timestamp.
Bigtable timestamps are 64-bit integers.
"""

So, basically objects (rows) with attributes (columns) with versions, 
similar in some ways to the earliest Pointrel versions from 25 years ago. Of 
course, they have a fancy distributed implementation. :-)

--Paul Fernhout

Re: [Pointrel-discuss] Slashdot | Is the Relational Database Doomed?

From: Paul D. F. <pdf...@ku...> - 2009-02-14 14:08:17

One comment by someone posting to Slashdot as "photon317 (208409)":
   http://developers.slashdot.org/comments.pl?sid=1127539&cid=26850223
"Yes, these newer simple key/value databases like BigTable and CouchDB are 
effectively a subset of RDBMS functionality, so of course the same thing can 
be implemented relationally by just not using features. The reason these 
projects have taken off is that the relational features being skipped 
comprise most of the complexity of an RDBMS. Without them, it's relatively 
trivial to write new database engines from scratch instead of re-using 
MySQL, PostgreSQL, and so-on. These new feature-poor rewrites can take on 
many challenges that are harder for the big relational guys, like stellar 
performance on huge datasets, and being truly distributed in nature."
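The commenter's claim that a key/value store is a subset of RDBMS functionality is easy to demonstrate: one table with a primary key and no joins. A minimal sketch using Python's built-in sqlite3 module; SqliteKV is a hypothetical name, and it provides none of the distribution or huge-dataset performance the comment says motivates the new engines. SQLite does not enforce declared string lengths, so engines that do may still hit the index length limits Paul mentions.

```python
import sqlite3
from typing import Optional


class SqliteKV:
    """Key/value store on a relational engine: one table, one key, no joins."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key: str, value: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str) -> Optional[str]:
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def delete(self, key: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
```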

Although I've never found that to be quite true: at least years ago, the 
relational databases available imposed obvious limits on string field 
lengths if you wanted to index on those fields.

Another comment by "thammoud (193905)":
   http://developers.slashdot.org/comments.pl?sid=1127539&cid=26849587
"Leave us RDBMS dinosaurs alone. String Name/Value pairs, that is a great 
innovation. In other news, Sun will be dropping all types from the Java 
object system and rely on the VOID type. Idiots."

That last point is why the current version of the Pointrel System has the 
equivalent of namespaces or types to specify how to interpret each string. 
That could, however, also be encoded by convention in a prefix of every 
string.
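Both options can be sketched briefly: an explicit type or namespace stored alongside each string, and the alternative convention of a type prefix inside the string itself. This is a hypothetical Python illustration, not the Pointrel System's representation; the Typed class, the decoder table and the "kind:text" prefix format are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Typed:
    """A string paired with an explicit type or namespace saying how to read it."""
    kind: str
    text: str


# Hypothetical table of how each kind of string is interpreted.
DECODERS: Dict[str, Callable[[str], Any]] = {
    "text": str,
    "int": int,
    "float": float,
    "bool": lambda s: s == "true",
}


def decode(value: Typed) -> Any:
    return DECODERS[value.kind](value.text)


def to_prefixed(value: Typed) -> str:
    # The alternative convention: fold the type into the string itself.
    return f"{value.kind}:{value.text}"


def from_prefixed(s: str) -> Typed:
    kind, sep, text = s.partition(":")  # splits on the first colon only
    if not sep:
        raise ValueError(f"no type prefix in {s!r}")
    return Typed(kind, text)
```

Because the prefix form splits on the first colon only, the text may contain colons but type names may not.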

Still, an RDBMS can do a good job of documenting intent in the table 
structure. So there are tradeoffs: flexibility risks "flabbiness", and 
sometimes you need to understand how the code that uses the data works 
before you can understand the data itself. On the other hand, any complex 
database, like the one MediaWiki uses, has several tables and can only be 
understood in terms of how the tables relate to each other, and the purpose 
of a given table can be non-obvious.

--Paul Fernhout
