[dotNetRDF-Develop] Update on progress with new SQL backend

Hi Koos and Rob

So I wanted to give you an update on the upcoming new SQL backend and get 
your input on a few points.

Design and Performance

The new SQL backend has a schema fairly similar to the existing backend, 
so read/write performance is also pretty similar.  The difference is that 
it has been designed to do all the IO through stored procedures.  This 
allows the schema to change in future (or alternative schemas to be used) 
provided that they support the necessary stored procedures.
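
To illustrate the idea of putting a fixed set of procedures in front of a 
changeable schema, here is a rough sketch in Python.  It is not the actual 
backend code: the procedure names (add_triple, get_triples) and both 
"schemas" are hypothetical and only use in-memory collections in place of 
database tables.

```python
from typing import Iterable, Protocol, Tuple

Triple = Tuple[str, str, str]


class StoreProcedures(Protocol):
    """Hypothetical procedure contract that every schema must support."""

    def add_triple(self, graph: str, triple: Triple) -> None: ...

    def get_triples(self, graph: str) -> Iterable[Triple]: ...


class SingleTableSchema:
    """Hypothetical layout 1: one flat collection of (graph, s, p, o) rows."""

    def __init__(self) -> None:
        self._rows = []

    def add_triple(self, graph, triple):
        self._rows.append((graph, *triple))

    def get_triples(self, graph):
        return [(s, p, o) for g, s, p, o in self._rows if g == graph]


class PerGraphSchema:
    """Hypothetical layout 2: a separate collection per graph."""

    def __init__(self) -> None:
        self._graphs = {}

    def add_triple(self, graph, triple):
        self._graphs.setdefault(graph, []).append(triple)

    def get_triples(self, graph):
        return list(self._graphs.get(graph, []))


def load_and_read(store: StoreProcedures):
    """Client code: only calls the procedures, never touches the layout."""
    store.add_triple("g1", ("s", "p", "o"))
    store.add_triple("g2", ("x", "y", "z"))
    return list(store.get_triples("g1"))
```

The client function gives the same answer for either layout, which is the 
property the stored procedure layer is meant to provide.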

The code has also been designed to take advantage of some new API features 
to provide reasonable performance for out-of-memory SPARQL, i.e. you can 
make queries and updates against the database without having to load all 
the data into memory.  This makes the system much more scalable in the long 
term since you don't need so much memory to make requests against the 
store.  There is a speed trade-off though: in-memory query is typically 
much faster but requires significantly more memory.

One of the main advantages of this new out-of-memory query mechanism is 
that the memory usage is much lower and there is no time overhead of 
waiting to load data into memory before answering queries.  So it will be 
much more usable for people deploying the backend in environments with 
limited resources, e.g. shared hosting.
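
As a rough illustration of the trade-off (not the real implementation), the 
sketch below contrasts the two approaches in Python.  It uses the standard 
sqlite3 module as a stand-in for SQL Server, and the triples table and both 
function names are made up for the example.

```python
import sqlite3

# Stand-in database with a hypothetical single-table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [(f"s{i}", "p", f"o{i}") for i in range(1000)],
)


def in_memory_count(conn, predicate):
    """Load every row first, then answer the query from the loaded data."""
    rows = conn.execute("SELECT s, p, o FROM triples").fetchall()
    return sum(1 for _, p, _ in rows if p == predicate)


def streamed_count(conn, predicate):
    """Let the database filter and iterate over the cursor row by row."""
    cursor = conn.execute("SELECT s FROM triples WHERE p = ?", (predicate,))
    return sum(1 for _ in cursor)
```

Both functions return the same count, but the second never holds the whole 
table in the client at once and has no up-front loading step.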

Security

The new backend also improves on security by providing a set of roles for 
users which determine what actions they can take on the store.  The point 
of this is that it makes it much easier to secure user interaction with the 
store.  Currently the roles I'm incorporating are as follows:

Admin - Full read/write privileges plus the ability to completely empty the 
store (though not destroy the database schema)
Read/Write - Full read/write privileges
Read/Insert - Full read privileges and can insert new data but not 
overwrite/delete existing data
Read Only - Full read privileges but no write privileges

If you guys can think of any other useful roles for the backend please let 
me know.

Note that someone with read/write privileges can effectively empty the 
store completely by deleting one graph at a time but only the admin can do 
it in a single command.  Additionally only the admin can actually empty the 
nodes table which has the effect of resetting the store.
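
To make the role descriptions above concrete, here is a small sketch of 
them as a permission table in Python.  The action names (READ, INSERT, 
DELETE, CLEAR_STORE) and the is_allowed helper are my own labels for the 
example; in the real backend the roles would be enforced in the database 
rather than in client code.

```python
from enum import Flag, auto


class Action(Flag):
    READ = auto()
    INSERT = auto()
    DELETE = auto()       # overwrite/delete existing data
    CLEAR_STORE = auto()  # empty the store in one command, reset nodes table


# Role names as listed above; the action sets are my reading of them.
ROLES = {
    "Admin": Action.READ | Action.INSERT | Action.DELETE | Action.CLEAR_STORE,
    "Read/Write": Action.READ | Action.INSERT | Action.DELETE,
    "Read/Insert": Action.READ | Action.INSERT,
    "Read Only": Action.READ,
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True if the given role may perform the given action."""
    return action in ROLES[role]
```

For example, is_allowed("Read/Insert", Action.DELETE) is False, while 
is_allowed("Read/Write", Action.DELETE) is True, and only "Admin" is 
allowed Action.CLEAR_STORE.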

Code Changes

In order to use the new backend you will have to make some minor 
code/configuration changes which I will provide documentation for nearer 
the time but hopefully this will not be too disruptive.

One thing to note is that, as part of a wider strategy of reducing 
dependencies in the core library, the new SQL backend will reside in a 
separate library, dotNetRDF.Data.Sql.dll.  This allows us to maintain the 
code separately and add support for additional database backends (e.g. 
MySQL, Oracle etc) in future without increasing dependencies in the core 
library.  In the initial release only Microsoft SQL Server will be 
supported.  Hopefully MySQL support will follow in a subsequent release, 
with Oracle coming eventually, since there are issues with getting Oracle 
licenses for the products required to actually develop and test against.

Upgrade Paths

The main thing I wanted to get your opinion on was how you would prefer to 
upgrade your existing databases?  The options I am considering are as 
follows:

1. Provide a standalone tool which can be run at the command line which 
   will migrate from the old format to the new one.  Migration could either 
   be in-place or to a new empty database.  For in-place migration the old 
   data could be maintained and made restorable for rollbacks if desired.

2. Make the upgrade silent and automatic within the code.  This has issues 
   with the fact that it puts the upgrade mechanism into the library and 
   creates a potentially very long delay when you first access your store 
   with the new code as the upgrade has to take place.

3. Do a partial silent upgrade.  As the database schemas are not too 
   dissimilar it may be possible to just do a partial upgrade by creating a 
   set of stored procedures within the existing database.  This would make 
   it act like a new backend but the performance would likely be worse than 
   the new backend.  The advantage of this option is that unlike option 2 
   the upgrade would be very quick and barely noticeable, and unlike either 
   of the other options it would not modify your existing data in any way, 
   so you could continue using old code to talk to it as well as the new 
   code.

My personal preference is Option 1 but as users of the existing SQL backend 
I'd appreciate your feedback on this.  Whatever is decided, Option 1 will 
probably be provided in addition to either of the other options.  Ideally 
I'd prefer to just have Option 1 and not bother with Options 2 and 3 but 
I'm happy to work with whatever you'd prefer.

Best Regards,

Rob Vesse