From: Rob V. <rv...@do...> - 2011-07-22 09:29:50
Hi Koos and Rob

I wanted to give you an update on the upcoming new SQL backend and get your input on a few points.

Design and Performance

The new SQL backend has a schema fairly similar to the existing backend (so read/write performance is pretty similar), but it has been designed to do all the IO using stored procedures. This allows the schema to change in future (or alternative schemas to be used) provided that they support the necessary stored procedures.

The code has also been designed to take advantage of some new API features to provide reasonable performance for out of memory SPARQL i.e. you can make queries and updates against the database without having to load all the data into memory. This makes the system much more scalable in the long term since you don't need so much memory to make requests against the store. There is a speed trade off though: in-memory query is typically much faster but requires significantly more memory.

One of the main advantages of this new out of memory query mechanism is that the memory usage is much lower and there is no time overhead of waiting to load data into memory before answering queries. So particularly for people deploying the backend in environments with limited resources e.g. shared hosting it will be much more usable.

Security

The new backend also improves on security by providing a set of roles for users which determine what actions they can take on the store. This makes it much easier to secure user interaction with the store. Currently the roles I'm incorporating are as follows:

- Admin - Full read/write privileges plus the ability to completely empty the store (though not destroy the database schema)
- Read/Write - Full read/write privileges
- Read/Insert - Full read privileges and can insert new data but not overwrite/delete existing data
- Read Only - Full read privileges but no write privileges

If you guys can think of any other useful roles for the backend please let me know.
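
To make the role list concrete, here is a rough sketch of how the four roles could map onto permitted actions. It is purely illustrative: the Action and Role names and the is_allowed helper are hypothetical and are not part of the dotNetRDF API, and the sketch says nothing about where the real backend enforces these checks.

```python
from enum import Enum, Flag, auto


class Action(Flag):
    """Hypothetical actions a user might attempt against the store."""
    READ = auto()         # query existing data
    INSERT = auto()       # add new data
    MODIFY = auto()       # overwrite or delete existing data
    CLEAR_STORE = auto()  # empty the whole store in a single command


class Role(Enum):
    """The four roles listed above (identifiers are assumptions)."""
    ADMIN = "Admin"
    READ_WRITE = "Read/Write"
    READ_INSERT = "Read/Insert"
    READ_ONLY = "Read Only"


# Each role is granted the combination of actions described in the list above.
PERMISSIONS = {
    Role.ADMIN: Action.READ | Action.INSERT | Action.MODIFY | Action.CLEAR_STORE,
    Role.READ_WRITE: Action.READ | Action.INSERT | Action.MODIFY,
    Role.READ_INSERT: Action.READ | Action.INSERT,
    Role.READ_ONLY: Action.READ,
}


def is_allowed(role: Role, action: Action) -> bool:
    """Return True only if every requested action is granted to the role."""
    return (PERMISSIONS[role] & action) == action
```

For example, is_allowed(Role.READ_INSERT, Action.MODIFY) is False, while is_allowed(Role.ADMIN, Action.CLEAR_STORE) is True.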

Note that someone with read/write privileges can effectively empty the store completely by deleting one graph at a time, but only the admin can do it in a single command. Additionally only the admin can actually empty the nodes table, which has the effect of resetting the store.

Code Changes

In order to use the new backend you will have to make some minor code/configuration changes, which I will provide documentation for nearer the time, but hopefully this will not be too disruptive.

One thing to note is that as part of a wider strategy regarding reduction of dependencies in the core library the new SQL backend will reside in a separate library, dotNetRDF.Data.Sql.dll. This allows us to maintain the code separately and add support for additional database back ends (e.g. MySQL, Oracle etc) in future without increasing dependencies in the core library. In the initial release only Microsoft SQL Server will be supported. Hopefully MySQL support will follow in a subsequent release, and Oracle eventually, since there are issues with getting Oracle licenses for the products required to actually develop and test against.

Upgrade Paths

The main thing I wanted to get your opinion on was how you would prefer to upgrade your existing databases. The options I am considering are as follows:

1. Provide a standalone tool which can be run at the command line which will migrate from the old format to the new one. Migration could either be in-place or to a new empty database. For in-place migration the old data could be maintained and made restorable for rollbacks if desired.
2. Make the upgrade silent and automatic within the code. This has issues with the fact that it puts the upgrade mechanism into the library and creates a potentially very long delay when you first access your store with the new code as the upgrade has to take place.
3. Do a partial silent upgrade. As the database schemas are not too dissimilar it may be possible to just do a partial upgrade by creating a set of stored procedures within the existing database. This would make it act like a new backend but the performance would likely be worse than the new backend. The advantage of this option is that unlike option 2 the upgrade would be very quick and barely noticeable, and unlike either of the other options it would not modify your existing data in any way, so you could continue using old code to talk to it as well as the new code.

My personal preference is Option 1 but as users of the existing SQL backend I'd appreciate your feedback on this. Whatever is decided, Option 1 will probably be provided in addition to either of the other options - ideally I'd prefer to just have option 1 and not bother with options 2 and 3 but I'm happy to work with whatever you'd prefer.

Best Regards,

Rob Vesse