CSQL is a suite of products that can be used in two ways:
- Standalone main memory database
- Client-side / middle-tier caching platform for any database system
It has all the features professional developers and architects need to build highly scalable, multi-tier, high-performance, and highly available systems.
The primary goal of DataCache is to improve the throughput of applications whose bottleneck is database access. This is achieved by loading data from the database system currently in use (from here on called the target database) into a main memory database, which is many times faster than typical disk-based database systems. This main memory database sits on the same host as the database client applications, which reduces the network overhead involved in database access.
Main memory databases are many times faster than disk-based database systems because all the data is available in physical memory, which avoids the buffer manager overhead found in disk-based systems. Moreover, data access algorithms can work more efficiently than traditional disk-based algorithms. This led us to design a cache for traditional database system tables using a main memory database system.
Architecture of CSQLCache
Data stored in the target database (any disk-based database system) is cached at the client side using a main memory database system, CSQL. This main memory database is connected to the target database through a component called the DB Adaptor. For most databases, a generic adaptor that accesses the target database through the ODBC interface is used. If the target database provides faster native APIs, we shall develop an adaptor for those APIs and connect CSQL to the target database through that adaptor.
The Gateway component manages the data cache. It determines which tables are present in the cache and which are not. If a table is not present in the cache, the Gateway sends the query to the target database for execution. No application changes are required other than a change to the JDBC or ODBC driver configuration; the Gateway takes care of locating the table and having the query executed by the appropriate database engine. A CLI (Command Line Interface) will be provided so that applications can load hot tables into the cache as their requirements dictate.
CSQL is a compact main memory SQL engine that supports a limited set of features and gives ultra-fast responses to database queries. It supports only the features used by most real-time applications: INSERT, UPDATE, and DELETE on a single table, and SELECT with local predicates on a single table. It provides multiple interfaces to clients:
- DB API - to access the relational storage engine
- SQL API - to access SQL Engine
- ODBC - standard interface for C/C++ applications
- JDBC - standard interface for Java applications
Functionality that CSQL does not support is handled by having the target database perform those operations. The gateway module decides which queries are handled by CSQL and which are handled by the target database. It acts as a router: simple SELECT queries that involve only a single table are routed to the CSQL module, while complex queries such as joins and subqueries are routed to the target database. All non-SELECT DML statements are executed at both CSQL and the target database so that the cache and the target database stay in sync.
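The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the gateway's actual implementation: the function name `route`, the engine labels, and the regex-based classification are all hypothetical, and a real gateway would use a proper SQL parser. The sketch also assumes that DML on a table that is not cached goes to the target database only.

```python
import re

CSQL = "csql"      # hypothetical label for the main memory engine
TARGET = "target"  # hypothetical label for the target database


def route(sql, cached_tables):
    """Return the list of engines that should execute `sql`.

    Naive heuristic classifier: it looks at the leading verb and the first
    table name, and treats any JOIN, comma-separated FROM list, or nested
    SELECT as a complex query.
    """
    cached = {t.lower() for t in cached_tables}
    lowered = sql.strip().rstrip(";").lower()
    words = lowered.split(None, 1)
    verb = words[0] if words else ""

    if verb == "select":
        m = re.search(r"\bfrom\s+(\w+)\s*(,|\bjoin\b)?", lowered)
        is_complex = (
            m is None
            or m.group(2) is not None                    # "from a, b" / "from a join b"
            or re.search(r"\bjoin\b", lowered) is not None
            or lowered.count("select") > 1               # subquery
        )
        if not is_complex and m.group(1) in cached:
            return [CSQL]
        return [TARGET]

    if verb in ("insert", "update", "delete"):
        pattern = {
            "insert": r"\binto\s+(\w+)",
            "update": r"^update\s+(\w+)",
            "delete": r"\bfrom\s+(\w+)",
        }[verb]
        m = re.search(pattern, lowered)
        if m and m.group(1) in cached:
            return [CSQL, TARGET]  # keep cache and target database in sync
        return [TARGET]            # assumption: uncached tables bypass CSQL

    return [TARGET]  # anything unrecognised is left to the target database
```

Erring on the side of the target database is deliberate in this sketch: a query misclassified as complex still returns correct results, only more slowly.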
This data synchronization can be configured in one of two modes:
- SYNC - Statements are executed synchronously at the target database as part of the corresponding operation.
- ASYNC - Statements are executed asynchronously by a separate server process called CachePropagator.
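The difference between the two modes can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class name `CachedTableWriter`, its methods, and the use of a background thread in place of the separate CachePropagator server process. The two `apply_*` callables stand in for whatever actually executes a statement on each engine.

```python
import queue
import threading


class CachedTableWriter:
    """Apply each DML statement to the cache and propagate it to the target."""

    def __init__(self, apply_to_cache, apply_to_target, mode="SYNC"):
        if mode not in ("SYNC", "ASYNC"):
            raise ValueError("mode must be 'SYNC' or 'ASYNC'")
        self._apply_to_cache = apply_to_cache
        self._apply_to_target = apply_to_target
        self._mode = mode
        self._pending = queue.Queue()
        if mode == "ASYNC":
            # A daemon thread stands in for the CachePropagator process.
            worker = threading.Thread(target=self._propagate, daemon=True)
            worker.start()

    def execute(self, sql):
        self._apply_to_cache(sql)
        if self._mode == "SYNC":
            # The caller waits for the target database before returning.
            self._apply_to_target(sql)
        else:
            # The caller returns immediately; propagation happens later.
            self._pending.put(sql)

    def _propagate(self):
        while True:
            sql = self._pending.get()
            try:
                self._apply_to_target(sql)
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued statement has reached the target."""
        self._pending.join()
```

In this sketch a single worker drains the queue in order, so statements reach the target database in the order they were executed against the cache.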
SYNC mode does not add much performance overhead, because non-SELECT DML operations in CSQL are expected to be more than 100 times faster than in the target database. A single-tuple insert of 1 KB takes around 3 milliseconds in traditional disk-based database systems with current 15K rpm hard disks. CSQL is expected to take less than 30 microseconds for the same operation.
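The arithmetic behind these figures can be worked through as follows. The two timings are the ones quoted above; the sketch assumes the cache write and the target write happen one after the other in SYNC mode, and it treats the 30 microsecond figure as an upper bound. The rotational latency calculation is general disk background, included only to show that the 3 millisecond figure is of the same order as the mechanical delay of a 15K rpm drive.

```python
# Figures quoted in the text, expressed in microseconds.
TARGET_INSERT_US = 3000  # about 3 ms per 1 KB insert on a disk-based system
CSQL_INSERT_US = 30      # upper bound expected for CSQL

# How many times faster the cache write is than the target write.
speedup = TARGET_INSERT_US / CSQL_INSERT_US

# Assumption: in SYNC mode the two writes run back to back.
sync_total_us = TARGET_INSERT_US + CSQL_INSERT_US
overhead_pct = 100 * CSQL_INSERT_US / TARGET_INSERT_US

# Background: one revolution of a 15K rpm disk, and the average wait
# (half a revolution) for a sector to pass under the head.
RPM = 15_000
revolution_us = 60_000_000 / RPM
avg_rotational_latency_us = revolution_us / 2

print(speedup, sync_total_us, overhead_pct, avg_rotational_latency_us)
```

Under these assumptions, writing to the cache as well as the target database adds about 1% to the cost of an insert.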
ASYNC mode is intended for applications that require high throughput for INSERT operations, such as stock market data feeds.
In traditional replication systems, tuple-level logs are propagated to other sites, which involves heavy network traffic for statements such as "delete * from table". In ASYNC mode, CSQL propagates updates at the SQL level, which reduces network traffic and improves the overall throughput of the system.
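A rough comparison of the two propagation styles is sketched below. The function names, the table name `orders`, the row count, and the per-record log size are all hypothetical values chosen for illustration; real log record sizes depend on the replication system.

```python
def tuple_level_bytes(rows_affected, bytes_per_log_record):
    """Traffic when one log record is shipped per affected tuple."""
    return rows_affected * bytes_per_log_record


def statement_level_bytes(sql):
    """Traffic when only the SQL text is shipped."""
    return len(sql.encode("utf-8"))


# Hypothetical example: a whole-table delete touching one million rows,
# with an assumed 16 bytes of log data per deleted tuple.
rows = 1_000_000
per_tuple = tuple_level_bytes(rows, bytes_per_log_record=16)
per_statement = statement_level_bytes("DELETE FROM orders")

print(per_tuple, per_statement)
```

The tuple-level cost grows with the number of rows affected, while the statement-level cost depends only on the length of the SQL text.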