rajiv reddy - 2014-09-11

I am working on a solution to cluster servers in two different data centers. if this post looks lengthy then jump directly to proposed solution and questions

Each datacenter has the following
1)4-5 tomcat servers (clustered using jgroups)
2)ORACLE RAC in one datacenter and Oracle databases in other datacenters
3)Latency between datacenters is around 50 ms

My requirement is to provide high availability solution which should be fault tolerant even during disaster. I am able to design solution for clustering tomcat servers (including distributed caching,search etc) but struck at designing database solution.

Oracle golden gate is the quick solution but this doesn't seem to be optimal solution for our application with 1000+ tables . I came up with HA-JDBC which seems to provide better solution for our use case.

HA-JDBC may require long planned outages for activating databases (our db has more than 1000+ tables and around 100GB of data) .All the existing synchronization strategies does full table scan of all the tables which may be time consuming for our usecase. I am thinking to plug the following logic.
Solution should leave database in inconsistent state

Proposed solution:
1) Deactivation
Create datasource wrapper and execute the DML/DDL statements on local regional database and perform asynchronous operation on all other distributed databases parallely under the scope of same transaction
a) On error in local regional db, The complete transaction should be rolled back ( i am assuming Ha-JDBC supports this)
b) On error in other regional db, log the entire statement into local regional database along with all the databases on which statement failed to execute ( This allows partial statement execution on this DB) . De-active all the crashed databases (This is maintained in new list in database cluster for logging into redo log until database is activated) .

Table structure
Ha-jdbc-redo-log(id, statement , delimited separated database id's)

Note: This doesn't work for new databases or databases whose id's are changed after crash ( This scenario is very rare and can be ignored for my use case)

2)activation
provision to execute the below logic by locking databases or with our locking databases (if there are no DML statement which impacts multiple records) Play the redo-log in the same sequence and remove the entry in database. This can handle split-brain scenario ( we don't use sequences ). Tables without primary keys may also work with this approach .

3)clustering
Use jgroups Relay 2 to cluster datacenters(locks ,state management)

4) Durability
No additional solution needed as 1 and 2 handles durability

5)Full synchronization should ignore redo-log table

Our SLA for database acivation is 6 hours so there won't be transactions would be comparatively less compared to full table scan and processing them through java code.

Questions:
1)Synchronization strategy can be customized through extension points. Does Ha-jdbc maintains a list of deactivated db's? if not , how can this be added into database cluster

2) Is there a extension point where i can hook in redo log code ? if not , What all places should be the redo log code hooked into

This mayn't work for all scenario's but may work for my use case.Please let me know your thoughts.

Thanks in advance

 

Last edit: rajiv reddy 2014-09-11