Luke Koops - 2013-03-08

I have a data warehouse with many homogeneous (for now) databases replicating into it. I am having trouble supporting a disaster recovery scenario. If I have to restore one of the source nodes from a backup (or snapshot), the SYM tables get restored along with the application data.

To get back in sync, I delete all of the data from the warehouse that is sourced from the restored server and then run an initial load (reversed reload).

When I run the reload, the data warehouse node ignores the incoming batches for a while. I presume this is because the outgoing batch sequence value at the source was restored from the backup and is now too low, so the batches arriving at the data warehouse node look like retries of batches it has already processed.
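To make the hypothesis concrete, here is a toy model in Python of what I think is happening. It is not SymmetricDS code: the class, the method names, and the node ID "store-1" are made up for illustration, and I am assuming the target remembers which batch IDs it has already loaded from each source node.

```python
class ToyTarget:
    """Toy receiver that skips batch IDs it has already loaded per source node."""

    def __init__(self):
        # source node id -> set of batch ids already loaded from that node
        self.loaded = {}

    def receive(self, node_id, batch_id):
        seen = self.loaded.setdefault(node_id, set())
        if batch_id in seen:
            return "skipped"  # looks like a retry of an old batch
        seen.add(batch_id)
        return "loaded"


def simulate():
    target = ToyTarget()
    # Before the restore, the source had already sent batches 1..10.
    for batch_id in range(1, 11):
        target.receive("store-1", batch_id)
    # The source is restored from a backup taken when its sequence was at 6,
    # so the reload batches are numbered from 7 upward.
    return [(b, target.receive("store-1", b)) for b in range(7, 14)]


if __name__ == "__main__":
    for batch_id, outcome in simulate():
        print(batch_id, outcome)
```

In this model the first reload batches are skipped until the restored sequence passes the highest batch ID the target has seen, which would match both the "ignored for a while" behaviour and the reload that only delivers a subset of the batches.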

Is this hypothesis correct? Is there any way to reset the batch sequence so that the reload will work? I would prefer to fix this at the data warehouse node, for example by setting the expected value for a batch to 0, or by setting a flag to start processing the next batch regardless of its batch number.

If there is no such feature, the next option is to fix this at the source node. Perhaps I could set the outgoing_batch sequence number to MAX, which may cause it to wrap around to 0 in coordination with the warehouse node.

What I'm doing now is repeating the wipe/reload cycle until it eventually starts working, and then doing one more wipe/reload, because the reload that finally worked only got a subset of the batches.