This may be an issue with my configuration (and understanding of SymmetricDS).
I am experiencing the following while performing a reload on a "somewhat big" database (~40GB in ~100 tables):
- the master node prepares a batch, moving from QY to SE and then LD;
- the slave node receives and starts loading the data;
- the master does "nothing";
- when the loading has finished, the master goes from LD to OK and moves to the next enqueued reload batch.
Since the tables are huge, I get poor throughput: the slave won't do anything until the master has completed the QY (slow) and SE (fast) steps. If the master at least moved the next batch into QY status while the slave is loading, preparing the file even without sending it, throughput would be much better.
Is this intentional in the default configuration (I use a simplified corp/store sql config w/ a single trigger for '*' tables)?
I think you are describing how batches load using serial processing, which becomes apparent with large batches like an initial load. The server must query and write the batch, send it to the client, and then wait while the client loads it. Then the process repeats for the next batch.
There is a job that can be enabled to extract the initial load in the background, which can at least prepare batches ahead of time. See here:
But it could still be sending the next batch to the client while one batch is loading. That issue is logged here:
We're also talking about how to implement a thread per channel to allow channels to load in parallel. We do really well with lots of clients sending to a single server, but lately everyone seems to be trying to load lots of data between a small number of servers, so we're working hard to improve that use case.
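To put rough numbers on the difference between the serial flow and extracting batches ahead of time, here is a minimal timing sketch. It is not SymmetricDS code: the function names and the per-batch timings (query, send, load) are made up for illustration, and it assumes the extraction of later batches can fully overlap with the send and load of the current one.

```python
def serial_total(batches):
    """Each batch is queried, sent and loaded before the next one starts."""
    return sum(query + send + load for query, send, load in batches)


def pipelined_total(batches):
    """Extraction runs ahead in the background; send and load stay serial."""
    extract_done = 0  # time at which the current batch has been extracted
    load_done = 0     # time at which the current batch has been loaded
    for query, send, load in batches:
        extract_done += query
        # A batch can be sent only once it is extracted and the
        # previous batch has finished loading on the client.
        load_done = max(extract_done, load_done) + send + load
    return load_done


# Hypothetical timings in minutes: slow query, fast send, slow load.
batches = [(10, 1, 10)] * 4

print(serial_total(batches))     # 84
print(pipelined_total(batches))  # 54
```

With these assumed timings, the client only waits for the first extraction; after that, each batch is already prepared by the time the previous load finishes.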
I initially understood that the initial.load.use.extract.job.enabled=true flag would just split the process into multiple batches, not parallelize the query/send/load flow. This is why I didn't enable it.
I will give it a try on the next database I sync.
IMHO a thread per channel wouldn't change anything in this scenario, since all the batches belong to the same channel ('reload'). Am I wrong?
Good point, the initial load is on the reload channel. But we'll try to include parallelization of initial load also. Maybe we can allow for multiple reload channels, similar to how there are multiple config channels now. I'm hoping that 3.7 brings some more performance for two-node setups.