Does anyone have experience scaling to thousands of engines?
For the sake of argument, let's say we have 1,000 computers in the field, each using bidirectional sync to its mirrored database in the cloud, all performing sync through one symmetric server (this server has 1,000 engines defined and running).
The problems we run into are:
1. The symmetric server instance is very heavyweight on RAM/CPU. If we double our number of engines (which we will as we move forward), we don't have an easy resize path in AWS to even support this load.
2. Since symmetric opens so many connections to our database servers, it really hurts on the database server side. Each connection consumes resources on the server. We are able to support about 500 databases on each server at the moment.
I don't think clustering symmetric helps. Won't each symmetric cluster node open N connections to the database? If so, that's crippling. Can I put pgbouncer or something similar in between symmetric and postgres, or will that cause issues?
I'm thinking I will need to set up a completely separate symmetric server and point new engines there.
Anyone tackle something similar? Any other thoughts on how I can scale this to 2,000 engines?
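To put rough numbers on the connection concern above, here is a back-of-the-envelope sketch. It assumes every engine keeps its own connection pool and that each cluster node would open its own pools, which is the worry raised in the post rather than confirmed SymmetricDS behavior. The pool ceiling of 10 is a made-up illustrative value, not a SymmetricDS default, and the `Deployment` class is purely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    engines: int                        # engines defined on the Symmetric server
    max_pool_per_engine: int            # ASSUMED per-engine pool ceiling (illustrative)
    cluster_nodes: int = 1              # Symmetric nodes running the same engines
    databases_per_db_server: int = 500  # figure quoted in the post

    def worst_case_connections(self) -> int:
        """Connections across all database servers if every pool fills up."""
        return self.engines * self.max_pool_per_engine * self.cluster_nodes

    def worst_case_per_db_server(self) -> int:
        """Connections landing on one database server hosting 500 databases."""
        return (self.databases_per_db_server
                * self.max_pool_per_engine
                * self.cluster_nodes)


single = Deployment(engines=1000, max_pool_per_engine=10)
clustered = Deployment(engines=1000, max_pool_per_engine=10, cluster_nodes=2)

print(single.worst_case_connections())       # 10000
print(clustered.worst_case_connections())    # 20000
print(clustered.worst_case_per_db_server())  # 10000
```

Under these assumptions the worst case grows linearly with both the engine count and the number of cluster nodes, which is why clustering alone would not relieve the database servers.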
Bob, Symmetric was actually created for this kind of use case, but the typical use case is for a small Symmetric engine to be run on each of the 1000 computers. We would then expect just one engine running on your server which would service all of your 1000 clients.
Right, but that's not how our system works. We are synchronizing a thousand+ distinct machines with their own inventory management software (each one is independent).
So given my constraint, how would one scale this?
I think my only option is to create multiple symmetric servers and route the engines to them using URL matching rules in my load balancer. Having 1,000+ nodes on one server is very heavyweight; it can take an hour to start up. I'd hate to put 2,000 on a single server, as the problems and slowness will only be amplified.
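The URL-matching idea above could look something like the following sketch. It assumes sync URLs of the form `/sync/<engine-name>` and engine names like `machine-0001`; the naming scheme, the backend host names, and the 1,000-engines-per-server split are all hypothetical and only illustrate the routing rule, not any particular load balancer's configuration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical Symmetric servers, each hosting a block of 1,000 engines.
BACKENDS = ["sym-1.example.internal", "sym-2.example.internal"]
ENGINES_PER_BACKEND = 1000

# ASSUMED engine naming scheme: "machine-" followed by a number.
_ENGINE_RE = re.compile(r"^machine-(\d+)$")


def engine_from_url(url: str) -> Optional[str]:
    """Extract the engine name from a /sync/<engine-name>/... URL."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "sync":
        return segments[1]
    return None


def backend_for(url: str) -> Optional[str]:
    """Pick the Symmetric server for a request, or None if no rule matches."""
    engine = engine_from_url(url)
    if engine is None:
        return None
    match = _ENGINE_RE.match(engine)
    if match is None:
        return None
    # Machines 1-1000 go to the first server, 1001-2000 to the second, etc.
    index = (int(match.group(1)) - 1) // ENGINES_PER_BACKEND
    if index < 0 or index >= len(BACKENDS):
        return None
    return BACKENDS[index]


print(backend_for("https://sync.example.com/sync/machine-0042"))
print(backend_for("https://sync.example.com/sync/machine-1001/push"))
```

A range-based rule like this keeps existing machines on their current server and sends only new machines to the new one, so nothing has to be migrated when a server is added.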
Hi Bob, maybe I'm not fully understanding your use case. Here's how Symmetric is typically used:
* There is one engine properties file per database.
* There is often also one SymmetricDS software install per database. Each SymmetricDS install should have a LAN connection to the local database.
* In your case, I would guess you would have 1000+ local Symmetric installs alongside your inventory management software installations, all registering to a central server.
Given that, can you describe your use case a little more and where it deviates? Are you saying you have 1000 clients and 1000 servers effectively?
Thanks,
Mark
Mark,
We make inventory management vending machines. Each machine has a local database that we synchronize to its own database in the cloud. Each machine runs a postgres database and an instance of symmetricds acting as a client.
We have one symmetric server running with 1000 nodes. All machine symmetric instances connect to this server.
Since we literally have 1000+ databases (one per vending machine), we have 1000 engine files. Symmetric is very heavyweight with 1000 engines running, connecting to 1000 databases.
I need to plan for adding another 1000 per year, but I can't think of any way to reasonably do this besides setting up a second symmetric server and pointing new machines to this new server.
Does this make sense? Also, I wish this platform would actually email me when there's a reply. I need to see why the emails are getting lost.
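The growth plan described above can be turned into a small projection. This sketch uses the figures from the thread (about 500 databases per database server, roughly 1,000 machines today, another 1,000 per year) and assumes, purely for illustration, that each Symmetric server is capped at 1,000 engines; the function names are hypothetical.

```python
def ceil_div(total: int, per_unit: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-total // per_unit)


def plan(current: int, added_per_year: int, years: int,
         dbs_per_db_server: int = 500,
         engines_per_sym_server: int = 1000):
    """Return (year, machines, db_servers, sym_servers) for each year."""
    rows = []
    for year in range(years + 1):
        machines = current + added_per_year * year
        rows.append((
            year,
            machines,
            ceil_div(machines, dbs_per_db_server),
            ceil_div(machines, engines_per_sym_server),
        ))
    return rows


for year, machines, db_servers, sym_servers in plan(1000, 1000, 3):
    print(f"year {year}: {machines} machines, "
          f"{db_servers} database servers, {sym_servers} Symmetric servers")
```

With these inputs, each year of growth adds two database servers and one Symmetric server, which matches the "second symmetric server for new machines" approach.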
I have a similar requirement. There are thousands of offices running our software with a MySQL database (each with its own data). I need to back up these databases to the cloud, in near real time if possible. I set up a master with a single engine and one config database to hold the sym_* data. Then, in the simulated office symmetricds setup, I set target_catalog_name in the router to office-1001. I have a database in the cloud with that name and structure to hold the office data. I have a couple of simulated office setups running. This works: when I update the database in office 1001, the change is correctly applied to the office-1001 database in the cloud. The same is true for office-1002.
Master :
1. registration server
2. 1 config database 'central-config'
3. created office 1001 database and office 1002 database.
office 1001/1002:
1. point to registration url of the master
2. updated target_catalog_name in sym_router to office-1001/1002.
Unknowns / problems
1. This is a simulated environment; I am not sure what will happen if I point hundreds of office clients at one master with one engine.
2. Because I point all the clients to one master/engine, the sym_* configurations of all the office nodes get synced to all the office nodes. I don't know how to stop this sync.
3. I am not sure this was the intended design, and it may cause additional problems.