From: Krzysztof B. <kb...@un...> - 2023-02-07 11:28:59
Hi Sander,

On 7.02.2023 at 09:26, Sander Apweiler wrote:
> Dear Krzysztof,
> we have problems with slow web UI and crashes of the endpoints. We got
> a lot of feedback from users that the web UI is quite slow. Especially
> if they want to invite multiple people or just accept an invitation
> (more than two minutes the spinning wheel).
> When we delete five registrations, the progress bar goes to ~95%,
> blinks and it takes two to three minutes to finish the deletion. If we
> want to delete ten registrations, the risk is high that the console
> endpoints crashes and we need to restart unity.
> Switching the conflict resolution of an attribute statement from skip
> to merge took two minutes this morning.
>
> I'm pretty sure that our large number of users (14k+) is one of the
> reasons for this. It seems that the server itself is not on load. It
> has 0,3 having 4 cores. Unity is allowed to use 8GB RAM but the whole
> server uses at the moment just 5,4GB.
>
> We increased already the number of workers to 32. Do you have some
> hints how we can get a better performance?

It is hard to say, and most likely profiling will be needed to identify the root cause. Before that, can you be more specific about what you mean by "endpoint crashes"? Are there any exceptions in the logs? That would be very helpful.

Generally, many aspects influence application performance, not only memory and CPU/threads. It may also be related to I/O (e.g. excessive logging at DEBUG/TRACE level, or RDBMS access, e.g. too few connections). There may be spikes in memory load which you won't observe with OS tools; you need an APM for that, which I'd suggest anyway for any bigger production instance. With it you will be able to check detailed memory usage statistics over time (if the JVM runs close to its memory limit, GC kicks in and the app becomes very slow) and thread utilization (there are a few thread pools).

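To make the point about JVM memory and GC a bit more concrete, here is a minimal Java sketch that reads heap usage, accumulated GC time and the live thread count from the standard java.lang.management MXBeans. The class name JvmSnapshot is hypothetical and not part of Unity. The code reports on the JVM it runs in, so for a running server the same MXBeans would have to be read over JMX or by an APM agent.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only (not part of Unity).
public class JvmSnapshot {

    /** Used heap as a fraction of the max heap, or -1 if the max is undefined. */
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    /** Accumulated collection time in milliseconds, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.printf("heap used: %.1f%%, total GC time: %d ms, live threads: %d%n",
                heapUsedFraction() * 100, totalGcMillis(), threads);
    }
}
```

Sampling these values periodically and comparing them over time is what shows whether the heap stays close to its limit while GC time keeps growing; a single reading says little.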
In general, my take on performance is that I first try to find a reproducible case that is slow, then run it in some isolation (simulated on a separate server, or even on prod in off-peak hours) with some extra logging turned on, find which operations are slow (a gap in the logs or a long reported operation), and proceed from that point.

HTH,
Krzysztof
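
The "gap in logs" step described above can be sketched as a small Java program. It is only an illustration under assumptions: the class name LogGaps is hypothetical, and the timestamp layout (yyyy-MM-dd'T'HH:mm:ss,SSS at the start of each line) is assumed and has to be adjusted to the pattern actually configured for the server log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Unity).
public class LogGaps {
    // ASSUMPTION: every log entry starts with a timestamp in this layout.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss,SSS");
    private static final int TS_LEN = 23; // length of e.g. 2023-02-07T09:26:00,000

    /** Reports each pair of consecutive timestamped lines at least 'threshold' apart. */
    public static List<String> findGaps(List<String> lines, Duration threshold) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime prev = null;
        String prevLine = null;
        for (String line : lines) {
            if (line.length() < TS_LEN) {
                continue; // too short to carry a timestamp
            }
            LocalDateTime ts;
            try {
                ts = LocalDateTime.parse(line.substring(0, TS_LEN), TS);
            } catch (DateTimeParseException e) {
                continue; // continuation line, e.g. part of a stack trace
            }
            if (prev != null) {
                Duration gap = Duration.between(prev, ts);
                if (gap.compareTo(threshold) >= 0) {
                    gaps.add(gap.toMillis() + " ms after: " + prevLine);
                }
            }
            prev = ts;
            prevLine = line;
        }
        return gaps;
    }

    // Usage: java LogGaps <log-file> <threshold-seconds>
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Duration threshold = Duration.ofSeconds(Long.parseLong(args[1]));
        findGaps(lines, threshold).forEach(System.out::println);
    }
}
```

Each reported line is the last entry logged before a silent period, which is usually a good place to start looking for the slow operation.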