what kind of throughput are you now getting? 87 requests/s (versus the 32-35 of the single-process setup), although that doesn't tell the whole story. See attached screenshots. Note that this is a case of extreme load (1,000 concurrent cache misses sustained over a protracted period), which we will probably see very rarely, if ever, in the real world, but the purpose was to test the system under duress. You can see that some requests still take a long time, and I'd like to figure out why, but our plan is to use an AWS...
OK. I think I finally sorted it out: I wasn't starting multiple iipsrv processes. I thought a single executable would be enough and would spawn its own subprocesses, but that is apparently not the case. After starting 6-8 separate master processes and load-balancing them with Nginx, resource utilization went up and, with it, so did the requests per second. Sorry for the long monologue, but this was an interesting learning experience.
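For reference, this is roughly the shape of the Nginx setup I mean. It's a minimal sketch, not my exact config: the ports, the location path, and the number of backends are illustrative, and each backend is an independently started iipsrv master:

```nginx
# One upstream entry per separately started iipsrv master process.
# Ports are illustrative.
upstream iipsrv_backends {
    server 127.0.0.1:9000;
    server 127.0.0.1:9001;
    server 127.0.0.1:9002;
}

server {
    listen 80;

    location /fcgi-bin/iipsrv.fcgi {
        fastcgi_pass iipsrv_backends;
        include fastcgi_params;
    }
}
```

Each master was started by hand on its own port (something along the lines of `iipsrv.fcgi --bind 127.0.0.1:9000 &`, repeated per port), and Nginx round-robins across them by default.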
I must contradict my previous statement about Docker. Docker is not involved. I ran the application again natively (both Nginx and iipsrv on the same machine) and I can still see the bottleneck. There are a few things that I noticed: no matter how many Nginx workers I configure (1, 8, or 16), I always see 8 iipsrv processes: 1 parent and 7 children. I assume this number is tied to my processor cores, of which I have 8, correct? The master process is always the busiest (more than the sum of the other 7 combined)....
(now I'm really being a spammer) To read the charts: each client (there are 1,000 concurrent ones) picks a random image from a bucket of images of either less than 10 Mpx, 10-75 Mpx, or over 75 Mpx, then requests 50 thumbnails (128px), 5 large derivatives (3000px) and 1 full-size image. The charts separate the requests by these parameters, shown in the first column.
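In case the mix is unclear, here is a small sketch of what one simulated client does. The bucket contents and the IIIF-style URL pattern are assumptions for illustration; the request counts (50 + 5 + 1) match the test described above:

```python
import random

# Illustrative image buckets; filenames are made up.
BUCKETS = {
    "small":  ["small_01.tif", "small_02.tif"],    # < 10 Mpx
    "medium": ["medium_01.tif", "medium_02.tif"],  # 10-75 Mpx
    "large":  ["large_01.tif", "large_02.tif"],    # > 75 Mpx
}

def client_requests(rng=random):
    """Return the bucket chosen and the 56 URLs one client would request."""
    bucket = rng.choice(list(BUCKETS))
    image = rng.choice(BUCKETS[bucket])
    reqs = []
    reqs += [f"/iiif/{image}/full/128,/0/default.jpg"] * 50   # thumbnails
    reqs += [f"/iiif/{image}/full/3000,/0/default.jpg"] * 5   # large derivatives
    reqs += [f"/iiif/{image}/full/full/0/default.jpg"]        # full-size
    return bucket, reqs
```

The test runs 1,000 of these clients concurrently, which is where the 1,000-connection bursts in the charts come from.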
Attaching second image. For some reason, attaching 2 images made me look like a spammer.
I made the changes in the Docker containers and the failures are gone. Setting worker_rlimit_nofile in nginx.conf removed the "too many open files" issue, and raising net.core.somaxconn got rid of the gateway timeouts. That's the good news. Thanks! The bad news is that, at 1,000 clients, the overwhelming majority of the response time is spent waiting on sockets, to the point that a request for a 128px thumbnail and one for a 5000px image both take 25 seconds. See attached images. At this point, the topic is...
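For anyone following along, the nginx.conf side of the change looks roughly like this (the numbers are illustrative, not a recommendation):

```nginx
# Main context of nginx.conf: raise the per-worker open-file limit
# so worker_connections can actually be used.
worker_rlimit_nofile 65536;

events {
    # Must stay comfortably below worker_rlimit_nofile, since each
    # proxied connection can consume more than one descriptor.
    worker_connections 8192;
}
```

The somaxconn change is a host-level sysctl (`net.core.somaxconn`), which caps the listen backlog available to both Nginx and the iipsrv FastCGI sockets; in Docker it has to be raised on the host or passed to the container via `--sysctl`.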
I need to confirm this, but it seems like the problem was with the open files limit. ulimit -n was not set properly for Nginx, which is in Docker too, but the Nginx log did not complain about this at all. I guess it's the way resources are shared in Docker. Once I installed Nginx and iipsrv on the AMI, Nginx started flooding the logs with "too many open files" messages. After adjusting the value, I have been able to sustain the full load for over 10 minutes without a single failure. I will restart the...
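For the Docker case, the limit can be set per container rather than relying on the host's ulimit. A sketch of what I mean (image name and values are illustrative):

```shell
# Raise the container's open-file limit at run time (soft:hard).
docker run -d --ulimit nofile=65536:65536 --name nginx-proxy nginx

# Verify from inside the container:
docker exec nginx-proxy sh -c 'ulimit -n'
```

This would explain the silent failure mode: the container inherits the Docker daemon's default nofile limit, not the host shell's, so `ulimit -n` on the host looks fine while Nginx inside the container runs out of descriptors.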
At this point I'm tempted to remove the Docker factor and compile iipsrv directly in the AMI.