Autoscaling Thread Pools
Most systems have fixed-size thread pools, apache httpd, php-fpm, gunicorn, gocraft, asynq, and machinery. And those systems work great and help quickly scale without solving the problem from the ground up. Still, with enough scenarios, you would start noticing minor issues like using only about 25% of the CPU, and still, your queue lag is increasing, or your httpd server is giving 503 even though it has enough resources. Or you tried to give 2x CPU and 2x memory expecting your containers will start processing 2x but were surprised to see that it is not.
The issue at the core is that these systems have a virtual limitation, i.e. thread pool size configured. This virtual limitation, if not configured right or not appropriately maintained frequently, it could become the main bottleneck for your processing rate, even if you have enough physical resources.
Monitoring & Alerting
One can workaround this problem by simply fine-tuning the thread pool size to match your CPU/Memory. But this needs to be done frequently. Monitoring & Alerting thread pool utilisation is another excellent way to reactively optimise this system, i.e. if the number of threads in your pool being in use is greater than 80%, start tweaking the pool size and allocate more threads.
But this still won’t help you in issues where one of your upstream service latency increased and changed your threadpool utilisation footprint. Circuit breakers could protect your service against such a scenario, but this is such a common requirement that fine-tuning each time is a hassle.
Very High Thread Pool Size
Why not set the thread pool size soo high that minor or decent size changes in the system don’t require you to fine-tune the thread pool size frequently?
This solution is primarily suitable for thread pools with low memory footprints and short-term latency spike scenarios. For systems like gunicorn, ruby on rails setting a very large thread pool size means allocating a lot of memory. It might be even riskier if your thread pool has a concept of min & max rather than a fixed size. You might think that setting a very high max solves the problem. But, you might end up in an incident where memory utilisation spiked and crashed the containers because the container is not tested against that high thread pool size.
Automated Scaling of Thread Pool
But wait, if we automatically scale the thread pool, won’t we face the same problem of setting the thread pool size to very high? i.e. crashes.
Yes, that would be the case if a proper max limit is not automatically determined. Returning to the first principles, the actual physical resources are “compute”, “memory”, and “network”. These resources dictate the maximum limits of a system beyond which they might break or reduce efficiency if nearing that limit.
Having that knowledge, we can determine a max limit for threadpool when we reach 70% or 80% of any physical resources, i.e. compute, memory & network.
Implementing Automated Thread/Goroutine Pool Scaling
One way of implementing this is by keeping an independent thread pool scale-up system and another system that sets the maximum limit. However, competing feedback loops are highly unpredictable if you don’t fine-tune the system properly.
Similar to target tracking, we need to to design a single metric and feedback loop that scales the system up or down. This single metric must be composed of thread utilisation, cpu utilisation, and memory utilisation (often the network is much higher and so not considered here, but for applications where this becomes critical, the metric should consider network utilisation as well).
The desired number of goroutines can be evaluated every second. With a moving sliding window average of 30 seconds, a cool-down period and ignoring minor changes in the system can further smooth out the scaling pattern.
It is often harder to get the CPU utilisation of just the worker thread pool, excluding all other components running in the system, and the same goes for memory utilisation. So for smaller units of goroutines, it is better to scale just alone on the goroutine utilisation itself until the CPU and memory grow enough that main worker goroutines or threads become a significant contributor to the CPU and Memory utilisation. This can easily be done based on an absolute decent unit of CPU & Memory, i.e. 1vcpu & 1GB etc.,…
Conclusion
While monitoring and alerting are a decent enough solution for figuring out the right thread pool size for your system, moving towards zero-config by autoscaling thread pools is a much better elastic system that automatically adapts to changes in the system.