Puma actually has two parameters: the number of threads and the number of workers. If we slightly change the default puma.rb, it will look like this:
# WORKERS_NUM is not a default env variable name (Rails' generated config uses WEB_CONCURRENCY)
workers Integer(ENV['WORKERS_NUM'] || 1)
max_threads_count = Integer(ENV['RAILS_MAX_THREADS'] || 1)
min_threads_count = max_threads_count
threads min_threads_count, max_threads_count
The number of workers is the number of separate processes that Puma spawns for you. Usually, it is a good idea to set it equal to the number of processor cores on your server. You could spawn more of them to allow more requests to be processed simultaneously, but workers create additional memory overhead: each worker spins up a full copy of your Rails app. So usually, you would use threads to achieve higher throughput.
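As a starting point, the worker count can be derived from the machine's core count; a minimal sketch using Ruby's standard Etc module (the WORKERS_NUM fallback mirrors the config above):

```ruby
require 'etc'

# Number of logical processors reported by the OS;
# a reasonable default for the Puma worker count.
core_count = Etc.nprocessors

# Respect an explicit override, otherwise fall back to the core count.
workers_num = Integer(ENV['WORKERS_NUM'] || core_count)

puts "Spawning #{workers_num} Puma workers"
```

Keep in mind that each of those workers loads a copy of the app, so memory, not CPU, is often the real ceiling.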
RAILS_MAX_THREADS sets the number of threads each of your workers will use under the hood. In the example above, min_threads_count is equal to max_threads_count, so the thread pool size is constant. If you set them to different values, Puma will scale the pool between the min and the max, but I haven't seen that used in the wild.
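For completeness, a variant where the pool scales would look like this in puma.rb (a config fragment, not standalone Ruby; RAILS_MIN_THREADS is the variable name Rails' generated config uses for this):

```ruby
# puma.rb – thread pool scales between min and max under load
max_threads_count = Integer(ENV['RAILS_MAX_THREADS'] || 5)
min_threads_count = Integer(ENV['RAILS_MIN_THREADS'] || 1)

# Puma starts min_threads_count threads per worker and grows
# the pool up to max_threads_count as requests queue up.
threads min_threads_count, max_threads_count
```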
There are several reasons to limit the number of threads, mostly related to your interpreter, platform limits, and response times:
- If you use MRI, your threads are limited by the GIL, so they don't run in parallel; MRI imitates parallel execution by context switching. A large number of threads will allow many more simultaneous connections, but the average response time will increase because of the GIL.
- Platform limits: e.g. Heroku limits the number of threads per dyno (https://devcenter.heroku.com/articles/dynos#process-thread-limits), while Linux limits only the number of processes (see "Maximum number of threads per process in Linux?").
- When the code isn't thread-safe, using more than one thread can result in unpredictable problems. That's actually my case, so I didn't experiment much with the number of threads.
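To illustrate the thread-safety concern, here is a minimal sketch (not Puma-specific) of shared mutable state guarded by a Mutex; without the lock, the read-modify-write below could interleave between threads and lose increments:

```ruby
# Shared counter mutated from several threads, guarded by a Mutex.
counter = 0
lock = Mutex.new

threads = 8.times.map do
  Thread.new do
    1_000.times do
      # synchronize makes the read-modify-write atomic;
      # without it the final count could come up short.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter # 8000
```

Global state, memoized class variables, and non-thread-safe gems are the usual suspects; if you can't audit all of them, staying at one thread per worker is the safe choice.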
There was also an argument that slow IO (e.g. calls to external services, or generating large files on the fly) blocks the Ruby process and doesn't allow context switching, but it turns out not to be true: http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/. Still, optimizing your architecture to do as much work in the background as possible is always a good idea.
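You can see MRI releasing the GIL during blocking waits with a small sketch: five threads each block for 0.2 seconds, yet the wall-clock time stays close to 0.2s rather than 1s (sleep stands in here for a blocking call to an external service):

```ruby
start = Time.now

# Each thread blocks in sleep, which (like blocking IO) releases
# the GIL, so the five waits overlap instead of running back to back.
threads = 5.times.map { Thread.new { sleep 0.2 } }
threads.each(&:join)

elapsed = Time.now - start
puts format('elapsed: %.2fs', elapsed) # close to 0.2s, not 1s
```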
This answer will help you find the perfect combination of the number of threads vs the number of workers given your hardware, and this shows how the benchmarking could be done to find the exact numbers.
To sum up: WORKERS_NUM multiplied by RAILS_MAX_THREADS gives you the maximum number of simultaneous connections that Puma can process. If the number is too low, your users will see timeouts during load spikes. To achieve the best performance on MRI, set WORKERS_NUM to the number of cores and find the optimal RAILS_MAX_THREADS based on average response time during performance tests.
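The capacity rule of thumb from the summary can be written out directly (the numbers below are illustrative, not recommendations):

```ruby
# Illustrative values: a 4-core box -> 4 workers, 5 threads each.
workers_num       = 4
rails_max_threads = 5

# Upper bound on requests Puma can serve at the same moment.
max_concurrent_requests = workers_num * rails_max_threads

puts max_concurrent_requests # 20
```

If your load tests show sustained traffic above that product, you either raise the thread count (watching response times) or add servers.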