First I want to say that I don't have much experience with PyTorch, ML, NLP, or other related topics, so I may confuse some concepts. Sorry.
I downloaded a few models from Hugging Face, organized them in one Python script, and ran a benchmark to get an overview of performance. While benchmarking I monitored CPU usage and saw that only 50% of the CPU was used. I have 8 vCPUs, but only 4 of them are loaded at 100% at any given time. The load jumps around: cores 1, 3, 5, 7 may be at 100%, then cores 2, 4, 6, 8. But total CPU load never rises above 50%, and it never drops below 50% either; the 50% load is constant.
After some quick googling I found the parallelism docs. I called `get_num_threads()` and `get_num_interop_threads()`, and the output was 4 for both calls — only 50% of the available CPU cores, which roughly explains why CPU load was at 50%. Then I called `set_num_threads(8)` and `set_num_interop_threads(8)` and ran the benchmark again. CPU usage was at a constant 100%. In general performance was a bit faster, but some models started running a bit slower than at 50% CPU.
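For reference, a minimal sketch of the calls above (assuming `torch` is installed; note that the inter-op pool size can only be set once, before any parallel work has started, so it belongs right after the import):

```python
import torch

# Inter-op pool size can only be set once, before any parallel work
# (e.g. before the first forward pass); otherwise it raises a RuntimeError.
torch.set_num_interop_threads(8)

# Intra-op threads (used inside ops like matmul) can be changed at any time.
torch.set_num_threads(8)

print(torch.get_num_threads())          # -> 8
print(torch.get_num_interop_threads())  # -> 8
```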
So I wonder: why does PyTorch by default use only half of the CPU? Is that the optimal and recommended setting? Should I manually call `set_num_threads()` and `set_num_interop_threads()` with all available CPU cores if I want to achieve the best performance?
Edit.
I ran some additional benchmarks:
- one PyTorch process with 50% of vCPUs is a bit faster than one PyTorch process with 100% of vCPUs. Earlier it was the other way around, so I think it depends on the models being used.
- two concurrent PyTorch processes with 50% of vCPUs each handle more inputs than one PyTorch process with 50% of vCPUs, but it is not a 2x increase, only ~1.2x. The processing time of a single input is much slower than with one PyTorch process.
- two concurrent PyTorch processes with 100% of vCPUs each can't complete even one input. I guess the CPU is constantly switching between these processes.
So thanks to Phoenix's answer, I think it is completely reasonable to use PyTorch's default settings, which set the number of threads according to the number of physical (not virtual) cores.
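As a side note, `os.cpu_count()` reports logical vCPUs, not physical cores. A rough sketch of estimating the physical core count, assuming 2-way SMT/hyperthreading (an assumption — on Linux, `lscpu` or `psutil.cpu_count(logical=False)` give the real answer):

```python
import os

logical_cores = os.cpu_count()  # logical vCPUs, e.g. 8 on my machine

# Assumption: 2-way SMT/hyperthreading, so physical = logical / 2.
# Verify with `lscpu` or psutil.cpu_count(logical=False) on a real box.
physical_cores = max(1, logical_cores // 2)

print(f"logical: {logical_cores}, estimated physical: {physical_cores}")
```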
Edit.
The PyTorch documentation about this: https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html