How to Detect the Number of Physical Processors / Cores on Windows, Mac and Linux
U

14

54

I have a multithreaded C++ application that runs on Windows, Mac and a few Linux flavors.

To make a long story short: for it to run at maximum efficiency, I have to be able to instantiate a single thread per physical processor/core. Creating more threads than there are physical processors/cores degrades the performance of my program considerably. I can already correctly detect the number of logical processors/cores on all three of these platforms. To detect the number of physical processors/cores correctly, I'll have to detect whether hyper-threading is supported AND active.

My question therefore is whether there is a way to detect if hyper-threading is supported and enabled. If so, how exactly?

Ultrasound answered 25/5, 2010 at 2:57 Comment(5)
Check this out - stackoverflow.com/questions/150355/…Vesting
Didn't you just ask the same question a couple days ago? stackoverflow.com/questions/2904283/…Marler
Have you abandoned this question?Jollification
Did you ever get this resolved successfully? Do you still need help with this?Mundell
Currently working on a Cross-Platform solution to access hardware/system information: github.com/lfreist/hwinfoDarned
J
31

EDIT: This is no longer 100% correct due to Intel's ongoing befuddlement.

The way I understand the question is that you are asking how to detect the number of CPU cores vs. CPU threads, which is different from detecting the number of logical and physical cores in a system. CPU cores are often not considered physical cores by the OS unless they have their own package or die. So an OS will report that a Core 2 Duo, for example, has 1 physical and 2 logical CPUs, and an Intel P4 with hyper-threads will be reported exactly the same way, even though 2 hyper-threads vs. 2 CPU cores is a very different thing performance-wise.

I struggled with this until I pieced together the solution below, which I believe works for both AMD and Intel processors. As far as I know, and I could be wrong, AMD does not yet have CPU threads but they have provided a way to detect them that I assume will work on future AMD processors which may have CPU threads.

In short here are the steps using the CPUID instruction:

  1. Detect CPU vendor using CPUID function 0
  2. Check for HTT bit 28 in CPU features EDX from CPUID function 1
  3. Get the logical core count from EBX[23:16] from CPUID function 1
  4. Get actual non-threaded CPU core count
    1. If vendor == 'GenuineIntel' this is 1 plus EAX[31:26] from CPUID function 4
    2. If vendor == 'AuthenticAMD' this is 1 plus ECX[7:0] from CPUID function 0x80000008

Sounds difficult, but here is a (hopefully) platform-independent C++ program that does the trick:

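#include <iostream>
#include <string>
#ifdef _WIN32
#include <intrin.h> // __cpuidex
#endif

using namespace std;


void cpuID(unsigned i, unsigned regs[4]) {
#ifdef _WIN32
  // __cpuidex lets us pass the sub-leaf explicitly, so ECX is zero for
  // CPUID function 4 here as well (plain __cpuid leaves it unspecified)
  __cpuidex((int *)regs, (int)i, 0);

#else
  asm volatile
    ("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
     : "a" (i), "c" (0));
  // ECX is set to zero for CPUID function 4
#endif
}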


int main(int argc, char *argv[]) {
  unsigned regs[4];

  // Get vendor
  char vendor[12];
  cpuID(0, regs);
  ((unsigned *)vendor)[0] = regs[1]; // EBX
  ((unsigned *)vendor)[1] = regs[3]; // EDX
  ((unsigned *)vendor)[2] = regs[2]; // ECX
  string cpuVendor = string(vendor, 12);

  // Get CPU features
  cpuID(1, regs);
  unsigned cpuFeatures = regs[3]; // EDX

  // Logical core count per CPU
  cpuID(1, regs);
  unsigned logical = (regs[1] >> 16) & 0xff; // EBX[23:16]
  cout << " logical cpus: " << logical << endl;
  unsigned cores = logical;

  if (cpuVendor == "GenuineIntel") {
    // Get DCP cache info
    cpuID(4, regs);
    cores = ((regs[0] >> 26) & 0x3f) + 1; // EAX[31:26] + 1

  } else if (cpuVendor == "AuthenticAMD") {
    // Get NC: Number of CPU cores - 1
    cpuID(0x80000008, regs);
    cores = ((unsigned)(regs[2] & 0xff)) + 1; // ECX[7:0] + 1
  }

  cout << "    cpu cores: " << cores << endl;

  // Detect hyper-threads  
  bool hyperThreads = cpuFeatures & (1 << 28) && cores < logical;

  cout << "hyper-threads: " << (hyperThreads ? "true" : "false") << endl;

  return 0;
}

I haven't actually tested this on Windows or OS X yet, but it should work as the CPUID instruction is valid on i686 machines. Obviously, this won't work for PowerPC, but then they don't have hyper-threads either.

Here is the output on a few different Intel machines:

Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz:

 logical cpus: 2
    cpu cores: 2
hyper-threads: false

Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz:

 logical cpus: 4
    cpu cores: 4
hyper-threads: false

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (w/ x2 physical CPU packages):

 logical cpus: 16
    cpu cores: 8
hyper-threads: true

Intel(R) Pentium(R) 4 CPU 3.00GHz:

 logical cpus: 2
    cpu cores: 1
hyper-threads: true
Jollification answered 21/6, 2010 at 6:36 Comment(15)
Would you care to elaborate?Jollification
Thanks for the code jcoffland - however, I don't think the code in your example main() will work with AMD processors as it stands and the __cpuid() has changed to __cpuidex() for Windows x64. The main code problem is that the test for vendor == AMD is within an if block testing for vendor == Intel. The AMD part will never get executed. It'd be great if you could fix the code :) Thanks and RegardsMachinegun
According to this article: msdn.microsoft.com/en-us/library/hskdteyh.aspx, __cpuid() is still valid on Windows x64.Jollification
Note that it's possible for the OS to disable some of the CPU's cores for various reasons (power savings, licensing issues...), so this isn't totally reliableEskilstuna
I agree with Eric, the code is not correct. The logical cpus or threads of the Xeon E5520 is 8 not 16 as your code detects. ark.intel.com/products/40200/…. I also tried running it on my own system, i7 2600K it also said 16 logical cpus and 8 cores. Though this system has 4cores and 8threads.Haggard
@Alex, The Xeon E5520 in the example above was from a machine with two processor packages so the numbers are correct. However, the i7 does cause problems with this code. There weren't many i7s when I posted that solution. It seemed to work at the time. With the introduction of the i7, Intel made their own CPUID instruction almost impossible to use for this purpose.Jollification
Well, if this is the case, why don't you write so in your post? That the Xeon machine has two CPU packages. Seems like a big factor when it comes to counting the number of logical and physical CPU cores!Haggard
FWIW, I have a X5680 CPU which has six cores (which is what is reported in device manager), but this code sees 16 cores and 32 logical CPUs.Nihi
The CPUID function will work on a single processor (whichever one it happens to run on). From what I've been reading, there's no standard way to detect the actual number of CPU sockets so that you can run the CPUID on a thread with specific processor affinity in order to get the information for each CPU and then sum them up.Veliz
Also, HyperThreading detection via the CPUID command is also a bit touchy. For example, my processor supports HyperThreading, however my BIOS does not so I am not able to utilize the HyperThreading capability even though the CPUID says that I should be able to. I'm actually writing some code to detect logical cores on Windows and *nix and I was amazed to find this out. So, the OPs question wasn't really answered. As far as HyperThreading, you'll have to determine if the number of logical cores is 2x the number of physical cores. This is the only way you could be sure that HT is enabled.Veliz
I agree with @Githlar's comment, as I see the code in this answer still gives "HT enabled" although I have DISABLED hyperthreading in the BIOS for some experiments.Satsuma
The code is still not working. It reported 16 logical cores on a i5-3317U cpu.Averell
@Achimnol, the code does not check if HT is enabled but whether it's a feature of the processor. Also, the OS and BIOS must support HT as well. Checking if it's enabled requires querying each processor individually.Jollification
@Zboson, Intel has made a horrible mess of the CPUID instruction. It's a constantly moving target.Jollification
The code did work at one point, but it's incorrect even for a 2010-made i3 5x0 processor because around that time Intel decided to add gaps in the APIC id space. See my answer in another thread for details.Absolute
O
26

Note that this does not give the number of physical cores as intended, but the number of logical cores.

If you can use C++11 (thanks to alfC's comment beneath):

#include <iostream>
#include <thread>

int main() {
    std::cout << std::thread::hardware_concurrency() << std::endl;
    return 0;
}

Otherwise, the Boost library may be an option for you. The code is the same as above, except that you include <boost/thread.hpp> instead of <thread> and call boost::thread::hardware_concurrency().
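One detail worth guarding against: the C++ standard allows hardware_concurrency() to return 0 when the value cannot be determined. A minimal sketch of a guarded wrapper follows; the name workerCount is hypothetical, and falling back to 1 is an assumption rather than anything the answer prescribes. The result is still a logical processor count.

```cpp
#include <thread>

// Returns the number of logical processors reported by the standard library.
// hardware_concurrency() may return 0 if the value is not computable, so we
// fall back to 1 (an arbitrary but safe choice for sizing a thread pool).
unsigned workerCount() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

Calling workerCount() in place of the raw call keeps a thread pool from being sized to zero on platforms where the query is not supported.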

Opsonin answered 27/5, 2010 at 13:44 Comment(3)
This is a very simple solution but it does not differentiate hardware threads, a.k.a. hyper-threads, from physical CPUs or cores which I think is the point of this question.Jollification
Yes you are right I missed this detail, so should I delete my post?Opsonin
Don't delete your post, this information is very helpful. Thank you it helped me!Vitkun
S
18

Windows-only solution described here:

GetLogicalProcessorInformation

For Linux, use the /proc/cpuinfo file. I am not running Linux now, so I can't give you more detail. You can count physical/logical processor instances. If the logical count is twice the physical count, then you have HT enabled (true only for x86).

Serpentiform answered 26/5, 2010 at 0:48 Comment(1)
I've upvoted this because it works. For the textual record (and in case the link goes dead at some point), the msdn link in this answer uses GetLogicalProcessorInformation, which works ok on most recent versions of Windows. (Source says: "Windows Server 2003, Windows XP Professional x64 Edition, and Windows XP with SP3: This example reports the number of physical processors rather than the number of active processor cores.") This msdn link should not be confused with the one for __cpuid, which unfortunately has a non-working example (on most post-2010 Intel CPUs).Absolute
A
15

The current highest voted answer using CPUID appears to be obsolete. It reports the wrong number of both logical and physical processors. This appears to be confirmed by this answer: cpuid-on-intel-i7-processors.

Specifically, using CPUID.1.EBX[23:16] to get the logical processors or CPUID.4.EAX[31:26]+1 to get the physical ones with Intel processors does not give the correct result on any Intel processor I have.

For Intel, CPUID.Bh should be used: Intel_thread/Fcore and cache topology. The solution does not appear to be trivial. For AMD a different solution is necessary.

Here is source code by Intel which reports the correct number of physical and logical cores as well as the correct number of sockets: https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/. I tested this on an 80 logical core, 40 physical core, 4 socket Intel system.

Here is source code for AMD: http://developer.amd.com/resources/documentation-articles/articles-whitepapers/processor-and-core-enumeration-using-cpuid/. It gave the correct result on my single socket Intel system but not on my four socket system. I don't have an AMD system to test.

I have not dissected the source code yet to find a simple answer (if one exists) with CPUID. It seems that if the solution can change (as it seems to have), the best solution is to use a library or OS call.

Edit:

Here is a solution for Intel processors with CPUID leaf 11 (Bh). The way to do this is to loop over the logical processors, get the x2APIC ID for each logical processor from CPUID, and count the number of x2APIC IDs where the least significant bit is zero. For systems without hyper-threading the x2APIC ID will always be even. For systems with hyper-threading each x2APIC ID will have an even and odd version.

// input:  eax = functionnumber, ecx = 0
// output: eax = output[0], ebx = output[1], ecx = output[2], edx = output[3]
//static inline void cpuid (int output[4], int functionnumber)  

int getNumCores(void) {
    //Assuming an Intel processor with CPUID leaf 11
    int cores = 0;
    #pragma omp parallel reduction(+:cores)
    {
        int regs[4];
        cpuid(regs,11);
        if(!(regs[3]&1)) cores++; 
    }
    return cores;
}

The threads must be bound for this to work. OpenMP by default does not bind threads. Setting export OMP_PROC_BIND=true will bind them or they can be bound in code as shown at thread-affinity-with-windows-msvc-and-openmp.

I tested this on my 4 core/8 HT system and it returned 4 both with hyper-threading enabled and with it disabled in the BIOS. I also tested it on a 4 socket system with each socket having 10 cores / 20 HT and it returned 40 cores.

AMD processors or older Intel processors without CPUID leaf 11 have to do something different.
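The counting step described in the edit above can be separated from the CPUID access, which also makes it easy to derive a yes/no answer for whether hyper-threading is active. This is a minimal sketch, assuming one x2APIC ID has already been collected per logical processor (for example from EDX of CPUID leaf 11 on a thread bound to each processor). The function names and the smtShift parameter are hypothetical; a shift of 1 corresponds to the "least significant bit" rule in this answer, whereas CPUID leaf 11 sub-leaf 0 reports the actual shift width in EAX[4:0].

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Counts distinct cores among the given x2APIC IDs by discarding the low
// smtShift bits, which distinguish hardware threads within one core.
unsigned countCoresFromApicIds(const std::vector<std::uint32_t>& ids,
                               unsigned smtShift = 1) {
    std::set<std::uint32_t> cores;
    for (std::uint32_t id : ids) cores.insert(id >> smtShift);
    return static_cast<unsigned>(cores.size());
}

// Returns true if at least two of the given logical processors share a core,
// i.e. hyper-threading is active. Assumes the IDs in the list are unique.
bool smtActive(const std::vector<std::uint32_t>& ids, unsigned smtShift = 1) {
    return countCoresFromApicIds(ids, smtShift) < ids.size();
}
```

With IDs 0 through 7 this yields 4 cores with SMT active; with IDs 0, 2, 4, 6 it yields 4 cores with SMT inactive, matching the even/odd observation above.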

Abuse answered 18/7, 2014 at 12:2 Comment(4)
Well, I upvoted you because this is better than the other answers, but you are not technically answering the question. He only wants to know if HT is enabled. Counting the active, non-HT cores doesn't do that. You need to change your code to return a yes/no answer, i.e. if(regs[3]&1) ht_cores++ and after the reduction ht_enabled = (ht_cores > 0).Absolute
Also, the AMD code doesn't work properly on the current Intel processors basically for the same reason that old Intel code enumeration doesn't work properly: it assumes no gaps in the APIC id space. But that assumption is incorrect for Intel processors starting in late 2009 or so.Absolute
Thanks for your answer, this was very useful. I have a related question: if you disable HT in the BIOS, what is the level type (e.g. bits 8:15 of the output value in ECX) reported by CPUID leaf 11 with ECX = 0? Is it "SMT" or "core"?Pisces
@user3588161, the OP asked to find the number of physical cores OR if hyper-threading was enabled. I answered the first part for Intel processors with leaf 11. I think that's what the OP was really after.Abuse
H
10

Due to the seemingly needless addition of even more threads per core, and these threads being unevenly distributed on a per-core basis, this is no longer valid.

Gathering ideas and concepts from some of the answers above, I have come up with this solution. Please critique.

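// EDIT: includes

#include <stdint.h>   // uint32_t
#include <stdio.h>    // fprintf
#ifdef _WIN32
    #include <windows.h>
#elif defined(__APPLE__)
    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <unistd.h>   // sysconf
#else
    #include <unistd.h>   // sysconf
#endif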

For almost every OS, the standard "Get core count" feature returns the logical core count. But in order to get the physical core count, we must first detect if the CPU has hyper threading or not.

uint32_t registers[4];
unsigned logicalcpucount;
unsigned physicalcpucount;
#ifdef _WIN32
SYSTEM_INFO systeminfo;
GetSystemInfo( &systeminfo );

logicalcpucount = systeminfo.dwNumberOfProcessors;

#else
logicalcpucount = sysconf( _SC_NPROCESSORS_ONLN );
#endif

We now have the logical core count. To get the intended result, we must first check whether hyper-threading is in use, or whether it is even available.

__asm__ __volatile__ ("cpuid " :
                      "=a" (registers[0]),
                      "=b" (registers[1]),
                      "=c" (registers[2]),
                      "=d" (registers[3])
                      : "a" (1), "c" (0));

unsigned CPUFeatureSet = registers[3];
bool hyperthreading = CPUFeatureSet & (1 << 28);

There is no Intel CPU with hyper-threading that hyper-threads only one core (at least not from what I have read), which lets us find the answer in a really painless way. If hyper-threading is available, the logical processor count will be exactly double the physical processor count. Otherwise, the operating system will detect one logical processor for every core, meaning the logical and physical core counts will be identical.

if (hyperthreading){
    physicalcpucount = logicalcpucount / 2;
} else {
    physicalcpucount = logicalcpucount;
}

fprintf (stdout, "LOGICAL: %u\n", logicalcpucount);
fprintf (stdout, "PHYSICAL: %u\n", physicalcpucount);
Horse answered 21/4, 2015 at 1:22 Comment(5)
I can do that, thanks for the tip, just some comments or a little more?Horse
I made some changes, is the explanation any better?Horse
Yeah, that's what I was thinking of.Howitzer
Now with intel's performance cores and hyperthreading cores we can no longer divide the total by / 2?Sugarplum
@Sugarplum tragically, you are correct. Also AMD seems to be doing the same thing now, so its anyone's guess.Horse
L
8

To follow on from math's answer, as of Boost 1.56 there exists the physical_concurrency() function, which does exactly what you want.

From the documentation - http://www.boost.org/doc/libs/1_56_0/doc/html/thread/thread_management.html#thread.thread_management.thread.physical_concurrency

The number of physical cores available on the current system. In contrast to hardware_concurrency() it does not return the number of virtual cores, but it counts only physical cores.

So an example would be

    #include <iostream>
    #include <boost/thread.hpp>

    int main()
    {
        std::cout << boost::thread::physical_concurrency();
        return 0;
    }
Langill answered 25/6, 2015 at 17:43 Comment(0)
T
6

I know this is an old thread, but no one mentioned hwloc. The hwloc library is available on most Linux distributions and can also be compiled on Windows. The following code will return the number of physical processors: 4 in the case of an i7 CPU.

#include <hwloc.h>
#ifdef _OPENMP
#include <omp.h> // omp_get_num_procs(), used by the fallback below
#endif

int nPhysicalProcessorCount = 0;

hwloc_topology_t sTopology;

if (hwloc_topology_init(&sTopology) == 0 &&
    hwloc_topology_load(sTopology) == 0)
{
    nPhysicalProcessorCount =
        hwloc_get_nbobjs_by_type(sTopology, HWLOC_OBJ_CORE);

    hwloc_topology_destroy(sTopology);
}

if (nPhysicalProcessorCount < 1)
{
#ifdef _OPENMP
    nPhysicalProcessorCount = omp_get_num_procs();
#else
    nPhysicalProcessorCount = 1;
#endif
}
Thierry answered 2/4, 2015 at 14:6 Comment(2)
Please, describe why your code will solve OP's question.Brusa
I added more information. This is exactly what the OP was looking for. All other suggestions aren't multi-platform or are only working with some specific hardware.Thierry
N
5

It is not sufficient to test if an Intel CPU has hyperthreading, you also need to test if hyperthreading is enabled or disabled. There is no documented way to check this. An Intel guy came up with this trick to check if hyperthreading is enabled: Check the number of programmable performance counters using CPUID[0xa].eax[15:8] and assume that if the value is 8, HT is disabled, and if the value is 4, HT is enabled (https://software.intel.com/en-us/forums/intel-isa-extensions/topic/831551).

There is no problem on AMD chips: The CPUID reports 1 or 2 threads per core depending on whether simultaneous multithreading is disabled or enabled.

You also have to compare the thread count from the CPUID with the thread count reported by the operating system to see if there are multiple CPU chips.

I have made a function that implements all of this. It reports both the number of physical processors and the number of logical processors. I have tested it on Intel and AMD processors in Windows and Linux. It should work on Mac as well. I have published this code at https://github.com/vectorclass/add-on/tree/master/physical_processors
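The performance-counter heuristic described above can be sketched as a small decoding step. This is a minimal sketch, assuming the caller has already read EAX from CPUID leaf 0xA; the names pmcCount, HtState and guessHtState are hypothetical and are not taken from the linked repository. As the answer notes, this is an undocumented trick, so any counter value other than 4 or 8 is reported as unknown.

```cpp
#include <cstdint>

// Extracts CPUID[0xA].EAX[15:8], the number of general-purpose performance
// counters available per logical processor.
unsigned pmcCount(std::uint32_t leafAEax) {
    return (leafAEax >> 8) & 0xffu;
}

enum class HtState { Enabled, Disabled, Unknown };

// Heuristic from the answer above: 4 counters suggests hyper-threading is
// enabled (the counters are split between two threads), 8 suggests it is
// disabled. Anything else is treated as inconclusive.
HtState guessHtState(std::uint32_t leafAEax) {
    switch (pmcCount(leafAEax)) {
        case 4:  return HtState::Enabled;
        case 8:  return HtState::Disabled;
        default: return HtState::Unknown;
    }
}
```

Returning an explicit Unknown state lets the caller fall back to an OS query instead of trusting a guess on processors where the counter layout differs.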

Nylon answered 1/11, 2019 at 6:40 Comment(0)
A
2

On OS X, you can read these values from sysctl(3) (the C API, or the command line utility of the same name). The man page should give you usage information. The following keys may be of interest:

$ sysctl hw
hw.ncpu: 24
hw.activecpu: 24
hw.physicalcpu: 12  <-- number of cores
hw.physicalcpu_max: 12
hw.logicalcpu: 24   <-- number of cores including hyper-threaded cores
hw.logicalcpu_max: 24
hw.packages: 2      <-- number of CPU packages
hw.ncpu = 24
hw.availcpu = 24
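From C++, the same keys can be read through the sysctlbyname C API. This is a minimal sketch, assuming the key holds a plain int (as hw.physicalcpu and hw.logicalcpu do); the helper name sysctlInt is hypothetical, and on platforms other than macOS it simply returns -1.

```cpp
#include <cstddef>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Returns the integer value of a sysctl key such as "hw.physicalcpu",
// or -1 if the query fails or the platform is not macOS.
int sysctlInt(const char* key) {
#ifdef __APPLE__
    int value = 0;
    std::size_t size = sizeof(value);
    if (sysctlbyname(key, &value, &size, nullptr, 0) != 0) return -1;
    return value;
#else
    (void)key;  // unused on other platforms
    return -1;
#endif
}
```

For example, sysctlInt("hw.physicalcpu") and sysctlInt("hw.logicalcpu") would give the two counts shown in the listing above.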
Alika answered 13/8, 2014 at 20:32 Comment(0)
C
1

On Windows, there are GetLogicalProcessorInformation and GetLogicalProcessorInformationEx, available for Windows XP SP3 or newer and Windows 7+ respectively. The difference is that GetLogicalProcessorInformation doesn't support setups with more than 64 logical cores, which might be important for server setups, but you can always fall back to GetLogicalProcessorInformation if you're on XP. Example usage for GetLogicalProcessorInformationEx (source):

PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX buffer = NULL;
PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX ptr = NULL;
BOOL rc;
DWORD length = 0;
DWORD offset = 0;
DWORD ncpus = 0;
DWORD prev_processor_info_size = 0;
for (;;) {
    rc = psutil_GetLogicalProcessorInformationEx(
            RelationAll, buffer, &length);
    if (rc == FALSE) {
        if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
            if (buffer) {
                free(buffer);
            }
            buffer = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)malloc(length);
            if (NULL == buffer) {
                return NULL;
            }
        }
        else {
            goto return_none;
        }
    }
    else {
        break;
    }
}
ptr = buffer;
while (offset < length) {
    // Advance ptr by the size of the previous
    // SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX struct.
    ptr = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*)\
        (((char*)ptr) + prev_processor_info_size);

    if (ptr->Relationship == RelationProcessorCore) {
        ncpus += 1;
    }

    // When offset == length, we've reached the last processor
    // info struct in the buffer.
    offset += ptr->Size;
    prev_processor_info_size = ptr->Size;
}

free(buffer);
if (ncpus != 0) {
    return ncpus;
}
else {
    return NULL;
}

return_none:
if (buffer != NULL)
    free(buffer);
return NULL;

On Linux, parsing /proc/cpuinfo might help.

Christianly answered 25/7, 2019 at 11:15 Comment(0)
M
0

I don't know that all three expose the information in the same way, but if you can safely assume that the NT kernel will report device information according to the POSIX standard (which NT supposedly has support for), then you could work off that standard.

However, differences in device management are often cited as one of the stumbling blocks to cross-platform development. I would at best implement this as three strands of logic; I wouldn't try to write one piece of code to handle all platforms evenly.

Ok, all that's assuming C++. For ASM, I presume you'll only be running on x86 or amd64 CPUs? You'll still need two branch paths, one for each architecture, and you'll need to test Intel separate from AMD (IIRC) but by and large you just check for the CPUID. Is that what you're trying to find? The CPUID from ASM on Intel/AMD family CPUs?

Mundell answered 25/5, 2010 at 3:7 Comment(0)
R
0

OpenMP should do the trick:

// test.cpp
#include <omp.h>
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
  int nThreads = omp_get_max_threads();
  cout << "Can run as many as: " << nThreads << " threads." << endl;
}

Most compilers support OpenMP. If you are using a gcc-based compiler (*nix, MacOS), you need to compile using:

$ g++ -fopenmp -o test.o test.cpp

(you might also need to tell your compiler to use the stdc++ library):

$ g++ -fopenmp -o test.o -lstdc++ test.cpp

As far as I know, OpenMP was designed to solve this kind of problem.

Roller answered 27/9, 2012 at 22:5 Comment(2)
It only gives the number of logical cores.Scrutinize
This answer is not worse than the boost example.Abuse
D
0

This is very easy to do in Python:

$ python -c "import psutil; print(psutil.cpu_count(logical=False))"
4

Maybe you could look at the psutil source code to see what is going on?

Desertion answered 3/6, 2016 at 16:26 Comment(3)
he is asking c++.Exterminate
@Exterminate "Maybe you could look at the psutil source code to see what's going on?"Christianly
@Exterminate It seems that it uses GetLogicalProcessorInformationEx on Windows and sysconf("SC_NPROCESSORS_ONLN") on Linux and if it fails, it parses /proc/cpuinfo and finds all lines that start with "processor".Christianly
A
0

You may use the library libcpuid (Also on GitHub - libcpuid).

As can be seen in its documentation page:

#include <stdio.h>
#include <libcpuid.h>

int main(void)
{
    if (!cpuid_present()) {                                                // check for CPUID presence
        printf("Sorry, your CPU doesn't support CPUID!\n");
        return -1;
    }

    struct cpu_raw_data_t raw;                                             // contains only raw data
    struct cpu_id_t data;                                                  // contains recognized CPU features data

    if (cpuid_get_raw_data(&raw) < 0) {                                    // obtain the raw CPUID data
        printf("Sorry, cannot get the CPUID raw data.\n");
        printf("Error: %s\n", cpuid_error());                              // cpuid_error() gives the last error description
        return -2;
    }

    if (cpu_identify(&raw, &data) < 0) {                                   // identify the CPU, using the given raw data.
        printf("Sorry, CPU identification failed.\n");
        printf("Error: %s\n", cpuid_error());
        return -3;
    }

    printf("Found: %s CPU\n", data.vendor_str);                            // print out the vendor string (e.g. `GenuineIntel')
    printf("Processor model is `%s'\n", data.cpu_codename);                // print out the CPU code name (e.g. `Pentium 4 (Northwood)')
    printf("The full brand string is `%s'\n", data.brand_str);             // print out the CPU brand string
    printf("The processor has %dK L1 cache and %dK L2 cache\n",
        data.l1_data_cache, data.l2_cache);                                // print out cache size information
    printf("The processor has %d cores and %d logical processors\n",
        data.num_cores, data.num_logical_cpus);                            // print out CPU cores information

    return 0;
}

As can be seen, data.num_cores holds the number of physical cores of the CPU.

Anapest answered 7/10, 2019 at 8:6 Comment(0)
