How to programmatically get the CPU cache line size in C++?

I'd like my program to read the cache line size of the CPU it's running on in C++.

I know that this can't be done portably, so I will need a solution for Linux and another for Windows (Solutions for other systems could be useful to others, so post them if you know them).

For Linux I could read the content of /proc/cpuinfo and parse the line beginning with cache_alignment. Maybe there is a better way involving a call to an API.

For Windows I simply have no idea.

Polak answered 29/9, 2008 at 19:35

On Win32, GetLogicalProcessorInformation will give you back a SYSTEM_LOGICAL_PROCESSOR_INFORMATION which contains a CACHE_DESCRIPTOR, which has the information you need.

Seven answered 29/9, 2008 at 19:38

Yikes - decoding the array of SYSTEM_LOGICAL_PROCESSOR_INFORMATION structures looks like it would be a pain. – Cooperative 29/9, 2008 at 19:48

Welcome to the world of systems programming. ;) – Lyingin 16/5, 2009 at 16:42

It's not too bad, Michael. Anyway, getting to grips with it forces you to learn how the CPU topology is arranged, and you may well need to know that. – Autotrophic 20/3, 2011 at 23:9

Woot? No code snippet I can simply copy and paste?!! cries – Demmer 19/2, 2015 at 0:8

On Linux, try the proccpuinfo library, an architecture-independent C API for reading /proc/cpuinfo.

Cloutman answered 29/9, 2008 at 19:49

Looks like at least SCO unix (http://uw714doc.sco.com/en/man/html.3C/sysconf.3C.html) has _SC_CACHE_LINE for sysconf. Perhaps other platforms have something similar?

Bastardy answered 29/9, 2008 at 19:38

For x86, use the CPUID instruction. A quick Google search turns up some libraries for Win32 and C++. I have also used CPUID via inline assembler.

Some more info:

Padegs answered 29/9, 2008 at 19:46

could you comment on how you'd use CPUID to get this? – Feu 16/5, 2009 at 17:10

On Windows

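#include <Windows.h>
#include <cstdio>
#include <iostream>

int main()
{
    // Note: this reports the memory page size, not the cache line size
    SYSTEM_INFO systemInfo;
    GetSystemInfo(&systemInfo);
    std::cout << "Page Size Is: " << systemInfo.dwPageSize << std::endl;
    std::getchar();  // keep the console window open
}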

On Linux

http://linux.die.net/man/2/getpagesize

Mien answered 13/9, 2016 at 9:25

After coming back to this, I don't believe I answered your question, which was about the cache line size rather than the memory page size, correct? en.wikipedia.org/wiki/Page_(computer_memory) I was googling for a page size snippet (for a project involving memory access), ended up here, and fell victim to the dangers of skimming. Please untick my answer, but it's probably worth leaving here for future reference. – Mien 15/9, 2016 at 17:33

Indeed, the question was mistitled with "cache page size". I fixed it. – Dwell 8/2, 2023 at 19:11

Here is sample code for those who wonder how to use the function from the accepted answer:

#include <new>
#include <iostream>
#include <Windows.h>


void ShowCacheSize()
{
    using CPUInfo = SYSTEM_LOGICAL_PROCESSOR_INFORMATION;
    DWORD len = 0;              // required buffer size, in bytes
    CPUInfo* buffer = nullptr;

    // A first call with a null buffer fails with ERROR_INSUFFICIENT_BUFFER and reports the required length
    if ((GetLogicalProcessorInformation(buffer, &len) == FALSE) && (GetLastError() == ERROR_INSUFFICIENT_BUFFER))
    {
        // len is in bytes, so allocate len / sizeof(CPUInfo) elements
        buffer = new (std::nothrow) CPUInfo[len / sizeof(CPUInfo)]{ };

        if (buffer == nullptr)
        {
            std::cout << "Buffer allocation of " << len << " bytes failed" << std::endl;
        }
        else if (GetLogicalProcessorInformation(buffer, &len) != FALSE)
        {
            const DWORD count = len / sizeof(CPUInfo);
            for (DWORD i = 0; i < count; ++i)
            {
                // Several caches are reported (L1 data, L1 instruction, L2, ...); take the L1 data cache
                if (buffer[i].Relationship == RelationCache &&
                    buffer[i].Cache.Level == 1 && buffer[i].Cache.Type == CacheData)
                {
                    std::cout << "Cache line size is: " << buffer[i].Cache.LineSize << " bytes" << std::endl;
                    break;
                }
            }
        }
        else
        {
            std::cout << "ERROR: " << GetLastError() << std::endl;
        }

        delete[] buffer;
    }
}

int main()
{
    ShowCacheSize();
}

Lacteous answered 27/5, 2020 at 13:56

If len is in bytes, shouldn't it be divided by sizeof(CPUInfo) before running through the buffer entries? – Glennaglennie 15/1, 2022 at 21:35

@Glennaglennie thank you for spotting this, you're correct; I've updated my sample code. – Lacteous 16/2, 2023 at 11:9

I think you need NtQuerySystemInformation from ntdll.dll.

Kenyettakenyon answered 29/9, 2008 at 19:45

If supported by your implementation, C++17 std::hardware_destructive_interference_size would give you an upper bound (and ..._constructive_... a lower bound), taking into account stuff like hardware prefetch of pairs of lines.

But those are compile-time constants, so can't be correct on all microarchitectures for ISAs which allow different line sizes. (e.g. older x86 CPUs like Pentium III had 32-byte lines, but all later x86 CPUs have used 64-byte lines, including all x86-64. It's theoretically possible that some future microarchitecture will use 128-byte lines, but multi-threaded binaries tuned for 64-byte lines are widespread so that's perhaps unlikely for x86.)

For this reason, some current implementations choose not to implement that C++ feature at all. GCC does implement it, clang doesn't (Godbolt). It becomes part of the ABI when code uses it in struct layouts, so it's not something compilers can change in future to match future CPUs for the same target.

GCC defines both constructive and destructive as 64 on x86-64, neglecting the destructive interference that adjacent-line prefetch can cause, e.g. on Intel Sandybridge-family. It's not nearly as disastrous as false sharing within a cache line in a high-contention case, so you might choose to only use 64-byte alignment to separate objects that different threads will be accessing independently.

Should the cache padding size of x86-64 be 128 bytes? - a performance experiment on Skylake showing 500 ± 300 machine clears in an aligned pair of lines, vs. 10M in a single line, vs. near zero in more distant lines. Machine clears were easier to measure than actual cache misses due to losing access to the line.
Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

Dwell answered 8/2, 2023 at 19:38
