Programmatically get the cache line size?
T

9

203

All platforms welcome, please specify the platform for your answer.

A similar question: How to programmatically get the CPU cache page size in C++?

Tennes answered 27/4, 2009 at 18:17 Comment(3)
FWIW, C++17 will provide a compile-time approximation of this: #39680706Ejaculation
aside for C/C++, if you won't mind using assembly to get such info, you can take a look (expanding info from negamartin's answer) at SDL2's source code of SDL_GetCPUCacheLineSize function, then take a look at cpuid macro which has assembly source code for each of processor model. You can take a look at imgur.com/a/KP57m6s, or directly peek at the source yourself.Steerage
gist.github.com/jesstess/797876 Code to get the cache and cache line sizes.Almsgiver
K
9

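Since C++17, you can use std::hardware_destructive_interference_size, declared in <new>. Note that it is a compile-time constant chosen by the implementation, not a value queried from the CPU at run time.
It's defined as: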

Minimum offset between two objects to avoid false sharing. Guaranteed to be at least alignof(std::max_align_t)

Kerstinkerwin answered 27/4, 2009 at 18:17 Comment(0)
B
219

On Linux (with a reasonably recent kernel), you can get this information out of /sys:

/sys/devices/system/cpu/cpu0/cache/

This directory has a subdirectory for each level of cache. Each of those directories contains the following files:

coherency_line_size
level
number_of_sets
physical_line_partition
shared_cpu_list
shared_cpu_map
size
type
ways_of_associativity

This gives you more information about the cache than you'd ever hope to know, including the cache line size (coherency_line_size) as well as which CPUs share this cache. This is very useful if you are doing multithreaded programming with shared data (you'll get better results if the threads sharing data are also sharing a cache).
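As a rough sketch of reading those files from C: the struct cache_info and the helper names below are made up for illustration and are not part of any library. Note that the indexN number is not the cache level; the level and type files say what each directory describes (index0 and index1 are commonly the L1 data and L1 instruction caches).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record for one /sys/devices/system/cpu/cpuN/cache/indexM dir. */
struct cache_info {
    int level;               /* contents of "level" */
    char type[16];           /* contents of "type", e.g. "Data" or "Unified" */
    unsigned int line_size;  /* contents of "coherency_line_size", in bytes */
};

/* Read the first whitespace-delimited token of one sysfs cache file.
 * buf must hold at least 16 bytes. Returns 1 on success, 0 on failure. */
static int read_token(int cpu, int index, const char *file, char *buf)
{
    char path[128];
    FILE *f;
    int ok;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cache/index%d/%s",
             cpu, index, file);
    f = fopen(path, "r");
    if (!f)
        return 0;
    ok = (fscanf(f, "%15s", buf) == 1);
    fclose(f);
    return ok;
}

/* Fill *out for the given cpu/index. Returns 1 on success, 0 if the
 * directory or any of the three files is missing (e.g. not Linux, or WSL). */
int read_cache_info(int cpu, int index, struct cache_info *out)
{
    char level[16], line[16];

    if (!read_token(cpu, index, "level", level) ||
        !read_token(cpu, index, "type", out->type) ||
        !read_token(cpu, index, "coherency_line_size", line))
        return 0;
    out->level = atoi(level);
    out->line_size = (unsigned int)strtoul(line, NULL, 10);
    return 1;
}
```

Calling read_cache_info(0, i, &info) for i = 0, 1, 2, ... until it returns 0 walks every cache visible to cpu0.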

Benco answered 28/4, 2009 at 16:10 Comment(7)
which of the files contains the cache line size? I'm assuming the coherency_line_size? or the physical_line_partition?Monogenic
To be sure: this is in Bytes, yes?Antipus
Yes, coherency_line_size is in bytes.Valenzuela
Could you post sample values of coherency_line_size as seen on your machines? For those who don't have Linux at hand.Anastasiaanastasie
@android : I use fedora-18 x64 machine with core-i5 processor. cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size returns 64 in my system. Same for index1,2,3 folders also.Marceau
This technique doesn't work on "Windows Subsystem for Linux", but sysconf(_SC_LEVEL1_DCACHE_LINESIZE) works on WSL.Lablab
@AbidRahmanK so there is four levels of cache? L0, L1, L2, L3?Fachini
A
193

On Linux look at sysconf(3).

sysconf (_SC_LEVEL1_DCACHE_LINESIZE)

You can also get it from the command line using getconf:

$ getconf LEVEL1_DCACHE_LINESIZE
64
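A minimal sketch of the same query from C, assuming a libc that provides _SC_LEVEL1_DCACHE_LINESIZE (glibc does; it is an extension, not POSIX, so the code falls back to 0 where the name is missing). The function name l1d_line_size is only illustrative.

```c
#include <unistd.h>

/* Returns the L1 data cache line size in bytes, or 0 if it is unknown.
 * _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension, so guard its use. */
long l1d_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return n > 0 ? n : 0;   /* sysconf may report -1 or 0 when unknown */
#else
    return 0;               /* name not provided by this libc */
#endif
}
```

Treat a return value of 0 as "unknown" and fall back to another method or a conservative default.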
Auxiliaries answered 27/4, 2009 at 18:17 Comment(4)
simple answers are just the best !Zigzag
@warunapww It is in bytes.Decrepitate
finally! hope more guys see this answer for time saving.Corvette
Is this available on glibc/musl/uClibc? It seems that musl does not have it definedJagir
F
128

I have been working on some cache line stuff and needed to write a cross-platform function. I committed it to a github repo at https://github.com/NickStrupat/CacheLineSize, or you can just use the source below. Feel free to do whatever you want with it.

#ifndef GET_CACHE_LINE_SIZE_H_INCLUDED
#define GET_CACHE_LINE_SIZE_H_INCLUDED

// Author: Nick Strupat
// Date: October 29, 2010
// Returns the cache line size (in bytes) of the processor, or 0 on failure

#include <stddef.h>
size_t cache_line_size();

#if defined(__APPLE__)

#include <sys/sysctl.h>
size_t cache_line_size() {
    size_t line_size = 0;
    size_t sizeof_line_size = sizeof(line_size);
    sysctlbyname("hw.cachelinesize", &line_size, &sizeof_line_size, 0, 0);
    return line_size;
}

#elif defined(_WIN32)

#include <stdlib.h>
#include <windows.h>
size_t cache_line_size() {
    size_t line_size = 0;
    DWORD buffer_size = 0;
    DWORD i = 0;
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION * buffer = 0;

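    /* The first call fails by design and reports the required buffer size */
    GetLogicalProcessorInformation(0, &buffer_size);
    buffer = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION *)malloc(buffer_size);
    if (!buffer || !GetLogicalProcessorInformation(&buffer[0], &buffer_size)) {
        free(buffer);
        return 0;
    }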

    for (i = 0; i != buffer_size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) {
        if (buffer[i].Relationship == RelationCache && buffer[i].Cache.Level == 1) {
            line_size = buffer[i].Cache.LineSize;
            break;
        }
    }

    free(buffer);
    return line_size;
}

#elif defined(__linux__)

#include <stdio.h>
size_t cache_line_size() {
    FILE * p = 0;
    p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
    unsigned int i = 0;
    if (p) {
        /* %u matches unsigned int; report 0 if the value cannot be parsed */
        if (fscanf(p, "%u", &i) != 1)
            i = 0;
        fclose(p);
    }
    return i;
}

#else
#error Unrecognized platform
#endif

#endif
Fletafletch answered 27/4, 2009 at 18:17 Comment(2)
Might be better to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE) for linux.Steelwork
@Matt why? Just curious :-).Beem
B
33

On x86, you can use the CPUID instruction with function 2 to determine various properties of the cache and the TLB. Parsing the output of function 2 is somewhat complicated, so I'll refer you to section 3.1.3 of Intel's "Intel Processor Identification and the CPUID Instruction" (PDF).

To get this data from C/C++ code, you'll need to use inline assembly, compiler intrinsics, or call an external assembly function to perform the CPUID instruction.
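As a hedged sketch of the compiler-intrinsic route, the snippet below uses the __get_cpuid helper from GCC/Clang's <cpuid.h> (MSVC has a different intrinsic, not shown). It deliberately does not parse function 2; instead it reads function 1, where EBX bits 15:8 give the CLFLUSH line size in 8-byte units. That value is usually, but not necessarily, the same as the L1 data cache line size. The function name is illustrative.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption about the toolchain */
#endif

/* Returns the CLFLUSH line size in bytes, or 0 if it cannot be determined
 * (non-x86 target, CPUID function 1 unsupported, or CLFLUSH not present). */
unsigned int cpuid_clflush_line_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(edx & (1u << 19)))        /* EDX bit 19: CLFLUSH supported */
        return 0;
    /* EBX bits 15:8 hold the CLFLUSH line size in units of 8 bytes. */
    return ((ebx >> 8) & 0xffu) * 8u;
#else
    return 0;
#endif
}
```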

Bradlybradman answered 27/4, 2009 at 18:28 Comment(2)
anyone know about how to do this with other processors with built in cache?Monogenic
@ceretullis: Errr... the x86 has built in cache. What "other processors" are you specifically looking for? What you're asking for is platform dependent.Campanulaceous
D
11

If you're using SDL2 you can use this function:

int SDL_GetCPUCacheLineSize(void);

This returns the L1 cache line size, in bytes.

On my x86_64 machine, running this code snippet:

printf("CacheLineSize = %d",SDL_GetCPUCacheLineSize());

Produces CacheLineSize = 64

I know I'm a little late, but just adding information for future visitors. The SDL documentation currently says the number returned is in KB, but it is actually in bytes.

Domineca answered 27/4, 2009 at 18:17 Comment(1)
Oh my this is really helpful. I'm going to write some game in SDL2 so this is going to be really usefulMisshapen
N
8

On the Windows platform:

from https://devblogs.microsoft.com/oldnewthing/20091208-01/?p=15733

The GetLogicalProcessorInformation function will give you characteristics of the logical processors in use by the system. You can walk the SYSTEM_LOGICAL_PROCESSOR_INFORMATION returned by the function looking for entries of type RelationCache. Each such entry contains a ProcessorMask which tells you which processor(s) the entry applies to, and in the CACHE_DESCRIPTOR, it tells you what type of cache is being described and how big the cache line is for that cache.

Nanine answered 14/12, 2009 at 11:50 Comment(0)
C
4

ARMv6 and above have C0, the Cache Type Register. However, it's only available in privileged mode.

For example, from Cortex™-A8 Technical Reference Manual:

The purpose of the Cache Type Register is to determine the instruction and data cache minimum line length in bytes to enable a range of addresses to be invalidated.

The Cache Type Register is:

  • a read-only register
  • accessible in privileged modes only.

The contents of the Cache Type Register depend on the specific implementation. Figure 3-2 shows the bit arrangement of the Cache Type Register...


Don't assume the ARM processor has a cache (apparently, some can be configured without one). The standard way to determine it is via C0. From the ARM ARM, page B6-6:

From ARMv6, the System Control Coprocessor Cache Type register is the mandated method to define the L1 caches, see Cache Type register on page B6-14. It is also the recommended method for earlier variants of the architecture. In addition, Considerations for additional levels of cache on page B6-12 describes architecture guidelines for level 2 cache support.
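Once privileged code has read the register (on ARMv7 with MRC p15, 0, <Rt>, c0, c0, 1), decoding the minimum line lengths is simple. The sketch below assumes the ARMv7 register layout, where IminLine is bits [3:0] and DminLine is bits [19:16], each holding log2 of the number of 4-byte words in the smallest line. ARMv6 uses a different layout, so check the manual for your core; the function names are illustrative.

```c
#include <stdint.h>

/* Smallest data-cache line in bytes, from an ARMv7-format CTR value.
 * DminLine (bits [19:16]) is log2 of the line length in 4-byte words. */
unsigned int ctr_dcache_min_line(uint32_t ctr)
{
    unsigned int dminline = (ctr >> 16) & 0xfu;
    return 4u << dminline;
}

/* Smallest instruction-cache line in bytes, from IminLine (bits [3:0]). */
unsigned int ctr_icache_min_line(uint32_t ctr)
{
    unsigned int iminline = ctr & 0xfu;
    return 4u << iminline;
}
```

For example, a DminLine field of 4 decodes to 4 << 4 = 64 bytes.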

Chicoine answered 27/4, 2009 at 18:17 Comment(0)
E
3

You can also try to do it programmatically by measuring memory access timings. Obviously, it won't always be as precise as cpuid and the like, but it is more portable. ATLAS does this at its configuration stage; you may want to look at it:

http://math-atlas.sourceforge.net/

Eldreeda answered 28/4, 2009 at 13:47 Comment(0)
