False sharing and stack variables

Asked 26/7, 2010 at 6:57 Answered 26/7, 2010 at 22:55

I have small but frequently used function objects. Each thread gets its own copy. Everything is allocated statically. Copies don't share any global or static data. Do I need to protect these objects from false sharing?

Thank you. EDIT: Here is a toy program which uses Boost.Threads. Can false sharing occur for the field data?

#include <boost/thread/thread.hpp>

struct Work {
    void operator()() {
        ++data;
    }

    int data;
};

int main() {
    boost::thread_group threads;
    for (int i = 0; i < 10; ++i)
        threads.create_thread(Work());
    threads.join_all();
}

Landlord answered 26/7, 2010 at 6:57 Comment(3)

Code would work better. If your function objects have static data, then all the threads will share that data. – Exhortative 26/7, 2010 at 6:58

I think you need to say exactly what you mean by "each thread gets its own copy" and "allocated statically". Do threads use each other's copies? – Jackie 26/7, 2010 at 7:33

@Elemental: Some compilers support TLS (thread-local storage). This means that you can allocate statically AND thread-safely, although it is slow. – Coadjutress 26/7, 2010 at 7:38

False sharing occurs when two or more threads write to different variables that happen to lie on the same cache line.

E.g. :

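#include <boost/thread/thread.hpp>

// Assumes a 64-byte cache line; adjust for the target CPU.
const int CACHELINE_SIZE_INTS = 64 / sizeof(int);

struct Work {
    Work(int& d) : data(d) {}
    void operator()() {
        ++data;
    }

    int& data;
};

int main() {
    // Adjacent ints share a cache line, so the threads contend for it.
    int false_sharing[10] = { 0 };
    boost::thread_group threads;
    for (int i = 0; i < 10; ++i)
        threads.create_thread(Work(false_sharing[i]));
    threads.join_all();

    // Each used element is a full cache line away from the next one.
    int no_false_sharing[10 * CACHELINE_SIZE_INTS] = { 0 };
    for (int i = 0; i < 10; ++i)
        threads.create_thread(Work(no_false_sharing[i * CACHELINE_SIZE_INTS]));
    threads.join_all();
}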

The threads in the first block do suffer from false sharing. The threads in the second block do not, because the elements they use are CACHELINE_SIZE_INTS ints apart and so never land on the same cache line.

Data on one thread's stack is always 'far' away from the stacks of other threads (e.g. under Windows, at least a couple of pages).
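As a variation on the spacing trick above, here is a minimal sketch that lets the compiler do the padding with C++11 `alignas`. It assumes a 64-byte cache line, and the names `kCacheLineBytes`, `PaddedCounter`, `same_cache_line` and `slots_are_isolated` are made up for illustration; none of them come from Boost or from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common on x86, not universal).
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical wrapper: alignas forces each counter onto its own cache line,
// and the compiler pads the struct up to a multiple of its alignment.
struct alignas(kCacheLineBytes) PaddedCounter {
    int value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLineBytes,
              "each counter should fill exactly one cache line");

// True if both addresses fall into the same cache-line-sized block.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLineBytes ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLineBytes;
}

// True if no two neighbouring counters in the array share a cache line.
inline bool slots_are_isolated(const PaddedCounter* slots, std::size_t count) {
    assert(slots != nullptr);
    for (std::size_t i = 1; i < count; ++i) {
        if (same_cache_line(&slots[i - 1].value, &slots[i].value))
            return false;
    }
    return true;
}
```

With this layout, an array such as `PaddedCounter slots[10]` can hand `slots[i].value` to thread `i` without index arithmetic. The cost is the same as before: 64 bytes per counter instead of 4.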

With your definition of a function object, false sharing can appear, because the instances of Work get created on the heap and this heap space is used inside the thread.

This may cause several Work instances to be adjacent in memory, so they may end up sharing cache lines.

But ... your sample does not make sense, because data is never touched outside and so false sharing is induced needlessly.

The easiest way to prevent problems like this is to copy your 'shared' data onto the stack and then work on the stack copy. When your work is finished, copy it back to the output variable.

E.g:

struct Work {
    Work( int& d) : data( d ) {}
    void operator()()
    {
        int tmp = data;
        for( int i = 0; i < lengthy_op; ++i )
           ++tmp;
        data = tmp;
    }

    int& data;
};

This avoids the sharing problems during the loop, because the shared cache line is touched only once at the start and once at the end.

Olodort answered 26/7, 2010 at 9:55 Comment(5)

Are you saying that data can be affected by false sharing? In my case copying it into the function's stack won't help, because the function itself must be called frequently and uses data only once per call. – Landlord 26/7, 2010 at 11:21

When the function must be called very frequently, it doesn't make sense to create a thread every time. Either you do a lot of work in a new thread, or you just burn cycles on thread creation/destruction. And in the latter case, the cost of false sharing is overshadowed by the enormous cost of the threads. – Olodort 26/7, 2010 at 11:47

Nevertheless, if you cannot copy data onto the stack for your operation, then just make 'Work' large enough to be at least CACHELINE_SIZE long. You lose a couple of bytes, but you can really be sure to never run into false sharing problems. – Olodort 26/7, 2010 at 11:48

Each thread calls that function many, many times before the work is done. I didn't show any loops for brevity. Each function can be executed independently. That is why I want to use multi-threading. – Landlord 26/7, 2010 at 11:56

Then the answer degrades even more to 'it depends'. If you control the allocation of the function objects, then you can easily prevent them from using the same cache line. If you cannot control it, you should pass the 'data' fragments as references and control the allocation on your side. If you cannot do that, then enlarge the structure with 'unused' bytes, to enforce cache-line separation. If you don't know, try out one of these schemes. – Olodort 26/7, 2010 at 12:37

I did a fair bit of research, and it seems there is no silver-bullet solution to false sharing. Here is what I came up with (thanks to Christopher): 1) Pad your data on both sides with unused or less frequently used members. 2) Copy your data onto the stack and copy it back after all the hard work is done. 3) Use cache-aligned memory allocation.
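For point 3, one possible sketch of cache-aligned allocation using only C++11 facilities: over-allocate by one cache line and let `std::align` pick the first aligned address inside the buffer. The 64-byte line size and the class name `AlignedBuffer` are assumptions for illustration; platform-specific allocators would work just as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical helper: owns a raw buffer and exposes a region of `bytes`
// bytes inside it that starts on a cache-line boundary.
class AlignedBuffer {
public:
    explicit AlignedBuffer(std::size_t bytes)
        : storage_(bytes + kCacheLineBytes), aligned_(nullptr) {
        void* p = storage_.data();
        std::size_t space = storage_.size();
        // std::align advances p to the next suitably aligned address,
        // or returns nullptr if the region would not fit.
        aligned_ = std::align(kCacheLineBytes, bytes, p, space);
        assert(aligned_ != nullptr);  // cannot fail: we reserved one extra line
    }

    // Copying would leave aligned_ pointing into the other object's storage.
    AlignedBuffer(const AlignedBuffer&) = delete;
    AlignedBuffer& operator=(const AlignedBuffer&) = delete;

    void* get() const { return aligned_; }

private:
    std::vector<unsigned char> storage_;
    void* aligned_;
};
```

Alignment alone only fixes where the block starts. Objects placed inside it still need to be padded to a full cache line each (point 1), otherwise neighbours within the block can share a line.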

Landlord answered 26/7, 2010 at 22:55 Comment(0)

I don't feel entirely sure about the details, but here's my take:

(1) Your simplified example is broken, since Boost's create_thread expects a reference and you pass a temporary.

(2) If you use a vector<Work> with one item for each thread, or otherwise have them laid out sequentially in memory, false sharing will occur.
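To make point (2) concrete, here is a small sketch that counts how many cache lines the `data` members of a `std::vector<Work>` start on. It assumes a 64-byte cache line and a 4-byte `int`; the helper `distinct_cache_lines` is a made-up name for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLineBytes = 64;

// Same shape as the function object in the question.
struct Work {
    void operator()() { ++data; }
    int data = 0;
};

// Counts the distinct cache lines that the `data` members start on.
inline std::size_t distinct_cache_lines(const std::vector<Work>& items) {
    std::set<std::uintptr_t> lines;
    for (const Work& w : items) {
        lines.insert(reinterpret_cast<std::uintptr_t>(&w.data) / kCacheLineBytes);
    }
    // Sanity check: there cannot be more lines than items.
    assert(lines.size() <= items.size());
    return lines.size();
}
```

Under those assumptions, ten such objects occupy 40 contiguous bytes, so they start on at most two cache lines. If each of ten threads incremented its own element, most of them would be writing to a line that another thread is also writing to.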

Verney answered 26/7, 2010 at 9:53 Comment(5)

(1) No, it isn't broken. create_thread accepts its argument by value. Check the declaration if you don't believe me. (2) I clearly stated that each thread gets its own copy. Check the code. The function object is passed by value. – Landlord 26/7, 2010 at 11:11

It is. Work is not copied into the target stack. It is 'newed' in the context of 'create_thread' and only a (shared) pointer is transferred onto the target stack. There the data is only referenced by a pointer. (I tested this by assigning the thread_id to the data member and then looking at the value in the operator() call.) – Olodort 26/7, 2010 at 11:51

(1) Are we talking about thread* create_thread(const boost::function0<void>& threadfunc);? That's what I found when trying to check the reference. – Verney 26/7, 2010 at 14:45

It is declared as template<typename F> thread* create_thread(F threadfunc); Which version of boost are you talking about? – Landlord 26/7, 2010 at 20:49

The Boost docs for 1.32 show only the declaration posted above, while the source code shows your prototype. Strange. As the new thread object with its thread info is heap-allocated, there is a decent chance of several of them ending up in one cache line. Just modify your example to print the addresses of &data. – Verney 27/7, 2010 at 5:33
