Why is the execution of 2 threads slower than that of 1 thread in C?
Asked Answered
D

2

4

I am using the pthread library to make a program to find the accurate value of pi with Leibniz formula. I am working on shared resource here. My multithreaded function looks like this:

void *Leibniz(void *vars)
{
    struct variables *val = (struct variables *)vars;
    int startLimit = val->start;
    int endLimit = val->end;
    
    int i;
    for (i = startLimit; i <= endLimit; i++)
    {
        pthread_mutex_lock(&mutex);
        sum += (pow(-1, i) / ((2 * i) + 1));
        pthread_mutex_unlock(&mutex);
    }
}

When I run the program with N iterations and 1 thread, I get the correct output in about 4.5 seconds average. When I run the same program with two threads, it takes around 18 seconds. I have to use multithreading to make the program faster but the exact opposite is happening. Can anyone explain why?

Ductile answered 4/2, 2021 at 10:10 Comment(8)
You do not let the threads run separately. You force them to repeatedly interact and block each other through a lock.Dziggetai
If you need to grab the mutex around the entire loop body, you won't get any benefit from threading.Tuatara
@EricPostpischil There, sum is a shared resource. If I don't block and unblock, the result will be more inaccurate. What is your proposed solution?Alodee
What you should do is give each thread its own sum variable that it can update. When all the threads finish, you add up all those sums.Tuatara
@Tuatara Thank you. I will try that.Alodee
That is, expand the struct variables with a sum member. Then add those together in the end of the thread creator after joining all threads.Mediterranean
If your program takes 4.5 seconds in the single-thread case, I expect you are accumulating so many terms that most of them will be so small they do not contribute to the result due to the limited precision of the floating-point format.Dziggetai
@EricPostpischil good point — reversing order of accumulation (unless limits are negative) and summing odd/even terms separately would massively improve accuracyPina
U
2

You use locks to ensure that sum += (pow(-1, i) / ((2 * i) + 1)); is calculated in exactly one thread at a time. Multi threading can potentially be faster only when multiple threads do work at the same time.

Mutexes and thread creation itself are costly which is why the multi threaded non-parallel program is slower than single threaded one.

What is your proposed solution?

Don't have shared resources. In this case, have separate sum for each thread, then sum the sums in the end. Divide and conquer.

Unbridle answered 4/2, 2021 at 10:16 Comment(0)
M
2

It looks like you aren't expressing what you thought.

Instead of locking on each loop iteration (which degrades performance due to many context switches), what you probably wanted is updating the sum at the end of calculation:

(note: only 1 lock needed when updating the shared sum):

{
    struct variables *val = (struct variables *)vars;
    int startLimit = val->start;
    int endLimit = val->end;

    // note: this part is calculated by each thread separatly
    // no (expensive) locking here

    double local_sum = 0;
    for (int i = startLimit; i <= endLimit; i++)
    {
         local_sum += (pow(-1, i) / ((2 * i) + 1));
    }

    // now update the sum (which is shared among all threads)
    // so need some kind of synchronization here
    // -> only 1 lock instead of (endLimit - startLimit + 1) times locking
    pthread_mutex_lock(&mutex);
    sum += local_sum;
    pthread_mutex_unlock(&mutex);
}
Margemargeaux answered 4/2, 2021 at 10:19 Comment(5)
Thank you! This drastically improved the performance! But I have a question. What if I was printing on a file instead of adding on a local variable? What do you suggest then?Alodee
It depends on what exactly do you want to write to a file: the whole sum at the end, or the partial sums, calculated by each thread ?Margemargeaux
I wanted the partial sums by each thread on a file.Alodee
Well, you probably don't want to lock on each write, because of much slower file I/O. What you can do, put a sums to a shared vector for example, and then flush this out to a fileMargemargeaux
Thank you. I will try that. :)Alodee

© 2022 - 2024 — McMap. All rights reserved.