python multiprocessing: no diminishing returns?
Let's say I want to parallelize some intensive computation (not I/O bound).

Naturally, I do not want to run more processes than available processors or I would start paying for context switching (and cache misses).

Intuitively, I would expect that as I increased n in multiprocessing.Pool(n), total time would behave like this:

(mockup graph: expected timing curve)

  1. negative slope as tasks take advantage of parallelization
  2. positive slope as context switching starts costing me
  3. plateau

But in actuality, I am getting this:

(graph: actual timing results)

#!/usr/bin/env python

from __future__ import division  # make t/deno true division under Python 2
from math import factorial


def pi(n):
    pi = 0
    for k in range(n):
        t = ((-1)**k)*(factorial(6*k))*(13591409+545140134*k)
        deno = factorial(3*k)*(factorial(k)**3)*(640320**(3*k))
        pi += t/deno  # without true division, Python 2 floors each term to 0 or -1
    pi = pi * 12/(640320**(1.5))
    pi = 1/pi
    return pi

import multiprocessing
import time

maxx = 20
tasks = 60
task_complexity = 500
x = range(1, maxx+1)
y = [0]*maxx

for i in x:
    p = multiprocessing.Pool(i)
    tic = time.time()
    p.map(pi, [task_complexity]*tasks)
    toc = time.time()
    p.close()  # release the workers before creating the next pool
    p.join()
    y[i-1] = toc-tic
    print '%2d %ds' % (i, y[i-1])

import matplotlib.pyplot as plot
plot.plot(x, y)
plot.xlabel('Number of processes')
plot.xlim(1, maxx)
plot.xticks(x)
plot.ylabel('Time in seconds')
plot.show()
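As a sanity check on the series itself (separate from the timing question), the Chudnovsky sum used above converges extremely fast once true division is applied. A minimal Python 3 sketch (the `chudnovsky` name is mine):

```python
from math import factorial, pi as math_pi

def chudnovsky(n):
    # Same series as pi() above, written with Python 3 true division.
    s = 0
    for k in range(n):
        t = ((-1)**k) * factorial(6*k) * (13591409 + 545140134*k)
        deno = factorial(3*k) * factorial(k)**3 * 640320**(3*k)
        s += t / deno  # int/int true division stays accurate for big ints
    return 1 / (s * 12 / 640320**1.5)

print(chudnovsky(3))  # ~14 correct digits per term of the series
```

With only a handful of terms the result already agrees with math.pi to double precision, which is why the loop in the benchmark is pure number-crunching rather than a meaningful refinement after the first few iterations.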

My machine: i3-3217U CPU @ 1.80GHz × 4

Operating system: Ubuntu 14.04

After n>4, I see the task manager rotating through the various processes, as expected since there are more processes than processors. Yet, there is no penalty relative to n=4 (my number of processors).

In fact, even when n<4, I see the scheduler frenetically rotating the processes through my processors, instead of assigning each process to its own processor and avoiding context switching.

I am seeing this behavior using gnome-system-monitor: (Please let me know if someone has a different experience.)

(screenshot: gnome-system-monitor)

Any explanation why it does not seem to matter how many processes I fire? Or is something wrong with my code?

My guess: the processes are not bound to a particular processor (even when only two processes are active, they keep switching CPUs), so I am paying for context switching anyway.

EDIT: updated graphic and code with higher constants.

Glimp answered 23/1, 2016 at 15:10 Comment(4)
Out of interest: How many processor threads do you have available?Staub
@poke, I forgot to mention. :) I have added that and other information in the meantime.Glimp
Just for clarity: The i3-3217U has 2 cores with 2 core threads each. You should probably also measure the cumulated calculation time to see better how context switches impact your calculation, here I've posted a modified version of your script (without the plotting stuff as I don't have matplotlib installed right now).Committeeman
@Committeeman I will report back as soon as I can. Anyhow, if anyone can try this in other machines and operating system, it would also be interesting. I will validate an answer that does something like that.Glimp

In fact, even when n<4, I see the scheduler frenetically rotating the processes through my processors, instead of assigning each process to its own processor and avoiding context switching.

Processes are not bound to a particular processor by default; one of the main reasons is to avoid uneven heating of the processor, which can cause mechanical stress and reduce its lifetime.

There are ways to force a process to run on a single core (look at the psutil module). This has advantages such as better use of cache memory and avoiding context switches, but in most cases (if not all) it makes no big difference in performance.
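For what it's worth, on Linux this pinning can also be done from the standard library without psutil. A minimal sketch (Python 3.3+; os.sched_setaffinity is Linux-only, hence the guard):

```python
import os

# Pin the current process to CPU 0 so the scheduler cannot migrate it.
# os.sched_setaffinity exists only on Linux, hence the hasattr guard;
# psutil's Process.cpu_affinity() is the portable alternative.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})      # pid 0 means "the current process"
    print(os.sched_getaffinity(0))    # the set of CPUs we may run on
else:
    print("sched_setaffinity not available on this platform")
```

The taskset(1) command does the same from the shell, e.g. `taskset -c 0 python script.py`.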

So if you spawn more processes than you have cores, they will simply be scheduled like threads, switching among themselves to optimize execution. Performance will only be (very) slightly lowered, since context switches were already happening with fewer than 4 processes.
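This claim is easy to probe empirically: once every core is busy, adding workers barely moves the wall-clock time. A minimal sketch (Python 3; `burn` is a hypothetical stand-in for any CPU-bound task):

```python
import multiprocessing
import time

def burn(n):
    # purely CPU-bound busy work
    s = 0
    for i in range(n):
        s += i * i
    return s

if __name__ == "__main__":
    for workers in (1, 2, 4):
        with multiprocessing.Pool(workers) as pool:
            tic = time.perf_counter()
            pool.map(burn, [100000] * 16)
            elapsed = time.perf_counter() - tic
        print("%2d workers: %.2fs" % (workers, elapsed))
```

On a 4-logical-core machine, the times for 4, 8, and 16 workers should be nearly identical, matching the plateau in the question's graph.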

Publisher answered 26/1, 2016 at 12:31 Comment(5)
I like your heating explanation, and I think it's part of the story, but I don't quite buy it. They could avoid heating by changing cores every 30 minutes or so, not every second or less. And I am told that in Windows this constant processor switching does not happen. Anyhow, I will mark as correct if nobody has any other explanations. ps: I have added a screenshot of the tool I am using to see this behavior, in case someone can corroborate it...Glimp
Thanks for mentioning psutil; when I find some time, I will change the code to see if I can improve performance. For future reference, the taskset(1) command also looks interesting.Glimp
there are multiple algorithms (OS-dependent) for scheduling tasks on a processor. I looked for a better answer than the heating issue, but I didn't find much on WHY there is a need for process switching in single-threaded programs. I found an article describing a possible performance improvement from a context-switching algorithm, but I have no idea whether it is in actual use or only research material.Publisher
However, I think a fully-loaded core heats up very fast: if you start a heavy task, you can hear the fans speed up within seconds. I think waiting 30 minutes would be far too long.Publisher
I have only recently been working with heavy loads. I have been watching more closely how the scheduler behaves, and it seems I was wrong. It isn't true that processes keep changing cores at random; each process only seems to alternate between two cores. My machine has two physical cores, each exposing two "soft" hyperthreading cores. I think what looks like a core change is not the Linux scheduler but an artifact of the hyperthreaded architecture. There does not seem to be unnecessary context switching.Glimp

Answering my own question:

Firstly, I made an error in my post. It is not true that the CPU in use changes frantically. If I fire two CPU-intensive processes, they do keep changing cores, but each only between two cores. My machine has 2 physical cores, each of which exposes 2 "soft" cores (for hyperthreading), and I guess the switching is between these 2 "soft" siblings. It isn't Linux doing this; it comes from the CPU hardware.

That being said, I am still surprised that context switching is not more of a pain than it is.
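Relatedly, the count that multiprocessing reports is logical CPUs (hyperthreads included), not physical cores. A quick way to check:

```python
import multiprocessing
import os

print(multiprocessing.cpu_count())  # logical CPUs (4 on this i3-3217U)
# On Linux, the set of CPUs this particular process is allowed to run on:
if hasattr(os, "sched_getaffinity"):
    print(len(os.sched_getaffinity(0)))
```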

EDIT: There is a nice discussion, with better empirical work than mine, on this blog.

Glimp answered 17/2, 2016 at 11:59 Comment(0)
