Why does iterative elementwise array multiplication slow down in numpy?

The code below reproduces the problem I have encountered in the algorithm I'm currently implementing:

import numpy.random as rand
import time

x = rand.normal(size=(300,50000))
y = rand.normal(size=(300,50000))

for i in range(1000):
    t0 = time.time()
    y *= x
    print "%.4f" % (time.time()-t0)
    y /= y.max() #to prevent overflows

The problem is that after some number of iterations things start to slow down gradually, until a single iteration takes several times longer than the first ones.

A plot of the slowdown: [plot of per-iteration time versus iteration number, showing the gradual slowdown]

CPU usage by the Python process is stable around 17-18% the whole time.

I'm using:

  • Python 2.7.4 32-bit version;
  • Numpy 1.7.1 with MKL;
  • Windows 8.
Phenacetin answered 14/5, 2013 at 22:9 Comment(4)
I don't think I see this behavior with python-2.7.4 under Linux. – Machree
It's probably due to denormal numbers: https://mcmap.net/q/23947/-why-does-changing-0-1f-to-0-slow-down-performance-by-10x – Emergency
In my test run, as soon as it started slowing down, I interrupted it and did print numpy.amin(numpy.abs(y[y != 0])) and got 4.9406564584124654e-324, so I think denormal numbers are your answer. I don't know how to flush denormals to zero from within Python other than creating a C extension though... – Emergency
I've been looking around a bit since I have run into this problem several times, and I can't seem to find any way to flush the denormals. Kind of frustrating. – Perennate

As @Alok pointed out, this seems to be caused by denormal numbers affecting performance. I ran it on my OSX system and confirmed the issue. I don't know of a way to flush denormals to zero in numpy. I would try to get around this issue in the algorithm by avoiding the very small numbers: do you really need to keep dividing y until it gets down to the 1e-324 level?
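A quick way to check whether y really has drifted into the denormal range (a sketch; the has_denormals helper is just an illustrative name, not something from the code above) is to compare its smallest nonzero magnitude against np.finfo(y.dtype).tiny, the smallest normal value for its dtype:

import numpy as np

def has_denormals(a):
    # True if `a` holds nonzero values smaller in magnitude than the
    # smallest normal number for its dtype, i.e. denormal/subnormal values
    nonzero = np.abs(a[a != 0])
    return nonzero.size > 0 and nonzero.min() < np.finfo(a.dtype).tiny

# e.g. inside the loop:
# if has_denormals(y):
#     print("y now contains denormals -- this is where the slowdown starts")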

If you avoid the very low numbers, e.g. by adding the following line in your loop:

y += 1e-100

then you'll have a constant time per iteration (albeit slower because of the extra operation). Another workaround is to use higher-precision arithmetic, e.g.

x = rand.normal(size=(300,50000)).astype('longdouble')
y = rand.normal(size=(300,50000)).astype('longdouble')

This will make each of your steps more expensive, but each step will take roughly the same time.
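Putting the first workaround together with the code from the question, a self-contained version of the loop might look like this (a sketch: print is written as a function call so it runs on both Python 2 and 3, and the 1e-100 constant is just the value suggested above):

import time
import numpy.random as rand

x = rand.normal(size=(300, 50000))
y = rand.normal(size=(300, 50000))

for i in range(1000):
    t0 = time.time()
    y *= x
    print("%.4f" % (time.time() - t0))  # time of the multiply only, as in the question
    y += 1e-100   # the extra line: keeps the values away from the denormal range
    y /= y.max()  # prevent overflows, as in the original code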

See the following comparison in my system: [plot comparing per-iteration times]
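Note that how much extra precision np.longdouble actually buys is platform-dependent (on many Windows builds of numpy it is the same 64-bit type as float64), so before relying on that workaround it is worth checking what your build gives you, for example:

import numpy as np

# prints precision and range; if the two lines match, longdouble offers
# no extra headroom over float64 on this platform
print(np.finfo(np.longdouble))
print(np.finfo(np.float64))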

Granulite answered 15/5, 2013 at 10:11 Comment(1)
> do you really need to keep dividing y until it gets down to the 1e-324 level? Yes, otherwise the values overflow after around 400 iterations. I tried out the "adding a small constant" solution in my algorithm (the code given in the question is just a simple test case) and it solved the slowdown issue. Thank you all for your help! (especially @Alok and @tiago) – Phenacetin
