Why does iterative elementwise array multiplication slow down in numpy?

The code below reproduces the problem I have encountered in the algorithm I'm currently implementing:

import numpy.random as rand
import time

x = rand.normal(size=(300,50000))
y = rand.normal(size=(300,50000))

for i in range(1000):
    t0 = time.time()
    y *= x
    print "%.4f" % (time.time()-t0)
    y /= y.max() #to prevent overflows

The problem is that after some number of iterations things start to slow down gradually, until a single iteration takes several times longer than the first ones.

A plot of the slowdown: [plot of per-iteration time versus iteration number, showing the gradual slowdown]

CPU usage by the Python process is stable around 17-18% the whole time.

I'm using:

  • Python 2.7.4 32-bit version;
  • Numpy 1.7.1 with MKL;
  • Windows 8.
Phenacetin answered 14/5, 2013 at 22:9 Comment(4)
I don't think I see this behavior with python-2.7.4 under Linux. – Machree
It's probably due to denormal numbers: https://mcmap.net/q/23947/-why-does-changing-0-1f-to-0-slow-down-performance-by-10x – Emergency
In my test run, as soon as it started slowing down, I interrupted it and did print numpy.amin(numpy.abs(y[y != 0])) and got 4.9406564584124654e-324, so I think denormal numbers are your answer. I don't know how to flush denormals to zero from within Python other than creating a C extension though... – Emergency
I've been looking around a bit since I have run into this problem several times, and I can't seem to find any way to flush the denormals. Kind of frustrating. – Perennate

As @Alok pointed out, this seems to be caused by denormal numbers affecting performance. I ran it on my OSX system and confirmed the issue. I don't know of a way to flush denormals to zero in numpy. I would try to get around this issue in the algorithm by avoiding the very small numbers: do you really need to keep dividing y until it gets down to the 1e-324 level?
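A quick way to check whether y really has drifted into the denormal range (a sketch; the has_denormals helper is just an illustrative name, not something from the code above) is to compare its smallest nonzero magnitude against np.finfo(y.dtype).tiny, the smallest normal value for its dtype:

import numpy as np

def has_denormals(a):
    # True if `a` holds nonzero values smaller in magnitude than the
    # smallest normal number for its dtype, i.e. denormal/subnormal values
    nonzero = np.abs(a[a != 0])
    return nonzero.size > 0 and nonzero.min() < np.finfo(a.dtype).tiny

# e.g. inside the loop:
# if has_denormals(y):
#     print("y now contains denormals -- this is where the slowdown starts")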

If you avoid the very low numbers, e.g. by adding the following line in your loop:

y += 1e-100

then you'll have a constant time per iteration (albeit slower because of the extra operation). Another workaround is to use higher-precision arithmetic, e.g.

x = rand.normal(size=(300,50000)).astype('longdouble')
y = rand.normal(size=(300,50000)).astype('longdouble')

This will make each of your steps more expensive, but each step will take roughly the same time.
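Putting the first workaround together with the code from the question, a self-contained version of the loop might look like this (a sketch: print is written as a function call so it runs on both Python 2 and 3, and the 1e-100 constant is just the value suggested above):

import time
import numpy.random as rand

x = rand.normal(size=(300, 50000))
y = rand.normal(size=(300, 50000))

for i in range(1000):
    t0 = time.time()
    y *= x
    print("%.4f" % (time.time() - t0))  # time of the multiply only, as in the question
    y += 1e-100   # the extra line: keeps the values away from the denormal range
    y /= y.max()  # prevent overflows, as in the original code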

See the following comparison in my system: [plot comparing per-iteration times]
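Note that how much extra precision np.longdouble actually buys is platform-dependent (on many Windows builds of numpy it is the same 64-bit type as float64), so before relying on that workaround it is worth checking what your build gives you, for example:

import numpy as np

# prints precision and range; if the two lines match, longdouble offers
# no extra headroom over float64 on this platform
print(np.finfo(np.longdouble))
print(np.finfo(np.float64))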

Granulite answered 15/5, 2013 at 10:11 Comment(1)
> do you really need to keep dividing y until it gets down to the 1e-324 level? Yes, otherwise the values overflow after around 400 iterations. I tried out the "adding a small constant" solution in my algorithm (the code given in the question is just a simple test case) and it solved the slowdown issue. Thank you all for your help! (especially @Alok and @tiago) – Phenacetin
