Numpy fusing multiply and add to avoid wasting memory
Asked Answered
S

3

10

Is it possible to multiply two ndarray A, and B and add the result to C, without creating a large intermediate array for A times B?

Numpy has the out keyword parameter for the case of C = A times B:

numpy.multiply(A, B, out=C)

How about the case of C += A times B?

Subterranean answered 19/7, 2017 at 20:19 Comment(5)
Then use numpy.add(... out = C)? No extra array created that way.Remington
@Remington that still creates an intermediate for A times BSubterranean
Are you worried about memory or performance or both? In the comments and question being tagged as memory suggests its memory that you are focussing on. If that's the case, numpy.add uses no extra memory, as mentioned earlier as well.Remington
Are you hitting memory errors? Otherwise I question whether it's worth your time to micromanage numpy's use of memory. If it's speed you are worried about, try several options on realistic arrays and see it they make any difference.Stockbroker
Writing and commenting on SO takes time. Yes, it is out of memory error. I really did a calculation on #elements multiplied by 1e-9 and 4 to get memory used in Gb, and know the machine only has 8Gb memory... It sucks.Subterranean
A
15

Numpy only supports operations one at a time. With that said, there are several workarounds.

In place operations

The most simple solution is to use in-place operations via += and *=

import numpy as np

n = 100
b = 5.0

x = np.random.rand(n)
y = np.random.rand(n)

z = b * x
z += y

BLAS

You can access the underlying BLAS programs and apply them manually. Sadly, there is no multiply add instruction, but there is the "AXPY" instruction, which performs

y <- a * x + y

This can be called via:

import scipy

axpy = scipy.linalg.blas.get_blas_funcs('axpy', arrays=(x, y))
axpy(x, y, n, b)

Numexpr

Another option is to use some package like numexpr which allows you to compile expressions:

import numexpr

z = numexpr.evaluate('b * x + y')

Theano

Recently several machine-learning packages have started supporting compiled expressions, one such package is theano. You could do something like:

import theano

x = theano.tensor.vector()         # declare variable
y = theano.tensor.vector()         # declare variable

out = b * x + y                    # build symbolic expression
f = theano.function([x, y], out)   # compile function

z = f(x, y)
Acknowledgment answered 19/7, 2017 at 20:33 Comment(7)
linalg is nice, but not for numpy array with dimension > 2. Will try numexpr. Hope that works.Subterranean
Please report back with status! Added some other options as wellAcknowledgment
i guess numpy could have an add_to keyword. But I haven't read the code base and don't know if adding that feature is difficult.Subterranean
Numpy does not have a add-to keyword sadlyAcknowledgment
it works. import numpy as np; import numexpr as ne; a = np.ones(int(1.9e9),dtype=np.int32).reshape((2,4,5,-1)); print(ne.evaluate("a*a+a",out=a)) didn't use swap.Subterranean
Does += actually solve the problem? I believe it still creates a temporary result on the RHS of the operation.Baptistry
It does, but it helps in related cases. Numexpr style stuff is the only full solution here.Acknowledgment
G
7

I compared different variants and found that you're not going wrong with SciPy's BLAS interface

scipy.linalg.blas.daxpy(x, y, len(x), a)

enter image description here


Code to reproduce the plot:

import numexpr
import numpy as np
import perfplot
import scipy.linalg
import theano

a = 1.36

# theano preps
x = theano.tensor.vector()
y = theano.tensor.vector()
out = a * x + y
f = theano.function([x, y], out)


def setup(n):
    x = np.random.rand(n)
    y = np.random.rand(n)
    return x, y


def manual_axpy(data):
    x, y = data
    return a * x + y


def manual_axpy_inplace(data):
    x, y = data
    out = a * x
    out += y
    return out


def scipy_axpy(data):
    x, y = data
    n = len(x)
    axpy = scipy.linalg.blas.get_blas_funcs("axpy", arrays=(x, y))
    axpy(x, y, n, a)
    return y


def scipy_daxpy(data):
    x, y = data
    return scipy.linalg.blas.daxpy(x, y, len(x), a)


def numpexpr_evaluate(data):
    x, y = data
    return numexpr.evaluate("a * x + y")


def theano_function(data):
    x, y = data
    return f(x, y)


b = perfplot.bench(
    setup=setup,
    kernels=[
        manual_axpy,
        manual_axpy_inplace,
        scipy_axpy,
        scipy_daxpy,
        numpexpr_evaluate,
        theano_function,
    ],
    n_range=[2 ** k for k in range(24)],
    equality_check=None,
    xlabel="len(x), len(y)",
)
# b.save("out.png")
b.show()
Generic answered 2/5, 2021 at 16:31 Comment(0)
B
-1

As far as I understand, NumPy array operations can only be done one at a time, but by putting it inside of a function you can ensure it is not in memory, as the commenter suggested.

Bloodhound answered 19/7, 2017 at 20:23 Comment(8)
erh. I guess we are talking about numpy.add(A*B, out=C). That needs memory to hold A times B, and memory to hold the original value of C. That add function alone doesn't do multiplication. A times B will be evaluated by python interpreter first, and put into an intermediate array. After that, the intermediate is added to C.Subterranean
Yes, but then A times B would no longer be in memory.Bloodhound
Well. When the intermediate exists, we need memory. If there isn't enough memory for both C and the intermediate at that point, the program will get out of memory error. And not enough memory is the problem.Subterranean
You have a case where you cannot hold the multiple of two matrices, but you can hold those two added to a third matrix with the same size? Doesn't seem probable to me.Bloodhound
Yes. Outer product of two 3 element vector gives a 9 element array. A and B are much smaller than C, but A*B and C have same size. I have enough memory to hold A, B, and C, but not enough to hold A, B, C, and the intermediate.Subterranean
we shouldn't need that intermediate, but if numpy creates that intermediate first before adding the whole intermeddiate array to C. Well, that is a waste of memory.Subterranean
is there a way to do this operation mathematically without an intermediate? Then you can do the same thing.Bloodhound
the point of using a library is not to do everything yourself. doing everything yourself is the last resort.Subterranean

© 2022 - 2024 — McMap. All rights reserved.