Analytical way of speeding up exp(A*x) in MATLAB

I need to calculate f(x)=exp(A*x) repeatedly for a tiny, variable column vector x and a huge, constant matrix A (many rows, few columns). In other words, the x are few, but the A*x are many. My problem dimensions are such that A*x takes about as much runtime as the exp() part.

Apart from a Taylor expansion and pre-calculating a range of values exp(y) (assuming the range of values y = A*x is known), neither of which I have managed to speed up considerably (while maintaining accuracy) compared to what MATLAB does on its own, I am thinking about analytically restating the problem in order to be able to precalculate some values.

For example, I find that exp(A*x)_i = exp(\sum_j A_{ij} x_j) = \prod_j exp(A_{ij} x_j) = \prod_j exp(A_{ij})^{x_j}

This would allow me to precalculate exp(A) once, but the exponentiation required in the loop is as costly as the original exp() function call, and the multiplications (\prod) have to be carried out on top of that.
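
For reference, a minimal sketch of that reformulation on a small dense stand-in (the sizes and names here are purely illustrative; as one of the comments below reports, this route is actually much slower in practice):

A = randn(1000, 5);                        % small dense stand-in for the real A
x = randn(5, 1);
expA = exp(A);                             % precomputed once
y1 = exp(A*x);                             % direct element-wise exponential
y2 = prod(bsxfun(@power, expA, x.'), 2);   % \prod_j exp(A_ij)^x_j
norm(y1 - y2) / norm(y1)                   % agrees up to rounding error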

Is there any other idea that I could follow, or solutions within MATLAB that I may have missed?

Edit: some more details

A is 26873856 by 81 in size (yes, it's that huge), so x is 81 by 1. nnz(A) / numel(A) is 0.0012, and nnz(A*x) / numel(A*x) is 0.0075. I already use a sparse matrix to represent A; however, exp() of a sparse matrix is no longer sparse. So in fact, I store x non-sparse and I calculate exp(full(A*x)), which turned out to be as fast/slow as full(exp(A*x)). (I think A*x is non-sparse anyway, since x is non-sparse.) exp(full(A*sparse(x))) is a way to have a sparse A*x, but it is slower. Even slower variants are exp(A*sparse(x)) (with doubled memory impact for a non-sparse matrix of type sparse) and full(exp(A*sparse(x))) (which again yields a non-sparse result).

sx = sparse(x);
tic, for i = 1 : 10, exp(full(A*x)); end, toc
tic, for i = 1 : 10, full(exp(A*x)); end, toc
tic, for i = 1 : 10, exp(full(A*sx)); end, toc
tic, for i = 1 : 10, exp(A*sx); end, toc
tic, for i = 1 : 10, full(exp(A*sx)); end, toc

Elapsed time is 1.485935 seconds.
Elapsed time is 1.511304 seconds.
Elapsed time is 2.060104 seconds.
Elapsed time is 3.194711 seconds.
Elapsed time is 4.534749 seconds.

Yes, I do calculate the element-wise exp; I have updated the equation above to reflect that.

One more edit: I tried to be smart, with little success:

tic, for i = 1 : 10, B = exp(A*x); end, toc
tic, for i = 1 : 10, C = 1 + full(spfun(@(x) exp(x) - 1, A * sx)); end, toc
tic, for i = 1 : 10, D = 1 + full(spfun(@(x) exp(x) - 1, A * x)); end, toc
tic, for i = 1 : 10, E = 1 + full(spfun(@(x) exp(x) - 1, sparse(A * x))); end, toc
tic, for i = 1 : 10, F = 1 + spfun(@(x) exp(x) - 1, A * sx); end, toc
tic, for i = 1 : 10, G = 1 + spfun(@(x) exp(x) - 1, A * x); end, toc
tic, for i = 1 : 10, H = 1 + spfun(@(x) exp(x) - 1, sparse(A * x)); end, toc

Elapsed time is 1.490776 seconds.
Elapsed time is 2.031305 seconds.
Elapsed time is 2.743365 seconds.
Elapsed time is 2.818630 seconds.
Elapsed time is 2.176082 seconds.
Elapsed time is 2.779800 seconds.
Elapsed time is 2.900107 seconds.
Diabolo answered 24/4, 2014 at 9:10 Comment(15)
I tried with A = rand(10000, 10). Using expA = exp(A); prod(bsxfun(@power, expA, x'), 2) is indeed much slower. Doing exp(A*x) in a mex function doesn't help much either, even using single precision. What is the size of A for your usage? – Fortepiano
My gut feeling was that this might be simplified by taking the singular value decomposition of A or so, but thinking some more, I do not see how. I think the easiest way to reason about it is to take the limit where x is a scalar and A a vector. I do not see any way that this could be sped up. – Leontine
I tend to agree with the previous comments. If many elements are zeros, so that A can be considered sparse, this might speed up the process. But a question: you are not trying to calculate the matrix exponential, right? – Kinematics
Can you please explain why this is considered "slow"? What is it that you need to do for which 0.15 seconds per calculation is too slow? – Calvo
@natan: It's some new medical image reconstruction process, which basically tries to invert something like y = exp(Ax)*x. While iteratively searching for the solution, I often (some hundreds of thousands of times) need to calculate \hat{y} from some current estimate \hat{x}. And since the exp(Ax) contributes some 30-50% of my total computation time, I thought it would make sense to speed it up :) – Diabolo
OK, so x changes but A doesn't? Is this going in the direction of compressive sensing? – Calvo
@natan: yes, x changes and is non-sparse, while A is constant and sparse. However, it's not really CS: x represents the image to be reconstructed (reformatted as a vector), and A is related to the imaging system matrix (or rather exp(Ax)). So A is no sparsifying transform or anything. Also, due to the size of A, Ax has many more entries than x has, despite the sparsity of A, and so does exp(A*x)*x. – Diabolo
Does A have any special structure? – Instancy
How about solving w := ln(y) = A*x + ln(x) instead and taking y = exp(w) at the end? 81 logarithms each step instead of 81*26*10^6 exp() operations may be faster. – Hexapla
A has some kind of a special structure: first, it is sparse (see above). Second, I know that if I reshape A to be N1 x N2 x N3 ... x 81 in size, then along some of the dimensions I only have a single non-zero entry. However, all of these dimensions are collapsed, so this is not true any more for the 26873856 by 81 matrix A. In particular, it is not a diagonal or block-diagonal matrix. The comment about ln(y) = Ax + ln(x) looks interesting. I am unsure whether ln(Bx) = ln(B) + ln(x) holds for matrix-vector products (look at the dimensions), but perhaps this idea can yield something useful later. – Diabolo
@bers: I assumed your comment at 21:02 meant, in MATLAB notation, y = exp(A*x) .* x. Right, or did I misunderstand? – Hexapla
This may be beside the point, but as you said you wanted to invert something: if you are trying to solve a system of equations, make sure to consider using matrix division rather than the inverse, as described in the doc. – Petuu
@HorstGrünbusch: Misunderstood :) exp(Ax)*x really is two matrix-vector multiplications; exp(Ax) is reshaped in between to be N by 81. (I can get rid of the reshaping by using a 3D matrix for A which is N by 81 by 81, but then sparsity and matrix multiplications become more difficult in MATLAB.) The only element-wise operation is the exp. – Diabolo
Then w(1:N,j) = A*x + log(x(j)), and you again have 82*N function evaluations. OK, last suggestion: write and compile the routine in Fortran 95. It's fast with loops and has easy syntax. – Hexapla
I think you need to reformulate your image reconstruction problem so that you avoid moving between the logarithmic and the natural domain. This sounds like a CT problem - attenuation follows an exponential law - and most CT algorithms have figured out how not to do this. – Asteria

Computers don't really compute exponentials directly. You would think they do, but what they actually do is evaluate high-accuracy polynomial approximations.
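
To make that concrete, here is a minimal sketch of the general recipe (range reduction plus a short polynomial, recombined with a cheap power of two); the degree and accuracy are illustrative, and this MATLAB-level version is meant to show the idea rather than to beat the built-in exp:

y = full(A*x);                   % the values to exponentiate
k = round(y / log(2));           % split y = k*log(2) + r ...
r = y - k*log(2);                % ... so that |r| <= log(2)/2
p = 1 + r.*(1 + r./2.*(1 + r./3.*(1 + r./4.*(1 + r./5))));  % degree-5 Taylor polynomial for exp(r), Horner form
yExp = pow2(p, k);               % exp(y) = 2^k * exp(r); relative error on the order of 1e-6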

References:

The last reference looked quite nice. Perhaps it should have been first.

Since you are working on images, you likely have a discrete number of intensity levels (typically 256 for 8-bit data). This can allow reduced sampling, or lookups, depending on the nature of "A". One way to check this is to do something like the following for a sufficiently representative group of values of "x":

y = A*x;        % the exponents that actually occur
cdfplot(y(:))   % empirical CDF of the values fed to exp()

If you were able to pre-segment your images into "more interesting" and "not as interesting" regions - for example, on an x-ray, trimming out all the "outside the human body" locations and clamping them to zero to pre-sparsify your data - that could reduce your number of unique values. You might consider the previous step for each unique "mode" inside the data.

My approaches would include:

  • look at alternate formulations of exp(x) that are lower accuracy but higher speed
  • consider table lookups if you have few enough levels of "x"
  • consider a combination of interpolation and table lookups if you have "slightly too many" levels for a direct table lookup (see the sketch after this list)
  • consider a single lookup (or alternate formulation) based on segmented mode. If you know it is a bone and are looking for a vein, then maybe it should get less high-cost data processing applied.
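
As a rough illustration of the second and third bullets, a minimal lookup-table sketch, assuming the range of y = A*x is known (or estimated) in advance; the table size of 4096 is an arbitrary assumption:

y = full(A*x);                              % values whose exponential we need
yTab = linspace(min(y), max(y), 4096).';    % shared grid over the observed range
expTab = exp(yTab);                         % exp evaluated once on the grid
yExp = interp1(yTab, expTab, y);            % piecewise-linear lookup of exp(y)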

Now I have to ask myself why you would be living in so many iterations of exp(A*x)*x, and I think you might be switching back and forth between the frequency/wavenumber domain and the time/space domain. You also might be dealing with probabilities using exp(x) as a basis and doing some Bayesian fun. I don't know that exp(x) is a good conjugate prior, so I'm going to go with the Fourier material.

Other options: consider the use of fft, fft2, or fftn, given your matrices - they are fast and might do part of what you are looking for.

I am sure there is a Fourier-domain variation on the following:

You might be able to mix the lookup with a computation using the Woodbury matrix identity. I would have to think about that some more to be sure, though. (link) At one point I knew that everything that mattered (CFD, FEA, FFT) was all about matrix inversion, but I have since forgotten the particular details.

Now, if you are living in MATLAB, then you might consider using MATLAB Coder, which converts MATLAB code to C code. No matter how much fun an interpreter may be, a good C compiler can be a lot faster. The example (hopefully not too ambitious) that I use is shown here: link, starting around 13:49. It is really simple, but it shows the difference between a canonical interpreted language (Python) and a compiled version of the same (Cython/C).
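
A rough sketch of that workflow (the function name, sizes, and dense stand-in for A are assumptions for illustration; sparse inputs need additional Coder support or a dense workaround):

% --- expAx.m ---
function y = expAx(A, x) %#codegen
y = exp(A*x);    % element-wise exponential of the matrix-vector product
end

% --- generate and call a MEX version from the command line ---
% codegen expAx -args {rand(1000, 81), rand(81, 1)} -o expAx_mex
% y = expAx_mex(Asmall, x);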

I'm sure that if I had some more specifics, and was requested to, then I could engage more aggressively in a more specifically relevant answer.

You might not have a good way to do it on conventional hardware, but you might consider something like a GPGPU. CUDA and its peers offer massively parallel operations that allow substantial speedups for the cost of a few video cards. You can have thousands of "cores" (overglorified pipelines) doing the work of a few ALUs, and if the job is properly parallelizable (as this one looks), then it can get done a LOT faster.
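
A minimal sketch of that route in MATLAB, assuming the Parallel Computing Toolbox, a CUDA-capable GPU, and enough device memory for the dense result:

Agpu = gpuArray(A);             % sparse A moved to the GPU once
xgpu = gpuArray(full(x));       % dense x; updated each iteration
ygpu = exp(full(Agpu * xgpu));  % multiply and exponentiate on the device
y = gather(ygpu);               % copy the result back to host memory when needed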

EDIT:

I was thinking about Eureqa. One option that I would consider if I had some "big iron" for development but not production would be to use their Eureqa product to come up with a fast enough, accurate enough approximation.

If you performed a 'quick' singular value decomposition of your "A" matrix, you would find that its behavior is governed by at most 81 singular vectors. I would look at the singular values and see if only a few of those 81 directions provide the majority of the information. If that is the case, then you can clamp the others to zero and construct a simple low-rank transformation.
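
A minimal sketch of that idea, assuming a hypothetical cutoff k (note that the truncated U factor is dense, so it needs room for an m-by-k array):

k = 10;                            % assumed number of dominant directions
[U, S, V] = svds(A, k);            % U: m-by-k (dense), S: k-by-k, V: 81-by-k
plot(diag(S), 'o')                 % inspect how quickly the singular values decay
yApprox = exp(U * (S * (V' * x))); % low-rank approximation of exp(A*x)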

Now, if it were me, I would want to get "A" out of the exponent. I'm wondering if you can look at the 81-by-81 right singular vector matrix and "x", think a little about linear algebra and what space you are projecting your vectors into, and find a way to make a function that looks like the following:

f(x) = B2 * exp( B1 * x )

such that B1 * x is of much smaller dimension (rank) than your current A*x.

Stibine answered 5/5, 2014 at 14:3 Comment(0)
