Mathematica running out of memory

I'm trying to run the following program, which calculates the roots of all polynomials of degree d with coefficients +1 or -1, and then stores them in files.

d = 20; n = 18000;
(* the sign of z^(d - k) is the k-th binary digit of i - 1, mapped 0 -> -1, 1 -> +1 *)
f[z_, i_] := Sum[(2 Mod[Floor[(i - 1)/2^k], 2] - 1) z^(d - k), {k, 0, d}];

Here f[z,i] gives a polynomial in z whose plus and minus signs count through the binary digits of i - 1 (bit k of i - 1 sets the sign of z^(d-k)). Say d=2, we would have

f[z,1] = -z^2 - z - 1
f[z,2] =  z^2 - z - 1
f[z,3] = -z^2 + z - 1
f[z,4] =  z^2 + z - 1
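
A quick way to check this is to evaluate f for small d (note that f reads the current value of d at call time, so set it back to 20 afterwards):

d = 2;
Table[f[z, i], {i, 1, 2^d}]
(* {-1 - z - z^2, -1 - z + z^2, -1 + z - z^2, -1 + z + z^2} *)
d = 20;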

DistributeDefinitions[d, n, f]

ParallelDo[
 Do[
  root = N[Root[f[z, i], j]];  (* j-th root of the i-th polynomial, numerically *)
  {a, b} = Round[n ({Re[root], Im[root]}/1.5 + 1)/2],  (* round to a point on the n-by-n grid *)
  {i, 1, 2^d}],
 {j, 1, d}]

I realise reading this probably isn't too enjoyable, but it's relatively short anyway. I would have cut it down to the relevant parts, but here I really have no clue where the trouble is. I'm calculating all the roots of each f[z,i], rounding them so that each corresponds to a point on an n-by-n grid, and saving that data to various files.

For some reason, the memory usage in Mathematica creeps up until it fills all the memory (6 GB on this machine); then the computation continues extremely slowly. Why is this?

I am not sure what is using up the memory here. My only guess was that the file streams were using it, but that's not the case: I tried appending data to 2 GB files and there was no noticeable memory usage for that. There seems to be absolutely no reason for Mathematica to be using large amounts of memory here.

For small values of d (15, for example), the behaviour is the following: I have 4 kernels running. As they all run through the ParallelDo loop (each doing one value of j at a time), the memory use increases until they have all been through that loop once. On subsequent passes through the loop, the memory use does not increase at all. The calculation eventually finishes and everything is fine.

Also, quite importantly, once the calculation stops, the memory use does not go back down. If I start another calculation, the following happens:

- If the previous calculation stopped while memory use was still increasing, it continues to increase (it might take a while to start increasing again, basically until it gets back to the same point in the computation).

- If the previous calculation stopped when memory use was no longer increasing, it does not increase further.

Edit: The issue seems to come from the relative complexity of f - changing it to some simpler polynomial makes the problem go away. I thought the problem might be that Mathematica remembers f[z,i] for specific values of i, but unsetting with f[z,i] =. just after calculating a root of f[z,i] complains that the assignment did not exist in the first place, and the memory is still used.
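
One way to check for that kind of memoization is to inspect DownValues[f]; a memoized definition, f[z_, i_] := f[z, i] = ..., would leave one stored rule behind for every i evaluated:

DownValues[f]
(* only ever shows the single rule HoldPattern[f[z_, i_]] :> Sum[...],
   so f is not remembering individual values *)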

It's quite puzzling really, as f is the only remaining thing I can imagine taking up memory, but defining f in the inner Do loop and clearing it each time after a root is calculated does not solve the problem.

Shellbark answered 28/10, 2009 at 17:12 Comment(11)
Have you tried running this algorithm with smaller, or even individual, d values? – Malek
I just added a paragraph describing what happens for smaller d. – Shellbark
What happens when you run the program without opening the stream, doing the writes, and closing the stream (i.e., without any I/O)? – Harslet
I doubt I'm going to be able to reproduce this, as I'm using Mathematica 6.0 - none of the multi-kernel parallel support you have. However, I do know Mathematica has a habit of keeping a lot of things in memory you'd think would be gone. Have you tried anything like putting a Module between the ParallelDo and the Do, making {root, a, b, stm} local? – Puffball
Interesting. I tried running it with 2 kernels (without the I/O stuff), and the memory use blew up pretty badly until I got bored and aborted it. – Harslet
The same thing happens if you remove the Parallel stuff, only the memory creeps up more slowly, as only one kernel is doing it. In the end, seeing as it seems to only take up more memory on the first run through the j Do loop, it ends up taking only 1/4 as much memory. – Shellbark
About using Module, I'm not sure what I should write. Just putting the Module in the ParallelDo loop makes it complain that it isn't parallelizable. And when I do use it with a single kernel, it doesn't make a difference. – Shellbark
I found that Module doesn't make a difference even in the parallel case. However, for future reference, you'd want to put the Module inside the inner Do. – Harslet
I made the code clearer - the only place the memory use can come from is the N[Root[]] call, but I have no idea how to fix it. Putting Clear[root, a, b] just after it does not solve the issue. – Shellbark
@Pillsy: the reason I suggested placing it outside was that Module itself takes nontrivial time to execute, so it's good not to execute it too many times. I thought I'd picked the correct middle ground based on the OP's description of the memory behaviour. – Puffball
You may be interested in this post :D – Manor

Ouch, this is a nasty one.

What's going on is that N caches results in order to speed up future calculations in case you need them again. Sometimes this is exactly what you want, but sometimes it just breaks the world. Fortunately, you do have some options. One is the ClearSystemCache command, which does just what it says on the tin. After I ran your un-parallelized loop for a little while (before getting bored and aborting the calculation), MemoryInUse reported ~160 MiB in use. Calling ClearSystemCache got that down to about 14 MiB.
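
Concretely, with the figures from that run:

MemoryInUse[]       (* ~160 MiB after running the loop for a while *)
ClearSystemCache[]  (* flush the internal caches, including N's *)
MemoryInUse[]       (* back down to about 14 MiB *)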

One thing you should look at doing, instead of calling ClearSystemCache programmatically, is to use SetSystemOptions to change the caching behavior. You should take a look at SystemOptions["CacheOptions"] to see what the possibilities are.
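The exact sub-option names vary between versions, so inspect them first and only then switch off the ones you don't want, along these lines:

SystemOptions["CacheOptions"]
(* lists the current caching sub-options; any entry it reports can then
   be changed with SetSystemOptions["CacheOptions" -> {name -> value, ...}] *)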

EDIT: It's not terribly surprising that the caching causes a bigger problem for more complex expressions. It's got to be stashing copies of those expressions somewhere, and more complex expressions require more memory.
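
If you do go the programmatic route, a periodic flush could be wired into the loop from the question like this (the every-1000-iterations interval is an arbitrary choice):

ParallelDo[
 Do[
  root = N[Root[f[z, i], j]];
  {a, b} = Round[n ({Re[root], Im[root]}/1.5 + 1)/2];
  If[Mod[i, 1000] == 0, ClearSystemCache[]],  (* flush the cache every 1000 polynomials *)
  {i, 1, 2^d}],
 {j, 1, d}]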

Harslet answered 28/10, 2009 at 22:59 Comment(3)
Hmm, I'm having trouble replicating your results. At the moment, two problems occur: in the unparallelized version, when I call ClearSystemCache, MemoryInUse does report that the memory in use has gone back down, but the task manager shows the kernel still using as much memory. Secondly, in parallelized mode, I cannot find a way to clear the cache of the individual kernels. But you seem to have found the precise cause; now it's more a matter of finding how to treat it. – Shellbark
Messing with the CacheOptions did not prove fruitful either - I set everything to False and the max byte sizes to 0, and it made no difference (to the unparallelized version, and so no difference to the parallelized version either). – Shellbark
OK, adding ClearSystemCache does in fact work; for some reason it didn't work the first time, but now it does. It even works in the parallel version. Thanks! – Shellbark
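
On the parallel point raised above: the subkernels are separate processes with their own caches, so the flush presumably has to be pushed to each of them, e.g.:

ParallelEvaluate[ClearSystemCache[]]  (* run ClearSystemCache on every subkernel *)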
