Does passing a variable with a large amount of data cost a lot of memory and time in Mathematica?
Asked Answered
B

2

15

I am coding up an algorithm for constructing a suffix tree in Mathematica based on Ukkonen's algorithm.

The question I have is, will passing my entire tree structure (which I have stored in a list) to a function to search through, cost my program a lot of memory and time since I have to use some of the functions multiple times in the algorithm?

For example, I have a function that searches for the children of a specific node, and I use the Select function to search the entire tree.

getChildren[parentID_] := Select[tree, #[[3]] == parentID &];

However I need to access the tree, so is it reasonable to pass the entire tree structure to the function? Since it doesn't seem that there is a way to make a variable global to the entire notebook. Or is there some alternate way to get around this?

Berl answered 30/11, 2011 at 10:39 Comment(1)
do you mean by "there is no way to make a variable global to the entire notebook"? if you define tree=5 then tree is 5 everywhere in the Global context (which is the default). it's global by default, unless you're after some other behaviour.Factory
G
25

No, it does not cost extra memory to pass expressions. As is usual in functional languages, Mathematica objects are immutable: they cannot be modified, instead a new object is created when you transform them using some function. This also means that if you don't transform them, they're not copied, no matter how much you pass them around between functions.


From a user perspective, Mathematica expressions are trees, but I believe that internally they're stored as directed acyclic graphs, i.e. the same subexpression may be stored only once in memory, regardless of how many times it appears in the full expression (see e.g. the doc page of Share[]).

Here's an example to illustrate:

First, make sure In/Out don't take up extra memory:

In[1]:= $HistoryLength = 0;

Check memory usage:

In[2]:= MemoryInUse[]
Out[2]= 13421756

Let's make an expression that takes up a noticeable amount of memory:

In[3]:= s = f@Range[1000000];

In[4]:= MemoryInUse[]
Out[4]= 17430260

Now repeat this expression a hundred times ...

In[5]:= t = ConstantArray[s, 100];

... and notice that memory usage barely increases:

In[6]:= MemoryInUse[]
Out[6]= 18264676

ByeCount[] is misleading because it doesn't report the actual physical memory used, but the memory that would be used if common subexpressions weren't allowed to share the same memory:

In[7]:= ByteCount[t]
Out[7]= 400018040

An interesting point to note: if you remove f[...] from s, and make both s and t a plain numerical array, then this memory sharing will not happen, and memory usage will jump to ~400 MB.


Whether you make tree a global variable or an argument of getChildren, it will not make a difference in memory usage.

Genetic answered 30/11, 2011 at 10:58 Comment(6)
Excellent answer! To further confirm the picture you described, one can do an assignment to some element in t, for example like so : t[[1, 1, 1]] = 2; , and notice an immediate jump in memory consumption corresponding to the size of a single s instance. The reason for what you observed with the numerical array is subtle: this is only because Range produces packed arrays, and you used ConstantArray (and not e.g. Table). The point is, ConstantArray will produce a packed array out of a packed array, and you can not share memory in a large packed array, since it is ...Fencing
... based on direct C memory layout. Should you have used either ur = Developer`FromPackedArray[Range[1000000]]; t = ConstantArray[ur, {100}] or r = Range[1000000];t = Table[r, {100}];, and you would have observed the same memory sharing, since the result is not packed (meaning that there are intermediate pointers, and sharing is possible - or at least this is my picture of this at the moment).Fencing
I would slightly change the statement on immutability to something like: "they can not be modified, unless they are stored in a variable" - Mathematica is not a pure functional language, and mutability is possible.Fencing
@LeonidShifrin @Genetic My test shows ConstantArray also eats same amount of memory in the first example. See i.sstatic.net/Qakfh.png How to explain this?Paramagnetic
@Paramagnetic I don't see the problem. In the first case, you construct a packed array, and wrap it into f. In the second case, you construct a constant array of references to f[first-array]. Perhaps, both pointers and integers take 8 bytes each, which is why the size of the memory use increase is the same. I don't have the time to dig deeper now.Fencing
@LeonidShifrin Oh, I am so stupid. I make a silly typos in ConstantArray, it should be 100, I write it 1000000. Thank you so much for answering ;-)Paramagnetic
M
4

Further to Szabolcs answers, if you do need to modify the data you may find this question on pass-by-reference useful:

simple question on passing data between functions

Maness answered 30/11, 2011 at 11:16 Comment(8)
Since you brought up this topic, let me mention that while the accepted answer to that question does work, I would generally advise against using Unevaluated in that manner, from the program design perspective: one should design functions self-contained, while there the function will only work if the user does not forget to wrap Unevaluated around the argument, and further, there is no way to catch this omission for the function itself. IMO, Hold-attributes are strongly preferred to Unevaluated in cases like that.Fencing
@ Leonid - I changed the link to refer to your answer. (That was my first intention anyway.)Maness
Thanks, but my answer there is also not a general recommendation - the question there was constrained by the restrictions of CDF where explicit Hold attributes can not be attached to a symbol. Apparently, no one yet here on SO asked a simple question titled something like "Pass-by-reference in Mathematica" (or at least I am anaware of it), while having an SO discussion for such a topic seems very desirable.Fencing
@LeonidShifrin, I would like to ask if you have an updated version of your struct/object method shown in your post here groups.google.com/group/comp.soft-sys.math.mathematica/… at the end of the thread. I like it, and thought to check if you have an updated version to use in a demo I am writing. If nothing changed, I will use the above code you posted. thanksNanna
@LeonidShifrin, Well, never mind. Your package posted in the link above works so well, and I really liked using it, but when I tested it inside a demonstration, many symbols used are not allowed in CDF. So unfortunately, I can't use it :( The symbols not allowed are----> "edit the notebook to remove the following illegal symbols: ClearAll, DownValues, Remove, SetAttributes, Symbol, Unprotect" ---> So I tried to edit your code to remove these symbols, but this broke things. It will be great if it possible to make one like this without these symbols, but that might not be possible.Nanna
@Nanna It might be possible. I will have a look, whenever I get more time, hopefully soon.Fencing
@Nanna Do you, or anybody else, have a complete list of "illegal" symbols in CDF ? I found it kind of frustrating several times to readjust code for CDF to work and such a list might help. Or do I just not see in the "Documentation center" ?Kingcraft
Here is the MathGroup link to Leonid's struct/object method post, in case the googlegroups link doesn't work.Maness

© 2022 - 2024 — McMap. All rights reserved.