To understand what is going on here, let's consider what foo1 = foo2 + foo3
actually does.
- First it evaluates foo2 + foo3. To do this it will allocate a new temporary array to hold the output.
- Then it will bind the name foo1 to this new temporary array, undoing all the effort you put into pre-allocating the output array.
In short, you see that memory usage is about that of the resultant array because the routine is indeed allocating new memory for an array of that size.
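The difference between rebinding a name and writing into an existing array can be checked directly with ===. The following is only a minimal sketch: the helper names rebind and fill_inplace! and the tiny array sizes are made up for illustration and do not come from the question.

```julia
# Hypothetical helpers, for illustration only.

# Rebinds the local name `out` to a freshly allocated array;
# the caller's array is never touched.
function rebind(out, a, b)
    out = a + b
    return out
end

# Writes the sums into the existing array that `out` refers to.
function fill_inplace!(out, a, b)
    for i in eachindex(out, a, b)
        out[i] = a[i] + b[i]
    end
    return out
end

a = ones(3)
b = ones(3)
out = zeros(3)

result = rebind(out, a, b)
println(result === out)   # false: `result` is a new array
println(out)              # the pre-allocated array is still all zeros

result = fill_inplace!(out, a, b)
println(result === out)   # true: same array, now holding the sums
println(out)
```

Only the second function makes use of the pre-allocated array, which is why the assignment form gains nothing from pre-allocation.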
Here are some alternatives:
- write a loop
- use broadcast!
- We could try to do copy!(foo1, foo2+foo3); the array you pre-allocated will then be filled, but it will still allocate the temporary (see below)
- The original version posted here
Here's some code for those 4 cases:
julia> function with_loop!(foo1, foo2, foo3)
           for i in eachindex(foo2)
               foo1[i] = foo2[i] + foo3[i]
           end
       end

julia> function with_broadcast!(foo1, foo2, foo3)
           broadcast!(+, foo1, foo2, foo3)
       end

julia> function with_copy!(foo1, foo2, foo3)
           copy!(foo1, foo2+foo3)
       end

julia> function original(foo1, foo2, foo3)
           foo1 = foo2 + foo3
       end
Now let's time these functions:
julia> for f in [:with_broadcast!, :with_loop!, :with_copy!, :original]
           @eval $f(foo1, foo2, foo3) # compile
           println("timing $f")
           @eval @time $f(foo1, foo2, foo3)
       end
timing with_broadcast!
  0.001787 seconds (5 allocations: 192 bytes)
timing with_loop!
  0.001783 seconds (4 allocations: 160 bytes)
timing with_copy!
  0.003604 seconds (9 allocations: 7.630 MB)
timing original
  0.002702 seconds (9 allocations: 7.630 MB, 97.91% gc time)
You can see that with_loop! and with_broadcast! perform about the same: both are faster than the other two and allocate almost nothing. with_copy! and original are both slower and each allocate about 7.63 MB for the temporary array.
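If you only care about the memory side, @allocated reports the bytes allocated by a single call, which can be compared against sizeof of the output array. This is a rough sketch rather than a proper benchmark: the helper names and the array length are arbitrary choices, and small nonzero counts for the in-place call are possible when measuring at global scope.

```julia
# Hypothetical helper names; array length chosen arbitrarily.
inplace_add!(out, a, b) = broadcast!(+, out, a, b)   # writes into `out`
temp_add!(out, a, b) = copy!(out, a + b)             # builds `a + b` first

n = 1_000_000
a = rand(n)
b = rand(n)
out = zeros(n)

# Call each function once so compilation is not counted.
inplace_add!(out, a, b)
temp_add!(out, a, b)

bytes_inplace = @allocated inplace_add!(out, a, b)
bytes_temp = @allocated temp_add!(out, a, b)

println("array size: ", sizeof(out), " bytes")
println("in-place:   ", bytes_inplace, " bytes allocated")
println("with temp:  ", bytes_temp, " bytes allocated")
```

The version that builds a + b first should report at least sizeof(out) bytes, since the temporary has the same size as the output.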
In general, to do in-place operations I'd recommend starting out by writing a loop.
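As an addition to the four cases above: on Julia 0.6 and later, the dot-assignment form is another way to write the broadcast! call without spelling out a loop. The sketch below assumes small one-dimensional arrays and uses a second name, buffer, purely to show that foo1 is not rebound.

```julia
# Array sizes are arbitrary; `buffer` is a hypothetical extra reference.
foo2 = rand(1000)
foo3 = rand(1000)
foo1 = zeros(1000)
buffer = foo1            # second reference to the same pre-allocated array

foo1 .= foo2 .+ foo3     # fused broadcast, writes into the existing array

println(foo1 === buffer)        # true: the name was not rebound
println(foo1 == foo2 + foo3)    # true: same values as the allocating form
```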
sizeof(foo1)/(1024*1024) does in fact round to 7.630. – Ally