Let's say I have these two snippets of code in Python:
1 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x = x + np.array([1,1,1,1])
print(y)
2 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x += np.array([1,1,1,1])
print(y)
I thought the result of y would be the same in both examples, since y points to x and x becomes (2,3,4,5), but it wasn't. The results were (1,2,3,4) for snippet 1 and (2,3,4,5) for snippet 2.
After some research I found out what happens in each example:
#-First example---------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4]
y = x # make y point to the same array as x
# so far we have x --> [1,2,3,4]
#                         |
#                         y
x = x + np.array([1,1,1,1])
# however, this operation **creates a new array** [2,3,4,5]
# and makes x point to it instead of the first one
# so we have y --> [1,2,3,4] and x --> [2,3,4,5]
#-Second example--------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4]
y = x # make y point to the same array as x
# so far, the same as before: x --> [1,2,3,4]
#                                  |
#                                  y
x += np.array([1,1,1,1])
# this operation **modifies the existing array**
# so the result is
# x --> [2,3,4,5]
#          |
#          y
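A quick way to check this explanation is to ask Python whether the two names still refer to the same object. This is only a minimal sketch; the variable names (rebound_is_same, inplace_is_same, and so on) are made up for illustration.

```python
import numpy as np

# Case 1: "x = x + ..." builds a new array and rebinds the name x to it.
x = np.array([1, 2, 3, 4])
y = x
x = x + np.array([1, 1, 1, 1])
rebound_is_same = x is y      # x and y are now different objects
rebound_y = y.tolist()        # y still sees the original values

# Case 2: "a += ..." writes the result into the existing array.
a = np.array([1, 2, 3, 4])
b = a
a += np.array([1, 1, 1, 1])
inplace_is_same = a is b      # still one object with two names
inplace_b = b.tolist()        # b sees the updated values

print(rebound_is_same, rebound_y)
print(inplace_is_same, inplace_b)
```

If y should stay independent in the second case, taking an explicit copy (y = x.copy()) before the += avoids the aliasing.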
You can find out more about this behavior (not only for this example) in this link: In-place algorithm
My question is: being aware of this behavior, why should I use an in-place algorithm in terms of performance? (Faster execution time? Less memory allocation? ...)
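One way to get a feel for the answer is to time both forms on the same workload. The sketch below is only an assumption about how such a measurement could be set up: the function names, array size, and repeat counts are arbitrary, and the numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def out_of_place(n_steps, size):
    """Repeatedly add 1, allocating a fresh result array on every step."""
    x = np.zeros(size)
    for _ in range(n_steps):
        x = x + 1.0
    return x

def in_place(n_steps, size):
    """Repeatedly add 1, reusing the same array on every step."""
    x = np.zeros(size)
    for _ in range(n_steps):
        x += 1.0
    return x

# Time a few runs of each variant on the same problem size.
t_out = timeit.timeit(lambda: out_of_place(20, 200_000), number=5)
t_in = timeit.timeit(lambda: in_place(20, 200_000), number=5)

print(f"out of place: {t_out:.4f} s")
print(f"in place:     {t_in:.4f} s")
```

Both functions compute the same values; the only difference is whether each step needs a new block of memory for its result.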
EDIT: Clarification
The (+, +=) example was just a simple way to explain in-place algorithms to those who don't know them, but the question is about the use of in-place algorithms in general, not only in this case.
As another, more complex example: loading a CSV file (just 10 million rows) into a variable and then sorting the result. Is the idea of an in-place algorithm to produce the output in the same memory space that contains the input, by successively transforming that data until the output is produced? This avoids the need for twice the storage, one area for the input and an equal-sized area for the output (using the minimum amount of RAM, hard disk, ...).
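The sorting case can be sketched in NumPy, which offers both styles: np.sort returns a sorted copy, while the ndarray.sort method sorts the array's own buffer. The tiny array here stands in for the loaded CSV column and is purely illustrative; the sort routine itself may still use some scratch space internally.

```python
import numpy as np

data = np.array([5, 3, 8, 1, 9, 2])   # stand-in for a column loaded from a CSV
alias = data                           # second name for the same array

# Out of place: the input is untouched and a second array holds the output.
sorted_copy = np.sort(data)
copy_shares = np.shares_memory(data, sorted_copy)

# In place: the output overwrites the input buffer, no second array is kept.
data.sort()
same_object = alias is data

print(copy_shares, same_object)
print(data.tolist())
```

With the out-of-place form, input and output both stay alive, so peak memory is roughly twice the array size; with the in-place form only the one array is held.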
numpy will have a free block of memory so there won't be a huge performance difference, but I could be wrong. – Edirne
+ and += for numpy, what does performance matter? – Resign
+= is intended to modify the underlying object, it can be made more efficient (by only modifying parts of the original object, rather than modifying a copy). – Garibaldi
x and y point to the same array, so if you modify x, you also modify y. To me, this question seems to be a mix of "difference of + and += on lists" and "making a copy of a list". – Resign
+= if you modify x by x = x + som the program allocates a new place in memory, stores the result there and makes x point to the new result while y is still pointing to the old one – Detergent
+ vs. += for lists. The point is, with +, you do not modify x; you create a new array and assign that to x. – Resign