I have the following code:
r = numpy.zeros(shape = (width, height, 9))
It creates a width x height x 9
matrix filled with zeros. Instead, I'd like to know if there's a function or way to initialize them instead to NaN
s in an easy way.
I have the following code:
r = numpy.zeros(shape = (width, height, 9))
It creates a width x height x 9
matrix filled with zeros. Instead, I'd like to know if there's a function or way to initialize them instead to NaN
s in an easy way.
You rarely need loops for vector operations in numpy. You can create an uninitialized array and assign to all entries at once:
>>> a = numpy.empty((3,3,))
>>> a[:] = numpy.nan
>>> a
array([[ NaN, NaN, NaN],
[ NaN, NaN, NaN],
[ NaN, NaN, NaN]])
I have timed the alternatives a[:] = numpy.nan
here and a.fill(numpy.nan)
as posted by Blaenk:
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
10000 loops, best of 3: 54.3 usec per loop
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan"
10000 loops, best of 3: 88.8 usec per loop
The timings show a preference for ndarray.fill(..)
as the faster alternative. OTOH, I like numpy's convenience implementation where you can assign values to whole slices at the time, the code's intention is very clear.
Note that ndarray.fill
performs its operation in-place, so numpy.empty((3,3,)).fill(numpy.nan)
will instead return None
.
a = numpy.empty((3, 3,)) * numpy.nan
. It timed faster than fill
but slower than the assignment method, but it is a oneliner!! –
Transmogrify .fill()
method, but the difference in speeds reduces to practically nothing as the arrays get larger. –
Milligan np.empty([2, 5]).fill(np.nan)
? This returns None
, and I'm not sure why... –
Coherence np.empty([2, 5])
creates an array, then fill()
modifies that array in-place, but does not return a copy or a reference. If you want to call np.empty(2, 5)
by a name ("assign is to a variable"), you have to do so before you do in-place operations on it. Same kinda thing happens if you do [1, 2, 3].insert(1, 4)
. The list is created and a 4 is inserted, but it is impossible to get a reference to the list (and thus it can be assumed to have been garbage collected). On immutable data like strings, a copy is returned, because you can't operate in-place. Pandas can do both. –
Ethnogeny np.full
feels a lot simpler than this –
Dulaney (a := np.empty((3,3))).fill(np.nan)
is a performent one-liner solution. The expression still returns None, so you can not chain it further, but I find it useful anyway. –
Confederate Another option is to use numpy.full
, an option available in NumPy 1.8+
a = np.full([height, width, 9], np.nan)
This is pretty flexible and you can fill it with any other number that you want.
full
is meant for. np.empy((x,y))*np.nan
is a good runner-up (and compatibility for old versions of numpy). –
Turmoil fill
python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)" 100000 loops, best of 3: 13.3 usec per loop python -mtimeit "import numpy as np; a = np.full((100,100), np.nan);" 100000 loops, best of 3: 18.5 usec per loop
–
Whitesmith python -mtimeit "import numpy as np; a = np.empty((1000,1000)); a.fill(np.nan)" 1000 loops, best of 3: 381 usec per loop $ python -mtimeit "import numpy as np; a = np.full((1000,1000), np.nan);" 1000 loops, best of 3: 383 usec per loop
–
Picrotoxin 9
for in [height, width, 9]
? –
Lisabethlisan 9
is from OP; the question had a 3D array of shape (height, width, 9)
–
Chili I compared the suggested alternatives for speed and found that, for large enough vectors/matrices to fill, all alternatives except val * ones
and array(n * [val])
are equally fast.
Code to reproduce the plot:
import numpy
import perfplot
val = 42.0
def fill(n):
a = numpy.empty(n)
a.fill(val)
return a
def colon(n):
a = numpy.empty(n)
a[:] = val
return a
def full(n):
return numpy.full(n, val)
def ones_times(n):
return val * numpy.ones(n)
def list(n):
return numpy.array(n * [val])
b = perfplot.bench(
setup=lambda n: n,
kernels=[fill, colon, full, ones_times, list],
n_range=[2 ** k for k in range(20)],
xlabel="len(a)",
)
b.save("out.png")
numpy.full(n, val)
is slower than a = numpy.empty(n) .. a.fill(val)
since it does the same thing internally –
Bertrand Are you familiar with numpy.nan
?
You can create your own method such as:
def nans(shape, dtype=float):
a = numpy.empty(shape, dtype)
a.fill(numpy.nan)
return a
Then
nans([3,4])
would output
array([[ NaN, NaN, NaN, NaN],
[ NaN, NaN, NaN, NaN],
[ NaN, NaN, NaN, NaN]])
I found this code in a mailing list thread.
You can always use multiplication if you don't immediately recall the .empty
or .full
methods:
>>> np.nan * np.ones(shape=(3,2))
array([[ nan, nan],
[ nan, nan],
[ nan, nan]])
Of course it works with any other numerical value as well:
>>> 42 * np.ones(shape=(3,2))
array([[ 42, 42],
[ 42, 42],
[ 42, 42]])
But the @u0b34a0f6ae's accepted answer is 3x faster (CPU cycles, not brain cycles to remember numpy syntax ;):
$ python -mtimeit "import numpy as np; X = np.empty((100,100));" "X[:] = np.nan;"
100000 loops, best of 3: 8.9 usec per loop
(predict)laneh@predict:~/src/predict/predict/webapp$ master
$ python -mtimeit "import numpy as np; X = np.ones((100,100));" "X *= np.nan;"
10000 loops, best of 3: 24.9 usec per loop
Yet another possibility not yet mentioned here is to use NumPy tile:
a = numpy.tile(numpy.nan, (3, 3))
Also gives
array([[ NaN, NaN, NaN],
[ NaN, NaN, NaN],
[ NaN, NaN, NaN]])
update:
I did a speed comparison, and it's not very fast :/ It's slower than the ones_times
by a decimal.
Another alternative is numpy.broadcast_to(val,n)
which returns in constant time regardless of the size and is also the most memory efficient (it returns a view of the repeated element). The caveat is that the returned value is read-only.
Below is a comparison of the performances of all the other methods that have been proposed using the same benchmark as in Nico Schlömer's answer.
As said, numpy.empty() is the way to go. However, for objects, fill() might not do exactly what you think it does:
In[36]: a = numpy.empty(5,dtype=object)
In[37]: a.fill([])
In[38]: a
Out[38]: array([[], [], [], [], []], dtype=object)
In[39]: a[0].append(4)
In[40]: a
Out[40]: array([[4], [4], [4], [4], [4]], dtype=object)
One way around can be e.g.:
In[41]: a = numpy.empty(5,dtype=object)
In[42]: a[:]= [ [] for x in range(5)]
In[43]: a[0].append(4)
In[44]: a
Out[44]: array([[4], [], [], [], []], dtype=object)
Just a warning that initializing with np.empty()
without subsequently editing the values can lead to (memory allocation?) problems:
arr1 = np.empty(96)
arr2 = np.empty(96)
print(arr1)
print(arr2)
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1. 1.
# 1. 1. 2. 2. 2. 2. nan nan nan nan nan nan nan nan 0. 0. 0. 0.
# 0. 0. 0. 0. nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan]
#
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1. 1.
# 1. 1. 2. 2. 2. 2. nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan]
The floats initialized in the array are used somewhere else in my script but are not associated with variables arr1
or arr2
at all. Spooky.
Answer from user @JHBonarius solved this problem:
arr = np.tile(np.nan, 96)
print(arr)
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan]
>>> width = 2
>>> height = 3
>>> r = np.full((width, height, 9), np.nan)
>>> print(r)
array([[[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan]],
[[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan]]])
>>> r.shape
(2, 3, 9)
9
for in [height, width, 9]
? –
Lisabethlisan Pardon my tardiness, but here is the fastest solution for large arrays, iff single-precision (f4 float32) is all you need. And yes, np.nan
works as well.
def full_single_prec(n):
return numpy.full(n, val, dtype='f4')
© 2022 - 2024 — McMap. All rights reserved.
np.nan
goes wrong when converted to int. – Jocundity