It looks like you're looking for xarray. Indices in numpy are purely positional: you can't have an index in numpy be a timestamp, because the first index is always 0 and the last index is always len(axis) - 1.
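To make the contrast concrete, here is a minimal numpy-only sketch. The five values and timestamps are made up for illustration. Positions are the only index numpy has, so the timestamps must be kept in a separate, parallel array and translated into positions by hand.

```python
import numpy as np

# Five made-up measurements; numpy only knows them by position 0..4.
values = np.arange(5.0) * 10  # [0., 10., 20., 30., 40.]

# The timestamps must live in a separate, parallel array that numpy
# does not associate with `values` in any way.
times = np.arange(
    np.datetime64("2022-06-23T18:08:00"),
    np.datetime64("2022-06-23T18:08:05"),
    np.timedelta64(1, "s"),
)

# First and last positions are always 0 and len(axis) - 1.
first, last = values[0], values[len(values) - 1]

# A "label lookup" means converting the timestamp to a position yourself.
target = np.datetime64("2022-06-23T18:08:03")
pos = int(np.flatnonzero(times == target)[0])
print(first, last, pos, values[pos])  # 0.0 40.0 3 30.0
```

Keeping `values` and `times` in sync, and doing this translation for every axis, is exactly the bookkeeping xarray takes over.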
xarray uses n-dimensional arrays as its computational engine, but adds the concept of labeled indexing from pandas. It's a NumFOCUS-supported project with a lot of users and growing tie-ins to pandas, numpy, and dask (for distributed processing). You can easily create an N-dimensional array with, e.g., datetime coordinates (labels along a dimension) and select using those labels. You can also use the sparse package's COO arrays as a backend if desired.
See the quickstart for an introduction.
For example, you can create an array from a numpy ndarray, but add dimension names and coordinate labels:
```python
import xarray as xr, numpy as np, pandas as pd

da = xr.DataArray(
    np.random.random(size=(10, 10, 100)),
    dims=['x', 'y', 'time'],
    coords=[
        range(10),
        range(-100, 0, 10),
        pd.date_range('2022-06-23 18:08', periods=100, freq='s'),
    ],
)
```
Here's what this looks like displayed:
```
In [3]: da
Out[3]:
<xarray.DataArray (x: 10, y: 10, time: 100)>
array([[[5.20920842e-01, 4.69121072e-01, 6.40222454e-01, ...,
         2.99971293e-01, 2.62265561e-01, 6.35366406e-01],
        ...,
        [2.67650196e-01, 1.83472873e-01, 9.28958673e-01, ...,
         2.54365478e-01, 5.31364961e-01, 7.64313509e-01]],

       ...

       [[4.36503680e-01, 6.04280469e-01, 3.74281880e-01, ...,
         9.41795201e-03, 2.45035315e-01, 4.36213072e-01],
        ...,
        [2.70554857e-01, 9.81791362e-01, 3.67033886e-01, ...,
         2.37171168e-01, 3.92829137e-01, 1.18888502e-02]]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
  * y        (y) int64 -100 -90 -80 -70 -60 -50 -40 -30 -20 -10
  * time     (time) datetime64[ns] 2022-06-23T18:08:00 ... 2022-06-23T18:09:39
```
The underlying array is still numpy:
```
In [4]: type(da.data)
Out[4]: numpy.ndarray
```
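Because the data is a plain ndarray underneath, numpy functions keep working. A small sketch, assuming a reasonably recent xarray (which implements numpy's ufunc protocol for DataArray): a ufunc applied to a DataArray keeps the dimension names and coordinates, while .values hands back the bare ndarray. The array `small` is a hypothetical stand-in for `da`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A small labeled array in the same spirit as `da` above (hypothetical data).
small = xr.DataArray(
    np.random.random(size=(2, 3)),
    dims=["x", "time"],
    coords=[range(2), pd.date_range("2022-06-23 18:08", periods=3, freq="s")],
)

# numpy ufuncs operate on the underlying data and keep the labels.
roots = np.sqrt(small)
print(type(roots).__name__, roots.dims)  # DataArray ('x', 'time')

# .values returns the plain numpy array, with no labels attached.
raw = small.values
print(type(raw).__name__, raw.shape)  # ndarray (2, 3)
```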
You can select along dimensions positionally (with .isel) or by label (with .sel):
```
In [5]: da.sel(time='2022-06-23T18:09:01')
Out[5]:
<xarray.DataArray (x: 10, y: 10)>
array([[0.61802968, 0.44798696, 0.53146839, 0.54672015, 0.52251633,
        0.69215547, 0.84386726, 0.72421072, 0.87467204, 0.87845358],
       [0.22257334, 0.32035713, 0.08175992, 0.34816822, 0.84258207,
        0.80708575, 0.02339722, 0.1904887 , 0.77412369, 0.34198665],
       [0.4987155 , 0.05057836, 0.11611118, 0.95652761, 0.88992791,
        0.15960549, 0.31591357, 0.77504342, 0.04418024, 0.02722908],
       [0.76613849, 0.88007545, 0.27904722, 0.56225594, 0.39773015,
        0.23494531, 0.54437166, 0.41985857, 0.92803277, 0.63992328],
       [0.00981116, 0.2688392 , 0.17421749, 0.45761431, 0.74987955,
        0.8115907 , 0.42623655, 0.9660985 , 0.25014544, 0.47767839],
       [0.21176705, 0.17295334, 0.25520267, 0.17743549, 0.10468529,
        0.48232753, 0.55139512, 0.9658701 , 0.52430646, 0.99446656],
       [0.83707974, 0.07546811, 0.70503445, 0.62984982, 0.5956393 ,
        0.93147836, 0.97454177, 0.92595764, 0.4889221 , 0.59362206],
       [0.04210777, 0.56803518, 0.78362288, 0.54106628, 0.09178342,
        0.63581206, 0.03913531, 0.43868853, 0.22767441, 0.86995461],
       [0.88047   , 0.86284775, 0.26553173, 0.06123448, 0.55392798,
        0.44922685, 0.18933487, 0.16720496, 0.40440954, 0.79741338],
       [0.22714674, 0.76756767, 0.08131078, 0.64319224, 0.39983711,
        0.792     , 0.32000998, 0.42772083, 0.19313205, 0.35174807]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
  * y        (y) int64 -100 -90 -80 -70 -60 -50 -40 -30 -20 -10
    time     datetime64[ns] 2022-06-23T18:09:01
```
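Positional and label-based selection reach the same data, and reductions can name a dimension instead of an axis number. The sketch below rebuilds an array shaped like `da` (random values, so only the structure matters) and checks that .isel and .sel agree. The specific timestamps and the `x=3` label are just picked from the example's coordinate ranges.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same shape and coordinates as `da` in the example above.
da = xr.DataArray(
    np.random.random(size=(10, 10, 100)),
    dims=["x", "y", "time"],
    coords=[
        range(10),
        range(-100, 0, 10),
        pd.date_range("2022-06-23 18:08", periods=100, freq="s"),
    ],
)

# 18:09:01 is 61 seconds after 18:08:00, i.e. position 61 on the time axis.
by_label = da.sel(time=pd.Timestamp("2022-06-23T18:09:01"))
by_position = da.isel(time=61)
print(by_label.identical(by_position))  # True

# Label slices include both endpoints: 18:08:10 through 18:08:19 is 10 steps.
window = da.sel(time=slice("2022-06-23T18:08:10", "2022-06-23T18:08:19"))
print(window.sizes["time"])  # 10

# Fix one labeled dimension and average over another, both by name.
profile = da.sel(x=3).mean(dim="time")
print(profile.dims)  # ('y',)
```

Note that label slices are inclusive at both ends, unlike positional slices.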
Broadcasting in xarray is done by dimension name rather than by axis order, so there's no reason to have an array with shape (1, 1, 1, 1, 1000). Instead, just ensure that dimension names are consistent across your arrays: two arrays with shared dimension names will be broadcast against each other correctly, and their coordinate labels will be aligned automatically. See the docs on computation: automatic alignment for more info.
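A small sketch of broadcasting and alignment by name, using made-up arrays `a`, `b`, `d`, and `partial` (hypothetical, not from the question) and assuming xarray's default arithmetic join, which is an inner join on coordinate labels:

```python
import numpy as np
import xarray as xr

# Two 1-D arrays along different named dimensions.
a = xr.DataArray(np.arange(3.0), dims=["time"], coords={"time": [0, 1, 2]})
b = xr.DataArray(np.array([10.0, 20.0]), dims=["x"], coords={"x": [0, 1]})

# Broadcasting is by name: the result spans both "time" and "x",
# with no manual reshaping to (3, 1) or (1, 2).
c = a + b
print(dict(c.sizes))
print(float(c.sel(time=2, x=1)))  # 22.0

# Axis order doesn't matter either: d is laid out (x, time) but is
# matched to c by dimension name, not by position.
d = xr.DataArray(np.ones((2, 3)), dims=["x", "time"],
                 coords={"x": [0, 1], "time": [0, 1, 2]})
e = c + d
print(float(e.sel(time=2, x=1)))  # 23.0

# Coordinate labels are aligned too: by default, arithmetic keeps only
# the labels both operands share (an inner join).
partial = xr.DataArray([1.0, 1.0], dims=["time"], coords={"time": [1, 2]})
f = a + partial
print(f.time.values.tolist())  # [1, 2]
```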
…tf.constant? – Photocathode
A is impossible with numpy. Your B doesn't make sense: B[:,:,...1000] means B indexed on the last dimension with 1000. – Nankeen
A is impossible; my goal is to see if there are solutions "close" to how ndarray works in spirit. B is an example of a 5D array for which the last dimension can be (big) numbers. Of course, for B to make sense we need a sparse structure; otherwise it would uselessly require petabytes of data :) – Collative
…awkward package, which does deal with jagged arrays, but not with labeling, and it doesn't fully conform to the numpy array protocol, so it's not compatible with xarray or any of the other options you're looking at. ¯\_(ツ)_/¯ – Norman
….tonumpy() to get a standard ndarray in this case: in the case where I need to average over one labeled axis and fix a value for another labeled axis, the remaining dimensions will be fixed and not ragged, so I can probably convert to a normal ndarray in this case) – Collative
…xarray, but not sure if it supports sparse/masked arrays. – Dullard