The quick answer is that you're doing it right, but your dimensions are not aligned. xarray
IS designed to subtract entire arrays, but the coordinate labels must be aligned exactly. You likely have a disagreement between elements of your plev coordinate, which you can check with xr.align
:
xr.align(wind1_array, wind2_array, join='exact')
See the xarray docs on computation: automatic alignment for more information.
Detailed example
The biggest difference between xarray and numpy (assuming you're familiar with math using numpy) is that xarray relies on the coordinate labels along each dimension to align arrays before any broadcasting operations, not just the shape.
As an example, let's consider two very simple arrays - one counting up from 0 to 19 and the other a block of ones, both reshaped to (4, 5)
. Subtracting these from each other in numpy is straightforward because they're the same shape:
In [15]: arr1 = np.arange(20).reshape((4, 5))
In [16]: arr2 = np.ones(shape=(4, 5))
In [17]: arr1 - arr2
Out[17]:
array([[-1., 0., 1., 2., 3.],
[ 4., 5., 6., 7., 8.],
[ 9., 10., 11., 12., 13.],
[14., 15., 16., 17., 18.]])
The xarray equivalent is also straightforward, but we must introduce dimension names and coordinates. Let's assume your pressure levels are in increments of 10 hPa decreasing toward STP, and latitudes are also in increments of 10 from 20 to 60:
In [18]: pressures = np.array([71.325, 81.325, 91.325, 101.325])
In [19]: lats = np.array([20, 30, 40, 50, 60])
In [20]: da1 = xr.DataArray(arr1, dims=['plev', 'lat'], coords=[pressures, lats])
In [21]: da2 = xr.DataArray(arr2, dims=['plev', 'lat'], coords=[pressures, lats])
In [22]: da2
Out[22]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
Coordinates:
* plev (plev) float64 71.33 81.33 91.33 101.3
* lat (lat) int64 20 30 40 50 60
In [23]: da1
Out[23]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
Coordinates:
* plev (plev) float64 71.33 81.33 91.33 101.3
* lat (lat) int64 20 30 40 50 60
These arrays are aligned, so subtracting them is straightforward:
In [24]: da1 - da2
Out[24]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[-1., 0., 1., 2., 3.],
[ 4., 5., 6., 7., 8.],
[ 9., 10., 11., 12., 13.],
[14., 15., 16., 17., 18.]])
Coordinates:
* plev (plev) float64 71.33 81.33 91.33 101.3
* lat (lat) int64 20 30 40 50 60
But because xarray relies on these coordinates being aligned exactly, relying on float coordinates can be tricky. If we introduce even a small error to the pressure level dimension, the arrays are not aligned and we see a result similar to yours:
In [25]: da2 = xr.DataArray(arr2, dims=['plev', 'lat'], coords=[pressures + 1e-8, lats])
In [26]: da1 - da2
Out[26]:
<xarray.DataArray (plev: 0, lat: 5)>
array([], shape=(0, 5), dtype=float64)
Coordinates:
* plev (plev) float64
* lat (lat) int64 20 30 40 50 60
This type of mis-alignment can happen for all sorts of reasons, including round-tripping data through storage, where changes in encoding can cause tiny numerical errors which show up as mis-aligned data.
You can check whether this is the problem with xr.align
using the join='exact'
argument:
In [27]: xr.align(da1, da2, join='exact')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-29-612460e52308> in <module>
----> 1 xr.align(da1, da2, join='exact')
~/miniconda3/envs/myenv/lib/python3.9/site-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
320 ):
321 if join == "exact":
--> 322 raise ValueError(f"indexes along dimension {dim!r} are not equal")
323 joiner = _get_joiner(join, type(matching_indexes[0]))
324 index = joiner(matching_indexes)
ValueError: indexes along dimension 'plev' are not equal
To get around this problem, you might try rounding your coordinates to a known tolerance of the coordinate:
In [32]: da2['plev'] = np.round(da2['plev'], 3)
In [33]: da1 - da2
Out[33]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[-1., 0., 1., 2., 3.],
[ 4., 5., 6., 7., 8.],
[ 9., 10., 11., 12., 13.],
[14., 15., 16., 17., 18.]])
Coordinates:
* plev (plev) float64 71.33 81.33 91.33 101.3
* lat (lat) int64 20 30 40 50 60
Alternatively, you could set a positional/integer coordinate, with the actual pressure level as a non-indexing coordinate:
In [42]: da1
Out[42]:
<xarray.DataArray (plev_ind: 4, lat: 5)>
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
Coordinates:
plev (plev_ind) float64 71.33 81.33 91.33 101.3
* lat (lat) int64 20 30 40 50 60
* plev_ind (plev_ind) int64 71325 81325 91325 101325
In [43]: da2
Out[43]:
<xarray.DataArray (plev_ind: 4, lat: 5)>
array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
Coordinates:
plev (plev_ind) float64 71.33 81.33 91.33 101.3
* lat (lat) int64 20 30 40 50 60
* plev_ind (plev_ind) int64 71325 81325 91325 101325
In [44]: da1 - da2
Out[44]:
<xarray.DataArray (plev_ind: 4, lat: 5)>
array([[-1., 0., 1., 2., 3.],
[ 4., 5., 6., 7., 8.],
[ 9., 10., 11., 12., 13.],
[14., 15., 16., 17., 18.]])
Coordinates:
* lat (lat) int64 20 30 40 50 60
* plev_ind (plev_ind) int64 71325 81325 91325 101325
da2['plev'] = np.round(da2['plev'], 3)
and that worked perfectly – Postpositive