Subtract two xarrays while keeping all dimensions
Asked Answered
P

1

5

this might be the most basic question out there but I just could not find a solution.

I have two different xarrays containing wind data. Both xarrays have the dimensions (time: 60, plev: 19, lat: 90). I now need to take the difference between the two xarrays over all dimensions to find the anomaly between the two scenarios.

I think that the xarray.DataArray.diff function is only for calculating the difference along an axis of one xarray (and not to calculate the difference between two xarrays).

So, I tried using simply

diff = wind1_xarray - wind2_xarray

as well as

diff = (wind1_xarray - wind2_xarray).compute() 

However, both methods give me an xarray with dimensions (time: 60, plev: 0, lat: 90). Why do I loose the pressure levels when calculating the difference?

And how can I calculate the difference between two xarrays over all dimensions without loosing one dimension?

Thanks everyone

Postpositive answered 6/11, 2021 at 17:49 Comment(0)
D
10

The quick answer is that you're doing it right, but your dimensions are not aligned. xarray IS designed to subtract entire arrays, but the coordinate labels must be aligned exactly. You likely have a disagreement between elements of your plev coordinate, which you can check with xr.align:

xr.align(wind1_array, wind2_array, join='exact')

See the xarray docs on computation: automatic alignment for more information.

Detailed example

The biggest difference between xarray and numpy (assuming you're familiar with math using numpy) is that xarray relies on the coordinate labels along each dimension to align arrays before any broadcasting operations, not just the shape.

As an example, let's consider two very simple arrays - one counting up from 0 to 19 and the other a block of ones, both reshaped to (4, 5). Subtracting these from each other in numpy is straightforward because they're the same shape:

In [15]: arr1 = np.arange(20).reshape((4, 5))

In [16]: arr2 = np.ones(shape=(4, 5))

In [17]: arr1 - arr2
Out[17]:
array([[-1.,  0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12., 13.],
       [14., 15., 16., 17., 18.]])

The xarray equivalent is also straightforward, but we must introduce dimension names and coordinates. Let's assume your pressure levels are in increments of 10 hPa decreasing toward STP, and latitudes are also in increments of 10 from 20 to 60:

In [18]: pressures = np.array([71.325, 81.325, 91.325, 101.325])

In [19]: lats = np.array([20, 30, 40, 50, 60])

In [20]: da1 = xr.DataArray(arr1, dims=['plev', 'lat'], coords=[pressures, lats])

In [21]: da2 = xr.DataArray(arr2, dims=['plev', 'lat'], coords=[pressures, lats])

In [22]: da2
Out[22]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
Coordinates:
  * plev     (plev) float64 71.33 81.33 91.33 101.3
  * lat      (lat) int64 20 30 40 50 60

In [23]: da1
Out[23]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
Coordinates:
  * plev     (plev) float64 71.33 81.33 91.33 101.3
  * lat      (lat) int64 20 30 40 50 60

These arrays are aligned, so subtracting them is straightforward:

In [24]: da1 - da2
Out[24]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[-1.,  0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12., 13.],
       [14., 15., 16., 17., 18.]])
Coordinates:
  * plev     (plev) float64 71.33 81.33 91.33 101.3
  * lat      (lat) int64 20 30 40 50 60

But because xarray relies on these coordinates being aligned exactly, relying on float coordinates can be tricky. If we introduce even a small error to the pressure level dimension, the arrays are not aligned and we see a result similar to yours:

In [25]: da2 = xr.DataArray(arr2, dims=['plev', 'lat'], coords=[pressures + 1e-8, lats])

In [26]: da1 - da2
Out[26]:
<xarray.DataArray (plev: 0, lat: 5)>
array([], shape=(0, 5), dtype=float64)
Coordinates:
  * plev     (plev) float64
  * lat      (lat) int64 20 30 40 50 60

This type of mis-alignment can happen for all sorts of reasons, including round-tripping data through storage, where changes in encoding can cause tiny numerical errors which show up as mis-aligned data.

You can check whether this is the problem with xr.align using the join='exact' argument:

In [27]: xr.align(da1, da2, join='exact')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-612460e52308> in <module>
----> 1 xr.align(da1, da2, join='exact')

~/miniconda3/envs/myenv/lib/python3.9/site-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    320             ):
    321                 if join == "exact":
--> 322                     raise ValueError(f"indexes along dimension {dim!r} are not equal")
    323                 joiner = _get_joiner(join, type(matching_indexes[0]))
    324                 index = joiner(matching_indexes)

ValueError: indexes along dimension 'plev' are not equal

To get around this problem, you might try rounding your coordinates to a known tolerance of the coordinate:

In [32]: da2['plev'] = np.round(da2['plev'], 3)

In [33]: da1 - da2
Out[33]:
<xarray.DataArray (plev: 4, lat: 5)>
array([[-1.,  0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12., 13.],
       [14., 15., 16., 17., 18.]])
Coordinates:
  * plev     (plev) float64 71.33 81.33 91.33 101.3
  * lat      (lat) int64 20 30 40 50 60

Alternatively, you could set a positional/integer coordinate, with the actual pressure level as a non-indexing coordinate:

In [42]: da1
Out[42]:
<xarray.DataArray (plev_ind: 4, lat: 5)>
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
Coordinates:
    plev      (plev_ind) float64 71.33 81.33 91.33 101.3
  * lat       (lat) int64 20 30 40 50 60
  * plev_ind  (plev_ind) int64 71325 81325 91325 101325

In [43]: da2
Out[43]:
<xarray.DataArray (plev_ind: 4, lat: 5)>
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
Coordinates:
    plev      (plev_ind) float64 71.33 81.33 91.33 101.3
  * lat       (lat) int64 20 30 40 50 60
  * plev_ind  (plev_ind) int64 71325 81325 91325 101325

In [44]: da1 - da2
Out[44]:
<xarray.DataArray (plev_ind: 4, lat: 5)>
array([[-1.,  0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12., 13.],
       [14., 15., 16., 17., 18.]])
Coordinates:
  * lat       (lat) int64 20 30 40 50 60
  * plev_ind  (plev_ind) int64 71325 81325 91325 101325
Dubonnet answered 6/11, 2021 at 18:57 Comment(1)
Omg thanks so so much!!! I would have never figured this out myself. It was exactly as you thought that the pressure levels were not aligned (just a tiny difference in one of the decimal places introduced by python, the original pressure levels did not have decimal places). I used da2['plev'] = np.round(da2['plev'], 3) and that worked perfectlyPostpositive

© 2022 - 2024 — McMap. All rights reserved.