matplotlib.mlab.griddata very slow and returns array of nan when valid data is input
Asked Answered
R

4

5

I am trying to map an irregularly gridded dataset (raw satellite data) with associated latitudes and longitudes to a regularly gridded set of latitudes and longitudes given by basemap.makegrid(). I am using matplotlib.mlab.griddata with mpl_toolkits.natgrid installed. Below is a list of the variables being used as output by whos in ipython and some stats on the variables:

Variable   Type       Data/Info
-------------------------------
datalat    ndarray    666x1081: 719946 elems, type `float32`, 2879784 bytes (2 Mb)
datalon    ndarray    666x1081: 719946 elems, type `float32`, 2879784 bytes (2 Mb)
gridlat    ndarray    1200x1000: 1200000 elems, type `float64`, 9600000 bytes (9 Mb)
gridlon    ndarray    1200x1000: 1200000 elems, type `float64`, 9600000 bytes (9 Mb)
var        ndarray    666x1081: 719946 elems, type `float32`, 2879784 bytes (2 Mb)

In [11]: var.min()
Out[11]: -30.0

In [12]: var.max()
Out[12]: 30.0

In [13]: datalat.min()
Out[13]: 27.339874

In [14]: datalat.max()
Out[14]: 47.05302

In [15]: datalon.min()
Out[15]: -137.55658

In [16]: datalon.max()
Out[16]: -108.41629

In [17]: gridlat.min()
Out[17]: 30.394031556984299

In [18]: gridlat.max()
Out[18]: 44.237140350357713

In [19]: gridlon.min()
Out[19]: -136.17646180595321

In [20]: gridlon.max()
Out[20]: -113.82353819404671

datalat and datalon are the orignal data coordinates

gridlat and gridlon are the coordinates to interpolate to

var contains the actual data

Using these variables, when I call griddata(datalon, datalat, var, gridlon, gridlat) it has taken as long as 20 minutes to complete and returns an array of nan. From looking at the data, the latitudes and longitudes appear to be correct with the original coordinates overlapping a portion of the new area and a few data points lying outside of the new area. Does anyone have any suggestions? The nan values suggest that I'm doing something stupid...

Rock answered 15/9, 2011 at 1:3 Comment(2)
What satellite? Is it a scanner or a staring FPA? What's the ground sample distance?Baskerville
Heh, there's the fun part. I need to be capable of doing this for some 40+ different atmospheric sensors. Most of them are scaning sensors (conical, cross track, etc).Rock
B
2

It looks like the mlab.griddata routine may introduce additional constraints on your output data that may not be necessary. While the input locations may be anything, the output locations must be a regular grid - since your example is in lat/lon space, your choice of map projection may violate this (i.e. regular grid in x/y is not a regular grid in lat/lon).

You might try the interpolate.griddata routine from SciPy as an alternative - you'll need to combine your location variables into a single array, though, since the call signature is different: something like

import scipy.interpolate
data_locations = np.vstack(datalon.ravel(), datalat.ravel()).T
grid_locations = np.vstack(gridlon.ravel(), gridlat.ravel()).T
grid_data      = scipy.interpolate.griddata(data_locations, val.ravel(),
                                            grid_locations, method='nearest')

for nearest-neighbor interpolation. This gets the locations into an array with 2 columns corresponding to your 2 dimensions. You may also want to perform the interpolation in the transformed space of your map projection.

Babbitt answered 15/12, 2011 at 16:40 Comment(1)
Perfect, thanks Tim! This actually appears to work and allows me to get rid of some very annoying third party software.Rock
B
2

More than likely, griddata is way too hard. It's designed to work with randomly sampled data. Your data is almost certainly regularly sampled -- just not on the same grid as your target output grid.

Look at a much simpler approach like an affine transformation or a series of affine transformations on small chips if the earth's topology or curvature affect yoru results.

There are some out of the box solutions that might help. GDAL is a good example.

Also, this type of issue is often discussed in GIS. See:

https://gis.stackexchange.com/questions/10430/changing-image-projection-using-python

Baskerville answered 15/9, 2011 at 1:13 Comment(2)
Hey, thanks for the answer. I've got a few questions, though. I've been looking into GDAL and talking to a few GIS people, but haven't found answers to a few questions. 1) If I wanted to overlay coastlines/gridlines etc, is GDAL able to facilitate this, or would I need to pregenerate the coastline and gridline images? 2) How difficult is it to work with GDAL in python? The tutorials are relatively limited and the API for python is obviously deficient. 3) On that note, are there any good resources for learning GDAL in python? Thanks again for your help and time.Rock
Thanks again for the answer. It looks like I am going with GDAL as my current solution.Rock
B
2

It looks like the mlab.griddata routine may introduce additional constraints on your output data that may not be necessary. While the input locations may be anything, the output locations must be a regular grid - since your example is in lat/lon space, your choice of map projection may violate this (i.e. regular grid in x/y is not a regular grid in lat/lon).

You might try the interpolate.griddata routine from SciPy as an alternative - you'll need to combine your location variables into a single array, though, since the call signature is different: something like

import scipy.interpolate
data_locations = np.vstack(datalon.ravel(), datalat.ravel()).T
grid_locations = np.vstack(gridlon.ravel(), gridlat.ravel()).T
grid_data      = scipy.interpolate.griddata(data_locations, val.ravel(),
                                            grid_locations, method='nearest')

for nearest-neighbor interpolation. This gets the locations into an array with 2 columns corresponding to your 2 dimensions. You may also want to perform the interpolation in the transformed space of your map projection.

Babbitt answered 15/12, 2011 at 16:40 Comment(1)
Perfect, thanks Tim! This actually appears to work and allows me to get rid of some very annoying third party software.Rock
P
1

If your data is on a grid such that data point at point (datalon[i], datalat[j]) is in data[i,j], then you can use scipy.interpolate.RectBivariateSpline instead of griddata. Some geography-specific libraries may offer more functionality, though.

Piecedyed answered 20/9, 2011 at 21:16 Comment(0)
M
1

If you use pclormesh, you don't have to do any sort of interpolation. pcolormesh would gladly accept the data structure the way you have given here:

from mpl_toolkits.basemap import Basemap
m = Basemap(-----)
x,y = m(datalon, datalat)
m.pcolormesh(x,y,var)
plt.show()

kindly use this and tell me if this works or not.

However, there is some problem in pcolormesh when there is overlap of the orbit data. Please refer to this question of mine, you may find something useful.

Using pcolormesh for plotting an orbit data

Millimeter answered 7/9, 2012 at 11:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.