I've been having trouble handling no-data values in Python's rasterio package when applying a polygon mask to a raster dataset. This particular raster is Landsat uint8 with 7 bands, and no no-data value is explicitly set in its metadata because 255 is conventionally reserved for no data. However, uint8 data is sometimes compressed from uint16 data, in which case 255 is a valid data value that I do not want treated as 'no data' (the data use the full bit range). If the nodata argument isn't specified, rasterio's mask function falls back to 0 as the 'no data' value, which is problematic in the same way, since 0 is sometimes a valid data value. Is there some way to override the metadata value for 'no data'?
I tried several different ways to work around this issue (detailed below), none of them successful.
1. Converting the uint8 data to uint16 using rasterio.open() and assigning 256 as the no-data value, since it lies outside the range of any uint8 data but within the uint16 range. This is how some software, such as ArcMap, sometimes assigns no-data values.
2. Similar to step 1, but opening the data using rasterio.open() and setting nodata=np.nan. Received the error "Given nodata value, nan, is beyond the valid range of its data type.", even though the documentation lists nan as a valid entry for the 'nodata' argument.
3. During masking with rasterio.mask.mask(), specifying nodata=np.nan. Received the error "Cannot convert fill_value nan to dtype."
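Both error messages in attempts 2 and 3 are consistent with NaN being a floating-point value that integer dtypes cannot hold, whereas 256 is out of range for uint8 but in range for uint16. The check below is a minimal sketch of that reasoning using only NumPy; `fits_dtype` is a hypothetical helper written for illustration, not a rasterio function.

```python
import numpy as np


def fits_dtype(value, dtype):
    """Return True if `value` can be stored in the integer `dtype`.

    Hypothetical helper for illustration only (not part of rasterio).
    """
    # NaN only exists for floating-point types; integer dtypes cannot hold it.
    if isinstance(value, float) and np.isnan(value):
        return False
    info = np.iinfo(dtype)
    return info.min <= value <= info.max


print(fits_dtype(255, np.uint8))      # True: 255 is the largest uint8 value
print(fits_dtype(256, np.uint8))      # False: just outside the uint8 range
print(fits_dtype(256, np.uint16))     # True: fits once the data are uint16
print(fits_dtype(np.nan, np.uint16))  # False: no NaN in an integer dtype
```

If this reasoning holds, a NaN no-data value would only be usable after converting the data to a float dtype such as float32, at the cost of larger files.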
import rasterio
import fiona
import numpy as np
from rasterio.mask import mask
fp_src = ''
fp_dst = ''
shape = ''
# get shapes
with fiona.open(shape, 'r') as shapefile:
    geoms = [feature['geometry'] for feature in shapefile]
# Method Number 1
# ~~~~~~~~~~~~~~~~~~~~~~~~~
# open original raster, copy meta & alter dtype
with rasterio.open(fp_src) as src_dataset:
    kwds = src_dataset.profile
    kwds['dtype'] = 'uint16'
    src_meta = src_dataset.meta
# write a new raster with the copied and altered meta
with rasterio.open(fp_dst, 'w', **kwds) as dst_dataset:
    dst_meta = dst_dataset.meta
src_dataset.close()
dst_dataset.close()
img = rasterio.open(fp_dst)
# mask img and set nodata to 256 (out of the uint8 range)
out_image, out_transform = mask(img, geoms, nodata=256)
# out_image output: values outside of the geoms are 256 & values inside are 0.
# Method Number 2
# ~~~~~~~~~~~~~~~~~~~~~~~~~
# open original raster, copy meta & alter dtype
with rasterio.open(fp_src) as src_dataset:
    kwds = src_dataset.profile
    kwds['nodata'] = np.nan
    kwds['dtype'] = 'uint16'
    src_meta = src_dataset.meta
# write a new raster with the copied and altered meta
with rasterio.open(fp_dst, 'w', **kwds) as dst_dataset:
    dst_meta = dst_dataset.meta
src_dataset.close()
dst_dataset.close()
img = rasterio.open(fp_dst)
# mask img and let the mask function default to the image's newly created nodata (np.nan, set in rasterio.open above)
out_image, out_transform = mask(img, geoms)
# out_image output: nodata value, nan, is beyond the valid range of its data type
# Method Number 3
# ~~~~~~~~~~~~~~~~~~~~~~~~~
# mask img and set nodata to nan (mask() expects an open dataset, not a file path)
with rasterio.open(fp_src) as src_dataset:
    out_image, out_transform = mask(src_dataset, geoms, nodata=np.nan)
# out_image output: Cannot convert fill_value nan to dtype.
I hope to see all pixels outside a given polygon(s) converted to a 'no data' entry that is not necessarily part of the valid range so that there is no possibility of the script accidentally perceiving a valid value as no data.
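One possible way to get that result is sketched below, under assumptions: build the boolean outside-the-polygon mask separately with `rasterio.mask.raster_geometry_mask`, read the uint8 pixels, upcast them to uint16, and only then write 256 into the outside pixels. The helper names `fill_outside` and `mask_to_uint16` are hypothetical, the rasterio imports are deferred so the NumPy part can be tried on its own, and the sketch assumes the output driver and any compression settings in the source profile accept uint16.

```python
import numpy as np


def fill_outside(data, outside, fill=256):
    """Upcast a (bands, rows, cols) uint8 array to uint16 and set every
    pixel where `outside` (rows, cols) is True to `fill`."""
    out = data.astype("uint16")  # astype returns a copy; the input is untouched
    out[:, outside] = fill       # the 2-D mask is applied to every band
    return out


def mask_to_uint16(fp_src, fp_dst, geoms, fill=256):
    """Hypothetical wrapper: mask `fp_src` by `geoms`, write uint16 to `fp_dst`."""
    # Imported here so fill_outside can be used without rasterio installed.
    import rasterio
    from rasterio.mask import raster_geometry_mask

    with rasterio.open(fp_src) as src:
        # With the default crop=False the mask covers the whole raster and
        # is True for pixels outside the shapes.
        outside, _, _ = raster_geometry_mask(src, geoms)
        data = src.read()
        profile = src.profile

    profile.update(dtype="uint16", nodata=fill)
    out = fill_outside(data, outside, fill)
    with rasterio.open(fp_dst, "w", **profile) as dst:
        dst.write(out)
    return out


# Small NumPy-only demonstration: a valid 255 inside the polygon is kept,
# while pixels outside become 256.
demo = np.full((1, 2, 2), 255, dtype=np.uint8)
outside = np.array([[True, False], [False, True]])
print(fill_outside(demo, outside)[0])
```

Because the pixel values are actually read and rewritten, valid data inside the polygons survive, and 0 and 255 both stay available as data values. If changing the dtype is undesirable, calling `mask(src, geoms, filled=False)` returns a NumPy masked array instead, which tracks the outside pixels without reserving any data value.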