How to change the dtype of a Raster using Rasterio
Asked Answered
B

2

10

I've been having trouble dealing with no data values in Python's rasterio package when applying a polygon mask on a raster data set. This particular raster is Landsat uint8 with 7 bands and the no data value is not inherently specified because 255 is the reserved value for no data. However, sometimes uint8 data is compressed from uint16 data, and the 255 value is a valid data value which I do not want considered as 'no data' (the data are full bit range). The default for rasterio's mask function is to consider 0 as a 'no data' value if this argument isn't specified, which is problematic in the same way, as 0 is sometimes considered a valid data value. Is there some way to override the metadata value for 'no data'?

I tried several different ways to work around this issue (detailed below), none of them successful.

  1. Transforming uint8 data into uint16 data using rasterio.open() and assigning '256' as the no data value, as it would be outside the range of any uint8 data, but accepted within the uint16 data range. This is how certain software programs, like ArcMap, will sometimes deal with assigning no data values.

  2. Similar to step 1, but tried opening uint8 data using rasterio.open() in and setting 'nodata=np.nan' in the function. Received error: "Given nodata value, nan, is beyond the valid range of its data type." Despite the fact that in the documentation nan is listed as a valid entry for the 'nodata' argument.

  3. During the mask process using rasterio.mask(), specified nodata=nan. Received error "Cannot convert fill_value nan to dtype."

import rasterio
import fiona
import numpy as np

fp_src = ''
fp_dst = ''
shape = ''

# get shapes
with fiona.open(shape, 'r') as shapefile:
    geoms = [feature['geometry'] for feature in shapefile]



# Method Number 1
# ~~~~~~~~~~~~~~~~~~~~~~~~~

# open original raster, copy meta & alter dtype
with rasterio.open(fp_src) as src_dataset:
    kwds = src_dataset.profile
    kwds['dtype'] = 'uint16'
    src_meta = src_dataset.meta

# write a new raster with the copied and altered meta
with rasterio.open(fp_dst, 'w', **kwds) as dst_dataset:
    dst_meta = dst_dataset.meta

src_dataset.close()
dst_dataset.close()

img = rasterio.open(fp_dst)

# mask img and set nodata to 256 (out of the uint8 range)
out_image, out_transform = mask(img, geoms, nodata=256)
# out_image output: values outside of the geoms are 256 & values inside are 0.



# Method Number 2
# ~~~~~~~~~~~~~~~~~~~~~~~~~

# open original raster, copy meta & alter dtype
with rasterio.open(fp_src) as src_dataset:
    kwds = src_dataset.profile
    kwds['nodata'] = np.nan
    kwds['dtype'] = 'uint16'
    src_meta = src_dataset.meta

# write a new raster with the copied and altered meta
with rasterio.open(fp_dst, 'w', **kwds) as dst_dataset:
    dst_meta = dst_dataset.meta

src_dataset.close()
dst_dataset.close()

img = rasterio.open(fp_dst)

# mask img and let the mask function default to the image's newly created nodata (np.nan from inside with rastario.open...)
out_image, out_transform = mask(img, geoms)
# out_image output: nodata value, nan, is beyond the valid range of its data type



# Method Number 3
# ~~~~~~~~~~~~~~~~~~~~~~~~~

# mask img and set nodata to nan
out_image, out_transform = mask(fp_src, geoms, nodata=np.nan)
# out_image output: Cannot convert fill_value nan to dtype.

I hope to see all pixels outside a given polygon(s) converted to a 'no data' entry that is not necessarily part of the valid range so that there is no possibility of the script accidentally perceiving a valid value as no data.

Bhopal answered 29/7, 2019 at 0:31 Comment(0)
T
0

The problem seems to be that you didn't write any data to your new and altered rasters, only the metadata. For method 1 you could try:

# open original raster, copy meta & alter dtype
with rasterio.open(fp_src) as src_dataset:
    kwds = src_dataset.profile
    kwds['dtype'] = 'uint16'
    src_meta = src_dataset.meta
    data = src_dataset.read()

# write a new raster with the copied and altered meta
with rasterio.open(fp_dst, 'w', **kwds) as dst_dataset:
    dst_meta = dst_dataset.meta
    dst_dataset.write(data)

# not needed since you are using a 'with' statement
# src_dataset.close()
# dst_dataset.close()

with rasterio.open(fp_dst) as img:
    # mask img and set nodata to 256 (out of the uint8 range)
    out_image, out_transform = mask(img, geoms, nodata=256)
    # out_image output: values outside of the geoms should be 256 
    # & values inside should be their actual value.

Another, probably easier, method would be to use the filled=False option in rasterio's mask function. This way you get your original data in a masked numpy array where all values outside the mask geometries are masked.

Trujillo answered 20/11, 2023 at 8:4 Comment(0)
M
-1

The issue is that np.nan is a float that cannot be converted into an integer. Here's how I would solve this issue:

with rasterio.open(fp_src) as src_dataset:
    meta = src_dataset.meta
    meta.update(
        {
            "nodata": np.iinfo(src_dataset.dtypes[0]).max
        }
    )
    data = src_dataset.read()

    with rasterio.open(fp_dst, 'w', **meta) as dst_dataset:
        dst_dataset.write(data)

The function np.iinfo(dtype).max finds the maximum value for a given integer type. I used this method for a dataset with 1 band so it should work with your data as well. Let me know if it didn't.

Mcwherter answered 2/4, 2020 at 9:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.