How to extract the data in a netcdf file based on a shapefile in R?
Asked Answered
F

1

6

I have a netcdf file (global domain) which I do not know its projection information, and I want to extract the data (with lon and lat) based on the shapefile.

The objective is to find data for the domain represented by the shapefile ( the original netcdf contains global data). Besides, the final data format should be a data frame, which contains lon, lat, and data. I assume that the extract and mask function will be useful.

The netcdf and shapefile can be found at https://www.dropbox.com/s/ju96eitzzd0xxg8/precip.2000.nc?dl=0 and https://www.dropbox.com/s/8wfgf8207dbh79r/gpr_000b11a_e.zip?dl=0. Thanks for any help.

library(rgdal)
library(ncdf4)
library(raster)

shp = readOGR("gpr_000b11a_e.shp")
pre1 = nc_open("precip.2000.nc")
pre1

File precip.2000.nc (NC_FORMAT_NETCDF4_CLASSIC):
 1 variables (excluding dimension variables):
    float precip[lon,lat,time]   
        missing_value: -9.96920996838687e+36
        var_desc: Precipitation
        level_desc: Surface
        statistic: Total
        parent_stat: Other
        long_name: Daily total of precipitation
        cell_methods: time: sum
        avg_period: 0000-00-01 00:00:00
        actual_range: 0
         actual_range: 482.555358886719
        units: mm
        valid_range: 0
         valid_range: 1000
        dataset: CPC Global Precipitation

 3 dimensions:
    lat  Size:360
        actual_range: 89.75
         actual_range: -89.75
        long_name: Latitude
        units: degrees_north
        axis: Y
        standard_name: latitude
        coordinate_defines: center
    lon  Size:720
        long_name: Longitude
        units: degrees_east
        axis: X
        standard_name: longitude
        actual_range: 0.25
         actual_range: 359.75
        coordinate_defines: center
    time  Size:366   *** is unlimited ***
        long_name: Time
        axis: T
        standard_name: time
        coordinate_defines: start
        actual_range: 876576
         actual_range: 885336
        delta_t: 0000-00-01 00:00:00
        avg_period: 0000-00-01 00:00:00
        units: hours since 1900-01-01 00:00:00

7 global attributes:
    Conventions: CF-1.0
    version: V1.0
    history: created 9/2016 by CAS NOAA/ESRL PSD
    title: CPC GLOBAL PRCP V1.0
    References: https://www.esrl.noaa.gov/psd/data/gridded/data.cpc.globalprecip.html
    dataset_title: CPC GLOBAL PRCP V1.0
    Source: ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/ 

As we can see, there is no information about the projection or crs for the netcdf file.

Fidelis answered 6/5, 2018 at 21:24 Comment(5)
the noaa climate files use a lat-long wgs84 geographic coordinate systems, with longitude going from 0 to 360 rather than -180 to 180. You can use raster::rotate to convert between conventional longitudesErmey
@Ermey My specific question is how to extract the data (with lon and lat) from the netcdf based on the shapefile. I have followed your advice, but errors still occur. I have been struggling on this problem for several weeks, could you help me? Thank you so much.Fidelis
That's a bit too vague for me to understand. What do you mean by "extract the data (with lon and lat) from the netcdf based on the shapefile". Can I suggest you take a little time to edit the question with a clear example of what you want (using sketches or dummy data or whatever works best) indicating very clearly what the expected output is. Once the question is clear, message me and I'll take another look.Ermey
@Ermey I have edited the question to make it clear. Thank you for your time.Fidelis
@Ermey Could you take a look? Thanks a lot.Fidelis
E
9

You need to open the NetCDF file as a raster* object to process as a raster it in R. Use brick or stack for this, rather than nc_open

pre1.brick = brick("precip.2000.nc")

You will notice that this file uses the normal convention in climate science of giving longitudes in values ranging from 0 to 360 degrees:

extent(pre1.brick)
# class       : Extent 
# xmin        : 0 
# xmax        : 360 
# ymin        : -90 
# ymax        : 90 

We can convert to conventional -180 to 180 longitudes usint rotate

pre1.brick = rotate(pre1.brick)

Now lets see what the projections of our raster and polygon files are:

crs(shp)
# +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0 
crs(pre1.brick)
# +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 

You can see that both use lat-long coordinates, but differ in terms of datum and ellipse. It is usually recommended for accuracy and speed, when possible, to project the vector data rather than the raster to get them both in the same coordinate system:

shp = spTransform(shp, crs(pre1.brick))

Having projected both to the same coord system, we can apply the shapefile as a mask to the raster brick:

pre1.mask = mask(pre1.brick, shp)

And convert to a data.frame. Here I just demonstrate for the first layer. You can do all layers at once if you prefer, by removing [[1]] in the following line

pre1.df = as.data.frame(pre1.mask[[1]], xy=TRUE)

If you want, you can remove lines that have NA, to only leave cells inside the mask containing data:

pre1.df = pre1.df[complete.cases(pre1.df),]
head(pre1.df)
#            x     y X2000.01.01.00.00.00
# 10278 -81.25 82.75            0.2647110
# 10279 -80.75 82.75            0.2721323
# 10280 -80.25 82.75            0.2797630
# 10281 -79.75 82.75            0.2875668
# 10282 -79.25 82.75            0.2925712
# 10283 -78.75 82.75            0.2995636
Ermey answered 9/5, 2018 at 15:52 Comment(1)
Thank you so much for your detailed codes and explanation. It works great. Thank you for your time. May I know some of your research (such as published papers)?Fidelis

© 2022 - 2024 — McMap. All rights reserved.