Convert a netcdf time variable to an R date object
Asked Answered
M

4

10

I have a netcdf file with a timeseries and the time variable has the following typical metadata:

    double time(time) ;
            time:standard_name = "time" ;
            time:bounds = "time_bnds" ;
            time:units = "days since 1979-1-1 00:00:00" ;
            time:calendar = "standard" ;
            time:axis = "T" ;

Inside R I want to convert the time into an R date object. I achieve this at the moment in a hardwired way by reading the units attribute and splitting the string and using the third entry as my origin (thus assuming the spacing is "days" and the time is 00:00 etc):

require("ncdf4")
f1<-nc_open("file.nc")
time<-ncvar_get(f1,"time")
tunits<-ncatt_get(f1,"time",attname="units")
tustr<-strsplit(tunits$value, " ")
dates<-as.Date(time,origin=unlist(tustr)[3])

This hardwired solution works for my specific example, but I was hoping that there might be a package in R that nicely handles the UNIDATA netcdf date conventions for time units and convert them safely to an R date object?

Monotony answered 1/9, 2017 at 13:43 Comment(3)
Note that the newly-proposed and currently in developement awesome stars package will handle dates automatically, see the first blog post for an example: r-spatial.org/r/2017/11/23/stars1.htmlTetrarch
Ah, I forgot to add that the units package seems to handle dates gracefully. Worth a try.Tetrarch
See my edits in my answer for an exampleTetrarch
S
4

Your hopes have been met by package CFtime. This package can deal with the CF Metadata Conventions "time" dimension seamlessly, including all defined calendars.

f1 <- nc_open("file.nc")
cf <- CFtime(f1$dim$time$units, f1$dim$time$calendar, f1$dim$time$vals)
dates <- CFtimestamp(cf)

# This works reliably only for 3 of the 9 defined calendars
dates <- as.Date(dates)

The CFtimestamp() function gives a correct output for all possible dates, including the oddball "2023-02-30" but not "2023-03-31" on a "360_day" calendar. Converting to POSIXct is tricky but do you really need a Date to work with or will a character representation do just fine?

Sedation answered 27/9, 2023 at 21:53 Comment(4)
No idea why on earth this was downvoted. Seems an excellent answer to me! (of course, in the meantime I've migrated from R to python... but anyway ;-) )Monotony
@Monotony Thanks for accepting this and the upvote. And good luck in the world of python!Sedation
@Monotony in the end it was probably a wise choice ;)Tetrarch
I'm not so sure, I still wonder if I should have gone directly to Julia, my aging brain can't face yet another switch in the next years and I've already gone from IDL->metview->NCL->R->python for processing/plotting since my PhD :-DMonotony
M
5

EDIT 2023: It seems this package/answer is now obsolete, see accepted answer of Patrick for new way to do this.


I have just discovered (two years after posting the question!) that there is a package called ncdf.tools which has the function:

convertDateNcdf2R

which

converts a time vector from a netCDF file or a vector of Julian days (or seconds, minutes, hours) since a specified origin into a POSIXct R vector.

Usage:

convertDateNcdf2R(time.source, units = "days", origin = as.POSIXct("1800-01-01", 
    tz = "UTC"), time.format = c("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", 
    "%Y-%m-%d %H:%M", "%Y-%m-%d %Z %H:%M", "%Y-%m-%d %Z %H:%M:%S"))

Arguments:

time.source 

numeric vector or netCDF connection: either a number of time units since origin or a netCDF file connection, In the latter case, the time vector is extracted from the netCDF file, This file, and especially the time variable, has to follow the CF netCDF conventions.

units   

character string: units of the time source. If the source is a netCDF file, this value is ignored and is read from that file.

origin  

POSIXct object: Origin or day/hour zero of the time source. If the source is a netCDF file, this value is ignored and is read from that file.

Thus it is enough to simply pass the netcdf connection as the first argument and the function handles the rest. Caveat: This will only work if the netCDF file follows CF conventions (e.g. if your units are "years since" instead of "seconds since" or "days since" it will fail for example).

More details on the function are available here: https://rdrr.io/cran/ncdf.tools/man/convertDateNcdf2R.html

Monotony answered 12/12, 2019 at 17:11 Comment(1)
Package ncdf.tools has been archived. Instead there is now a package CFtime which fully supports the CF Metadata Conventions "time" dimension.Sedation
S
4

Your hopes have been met by package CFtime. This package can deal with the CF Metadata Conventions "time" dimension seamlessly, including all defined calendars.

f1 <- nc_open("file.nc")
cf <- CFtime(f1$dim$time$units, f1$dim$time$calendar, f1$dim$time$vals)
dates <- CFtimestamp(cf)

# This works reliably only for 3 of the 9 defined calendars
dates <- as.Date(dates)

The CFtimestamp() function gives a correct output for all possible dates, including the oddball "2023-02-30" but not "2023-03-31" on a "360_day" calendar. Converting to POSIXct is tricky but do you really need a Date to work with or will a character representation do just fine?

Sedation answered 27/9, 2023 at 21:53 Comment(4)
No idea why on earth this was downvoted. Seems an excellent answer to me! (of course, in the meantime I've migrated from R to python... but anyway ;-) )Monotony
@Monotony Thanks for accepting this and the upvote. And good luck in the world of python!Sedation
@Monotony in the end it was probably a wise choice ;)Tetrarch
I'm not so sure, I still wonder if I should have gone directly to Julia, my aging brain can't face yet another switch in the next years and I've already gone from IDL->metview->NCL->R->python for processing/plotting since my PhD :-DMonotony
T
3

There is not, that I know of. I have this handy function using lubridate, which is basically identical to yours.

getNcTime <- function(nc) {
    require(lubridate)
    ncdims <- names(nc$dim) #get netcdf dimensions
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))[1]] #find time variable
    times <- ncvar_get(nc, timevar)
    if (length(timevar)==0) stop("ERROR! Could not identify the correct time variable")
    timeatt <- ncatt_get(nc, timevar) #get attributes
    timedef <- strsplit(timeatt$units, " ")[[1]]
    timeunit <- timedef[1]
    tz <- timedef[5]
    timestart <- strsplit(timedef[4], ":")[[1]]
    if (length(timestart) != 3 || timestart[1] > 24 || timestart[2] > 60 || timestart[3] > 60 || any(timestart < 0)) {
        cat("Warning:", timestart, "not a valid start time. Assuming 00:00:00\n")
        warning(paste("Warning:", timestart, "not a valid start time. Assuming 00:00:00\n"))
        timedef[4] <- "00:00:00"
    }
    if (! tz %in% OlsonNames()) {
        cat("Warning:", tz, "not a valid timezone. Assuming UTC\n")
        warning(paste("Warning:", timestart, "not a valid start time. Assuming 00:00:00\n"))
        tz <- "UTC"
    }
    timestart <- ymd_hms(paste(timedef[3], timedef[4]), tz=tz)
    f <- switch(tolower(timeunit), #Find the correct lubridate time function based on the unit
        seconds=seconds, second=seconds, sec=seconds,
        minutes=minutes, minute=minutes, min=minutes,
        hours=hours,     hour=hours,     h=hours,
        days=days,       day=days,       d=days,
        months=months,   month=months,   m=months,
        years=years,     year=years,     yr=years,
        NA
    )
    suppressWarnings(if (is.na(f)) stop("Could not understand the time unit format"))
    timestart + f(times)
}

EDIT: One might also want to take a look at ncdf4.helpers::nc.get.time.series

EDIT2: note that the newly-proposed and currently in developement awesome stars package will handle dates automatically, see the first blog post for an example.

EDIT3: another way is to use the units package directly, which is what stars uses. One could do something like this: (still not handling the calendar correctly, I'm not sure units can)

getNcTime <- function(nc) { ##NEW VERSION, with the units package
    require(units)
    require(ncdf4)
    options(warn=1) #show warnings by default
    if (is.character(nc)) nc <- nc_open(nc)
    ncdims <- names(nc$dim) #get netcdf dimensions
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))] #find (first) time variable
    if (length(timevar) > 1) {
        warning(paste("Found more than one time var. Using the first:", timevar[1]))
        timevar <- timevar[1]
    }
    if (length(timevar)!=1) stop("ERROR! Could not identify the correct time variable")
    times <- ncvar_get(nc, timevar) #get time data
    timeatt <- ncatt_get(nc, timevar) #get attributes
    timeunit <- timeatt$units
    units(times) <- make_unit(timeunit)
    as.POSIXct(time)
}
Tetrarch answered 2/9, 2017 at 7:53 Comment(2)
NOTE: neither AF7's function or SnowFrog's function can handle the calendar=365_day attribute correctly, while ncdf4.helpers::nc.get.time.series works with a 365 day calendar!Carboy
The units package is a wrapper around UDUNITS, which does not know about calendars - these are defined in the CF Metadata Conventions. Use package CFtime for a one-stop solution.Sedation
C
3

I couldn't get @AF7's function to work with my files so I wrote my own. The function below creates a POSIXct vector of dates, for which the start date, time interval, unit and length are read from the nc file. It works with nc files of many (but probably not every...) shapes or forms.

 ncdate <- function(nc) {
    ncdims <- names(nc$dim) #Extract dimension names
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime",
                                          "date", "Date"))[1]] # Pick the time dimension
    ntstep <-nc$dim[[timevar]]$len
    tm <- ncvar_get(nc, timevar) # Extract the timestep count
    tunits <- ncatt_get(nc, timevar, "units") # Extract the long name of units
    tspace <- tm[2] - tm[1] # Calculate time period between two timesteps, for the "by" argument 
    tstr <- strsplit(tunits$value, " ") # Extract string components of the time unit
    a<-unlist(tstr[1]) # Isolate the unit .i.e. seconds, hours, days etc.
    uname <- a[which(a %in% c("seconds","hours","days"))[1]] # Check unit
    startd <- as.POSIXct(gsub(paste(uname,'since '),'',tunits$value),format="%Y-%m-%d %H:%M:%S") ## Extract the start / origin date
    tmulti <- 3600 # Declare hourly multiplier for date
    if (uname == "days") tmulti =86400 # Declare daily multiplier for date
    ## Rename "seconds" to "secs" for "by" argument and change the multiplier.
    if (uname == "seconds") {
        uname <- "secs"
        tmulti <- 1 }
    byt <- paste(tspace,uname) # Define the "by" argument
    if (byt == "0.0416666679084301 days") { ## If the unit is "days" but the "by" interval is in hours
    byt= "1 hour"                       ## R won't understand "by < 1" so change by and unit to hour.
    uname = "hours"}
    datev <- seq(from=as.POSIXct(startd+tm[1]*tmulti),by= byt, units=uname,length=ntstep)
}

Edit

To address the flaw highlighted by @AF7's comment that the above code would only work for regularly spaced files, datev could be calculated as

 datev <- as.POSIXct(tm*tmulti,origin=startd)
Codification answered 20/9, 2017 at 14:59 Comment(5)
many thanks - I had borrowed some of AF7 code ideas and merged them in my R script. I wonder if such a functionality could be contributed to the ncdf4 package itself? It would be great to have something like this built in as standard.Monotony
Note that this only works for regularly-spaced times, which is not necessarily true for all NetCDFs. Why didn't my function work for you? I'll try to make it more general.Tetrarch
@AdrianTompkins. There used to be a function calculating the date in the package, but there are so many types of netcdfs that it didn't work with all files, so the developer removed it (Thanks to David Pierce for the info). As it will be the same with my function and currently is with AF7's, best to have these functions unofficial and, at least, help other users perhaps customize their own.Codification
Thanks, that very useful to knowMonotony
I asked the tidync dev if maybe he could be interested. Here's the github issue, you might want to express your opinions there: github.com/hypertidy/tidync/issues/54#issuecomment-331694920Tetrarch

© 2022 - 2024 — McMap. All rights reserved.