R filter() dealing with NAs
Asked Answered
P

4

7

I am trying to implement Chebyshev filter to smooth a time series but, unfortunately, there are NAs in the data series.

For example,

t <- seq(0, 1, len = 100)                     
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))

I am using Chebyshev filter: cf1 = cheby1(5, 3, 1/44, type = "low")

I am trying to filter the time series exclude NAs, but not mess up the orders/position. So, I have already tried na.rm=T, but it seems there's no such argument. Then

z <- filter(cf1, x)   # apply filter

Thank you guys.

Politic answered 18/7, 2012 at 12:27 Comment(0)
S
1

You can remove the NAs beforehand using the compelete.cases function. You also might consider imputing the missing data. Check out the mtsdi or Amelia II packages.

EDIT:

Here's a solution with Rcpp. This might be helpful is speed is important:

require(inline)
require(Rcpp)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
NAs <- x
x2 <- x[!is.na(x)]
#do something to x2
src <- '
Rcpp::NumericVector vecX(vx);
Rcpp::NumericVector vecNA(vNA);
int j = 0;   //counter for vx
for (int i=0;i<vecNA.size();i++) {
  if (!(R_IsNA(vecNA[i]))) {
    //replace and update j
    vecNA[i] = vecX[j];
    j++;
  }
 }
return Rcpp::wrap(vecNA);
'
fun <- cxxfunction(signature(vx="numeric",
                             vNA="numeric"),
                   src,plugin="Rcpp")
if (identical(x,fun(x2,NAs)))
    print("worked")
# [1] "worked"
Shreveport answered 18/7, 2012 at 12:46 Comment(1)
i am just wondering if the complete.case is the same as na.omit. Also, since i am using the observed SST time series, i am not sure if it is a good idea to inputing the missing values.Politic
S
2

Try using x <- x[!is.na(x)] to remove the NAs, then run the filter.

Sixtasixteen answered 18/7, 2012 at 12:45 Comment(5)
sorry, but if i use na.omit, it will mass up the orders, i just want value of NA after the filter remaining NA, but all other non-NA values can pass.Politic
Sorry, I'm not sure what you're asking. Do you want to keep your values in the same position, and have spaces in your data?Sixtasixteen
would na.remove() or na.remove.ts() from the tseries package do what you want?Sixtasixteen
Thank you very much, but i am wondering if i use x <- x[!is.na(x)], the new time series will remove all NAs, and how can i recover the original NAs back into series after filtering.Politic
couldn't you use z <- filter(cf1, x[x!is.na(x)]) ?Sixtasixteen
S
1

You can remove the NAs beforehand using the compelete.cases function. You also might consider imputing the missing data. Check out the mtsdi or Amelia II packages.

EDIT:

Here's a solution with Rcpp. This might be helpful is speed is important:

require(inline)
require(Rcpp)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
NAs <- x
x2 <- x[!is.na(x)]
#do something to x2
src <- '
Rcpp::NumericVector vecX(vx);
Rcpp::NumericVector vecNA(vNA);
int j = 0;   //counter for vx
for (int i=0;i<vecNA.size();i++) {
  if (!(R_IsNA(vecNA[i]))) {
    //replace and update j
    vecNA[i] = vecX[j];
    j++;
  }
 }
return Rcpp::wrap(vecNA);
'
fun <- cxxfunction(signature(vx="numeric",
                             vNA="numeric"),
                   src,plugin="Rcpp")
if (identical(x,fun(x2,NAs)))
    print("worked")
# [1] "worked"
Shreveport answered 18/7, 2012 at 12:46 Comment(1)
i am just wondering if the complete.case is the same as na.omit. Also, since i am using the observed SST time series, i am not sure if it is a good idea to inputing the missing values.Politic
H
1

I don't know if ts objects can have missing values, but if you just want to re-insert the NA values, you can use ?insert from R.utils. There might be a better way to do this.

install.packages(c('R.utils', 'signal'))
require(R.utils)
require(signal)
t <- seq(0, 1, len = 100)                     
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA, NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA)
cf1 = cheby1(5, 3, 1/44, type = "low")
xex <- na.omit(x)
z <- filter(cf1, xex)   # apply
z <- as.numeric(z)
for (m in attributes(xex)$na.action) {
  z <- insert(z, ats = m, values = NA)
}
all.equal(is.na(z), is.na(x))
?insert
Haifa answered 18/7, 2012 at 16:0 Comment(0)
L
0

Here is a function you can use to filter a signal with NAs in it. The NAs are ignored rather than replaced by zero.

You can then specify a maximum percentage of weight which the NAs may take at any point of the filtered signal. If there are too many NAs (and too few actual data) at a specific point, the filtered signal itself will be set to NA.

# This function applies a filter to a time series with potentially missing data 

filter_with_NA <- function(x,
                           window_length=12,                            # will be applied centrally
                           myfilter=rep(1/window_length,window_length), # a boxcar filter by default
                           max_percentage_NA=25)                        # which percentage of weight created by NA should not be exceeded
{
  # make the signal longer at both sides
  signal <- c(rep(NA,window_length),x,rep(NA,window_length))
  # see where data are present and not NA
  present <- is.finite(signal)

  # replace the NA values by zero
  signal[!is.finite(signal)] <- 0
  # apply the filter
  filtered_signal <- as.numeric(filter(signal,myfilter, sides=2))

  # find out which percentage of the filtered signal was created by non-NA values
  # this is easy because the filter is linear
  original_weight <- as.numeric(filter(present,myfilter, sides=2))
  # where this is lower than one, the signal is now artificially smaller 
  # because we added zeros - compensate that
  filtered_signal <- filtered_signal / original_weight
  # but where there are too few values present, discard the signal
  filtered_signal[100*(1-original_weight) > max_percentage_NA] <- NA

  # cut away the padding to left and right which we previously inserted
  filtered_signal <- filtered_signal[((window_length+1):(window_length+length(x)))]
  return(filtered_signal)
}
Lustrate answered 23/6, 2017 at 21:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.