Weighted average using NA weights

a = c(1, 2, NA, 4)
b = c(10, NA, 30, 40)
weighted.mean(a, b, na.rm = TRUE)

The above code gives me NA as the answer; I think na.rm only ignores the NA values in vector a, not in b. How can I ignore the NA in vector b, i.e. in the weights specifically? I just cannot change the NA to 0; I know that would do the trick, but I'm looking for a tweak in the formula itself.
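
For reference, a by-hand version of the formula restricted to complete cases gives the value I am after; this is only a sketch of the desired behaviour, using complete.cases from base R:

# manual weighted mean over the positions where both the value and the weight are known
keep <- complete.cases(a, b)
sum(a[keep] * b[keep]) / sum(b[keep])
# [1] 3.4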

Voltage answered 26/10, 2016 at 17:52 Comment(4)
I don't think there's a pre-made function. You'll just have to do it by manually subsetting the vectors (or write your own function). (Reiterate)
You could edit the source code for weighted.mean and make your own custom function. (Unreserved)
with(na.omit(data.frame(a, b)), weighted.mean(a, b)) (Umber)
Isn't there a nicer solution by now, like an na.rm.weight() option? (Grantor)

This is the function I ended up writing to solve this problem:

weighted_mean <- function(x, w, ..., na.rm = FALSE){
  if(na.rm){
    # drop every row where either the value or the weight is missing
    df_omit <- na.omit(data.frame(x, w))
    return(weighted.mean(df_omit$x, df_omit$w, ...))
  }
  weighted.mean(x, w, ...)
}
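
For example, with the vectors from the question (assuming a and b as defined there), this returns 3.4:

a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)
weighted_mean(a, b, na.rm = TRUE)
#[1] 3.4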
Butyraldehyde answered 26/5, 2017 at 16:51 Comment(0)

I adapted Mhairi's code to avoid both data.frame and na.omit:

weighted_mean = function(x, w, ..., na.rm = FALSE){
  if(na.rm){
    # keep only the positions where both the value and the weight are non-missing
    keep = !is.na(x) & !is.na(w)
    w = w[keep]
    x = x[keep]
  }
  weighted.mean(x, w, ..., na.rm = FALSE)
}

It's really surprising that R's built-in weighted.mean with na.rm = TRUE doesn't handle NA weights. I just wasted a few hours discovering that.

EDIT: here is also a data.table way, in case someone wants to calculate grouped weighted means:

# mean of column a weighted by b, grouped by g1 and g2
DT[!is.na(b), .(wm = weighted.mean(a, b, na.rm = TRUE)), by = .(g1, g2)]
# groups in which b is NA on every row are dropped by the i filter;
# wm is NaN for a group if a is NA on every row that remains
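
A minimal, self-contained illustration (the table DT and the columns g1, g2, a, b here are made-up names matching the call above):

library(data.table)
DT = data.table(g1 = c("x", "x", "y", "y"),
                g2 = c(1, 1, 2, 2),
                a  = c(1, 2, NA, 4),
                b  = c(10, NA, 30, 40))
DT[!is.na(b), .(wm = weighted.mean(a, b, na.rm = TRUE)), by = .(g1, g2)]
# wm is 1 for group (x, 1) and 4 for group (y, 2)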
Greatest answered 30/10, 2019 at 12:6 Comment(0)

I made a simple modification to the weights w in weighted.mean, using coalesce, as follows:

a = c(1, 2, NA, 4)
b = c(10, NA, 30, 40)
weighted.mean(a, dplyr::coalesce(b, 0), na.rm = TRUE)

The idea is that missing weights are replaced by zeros, so the NA in b no longer propagates, while na.rm = TRUE still drops the NA in a. It returns 3.4. :)

Halsey answered 29/4, 2020 at 2:52 Comment(0)

Another option is to use collapse::fmean, which treats missing weights as 0. It also defaults to na.rm = TRUE and is very fast (see the benchmark below).

library(collapse)

fmean(a, w = b)
#[1] 3.4

Benchmark:

microbenchmark::microbenchmark(
  collapse = fmean(a, w = b),
  coalesce = weighted.mean(a, dplyr::coalesce(b, 0), na.rm = TRUE),
  webb = weighted_mean(a, b, na.rm = TRUE)
)

# Unit: microseconds
#      expr     min      lq      mean  median       uq     max neval
#  collapse   5.302   6.401   9.11210   8.301  11.2010  27.601   100
#  coalesce 261.201 274.052 288.82310 280.401 291.2515 528.500   100
#      webb   7.202   8.951  11.26096  11.501  13.3010  19.202   100
Cogswell answered 5/5, 2023 at 13:21 Comment(0)
