Weighted average using NA weights

a = c(1, 2, NA, 4)
b = c(10, NA, 30, 40)
weighted.mean(a, b, na.rm = TRUE)

The above code gives me NA as the answer; I think na.rm only ignores the NA values in vector a, not in b. How can I ignore the NA in vector b, i.e. in the weights specifically? I just cannot change the NA to 0; I know that would do the trick, but I'm looking for a tweak in the formula itself.
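
For reference, a by-hand version of the formula restricted to complete cases gives the value I am after; this is only a sketch of the desired behaviour, using complete.cases from base R:

# manual weighted mean over the positions where both the value and the weight are known
keep <- complete.cases(a, b)
sum(a[keep] * b[keep]) / sum(b[keep])
# [1] 3.4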

Voltage answered 26/10, 2016 at 17:52 Comment(4)
I don't think there's a pre-made function. You'll just have to do it by manually subsetting the vectors (or write your own function). (Reiterate)
You could edit the source code for weighted.mean and make your own custom function. (Unreserved)
with(na.omit(data.frame(a, b)), weighted.mean(a, b)) (Umber)
Isn't there a nicer solution by now, like an na.rm.weight() option? (Grantor)

This is the function I ended up writing to solve this problem:

weighted_mean <- function(x, w, ..., na.rm = FALSE){
  if(na.rm){
    # drop every row where either the value or the weight is missing
    df_omit <- na.omit(data.frame(x, w))
    return(weighted.mean(df_omit$x, df_omit$w, ...))
  }
  weighted.mean(x, w, ...)
}
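
For example, with the vectors from the question (assuming a and b as defined there), this returns 3.4:

a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)
weighted_mean(a, b, na.rm = TRUE)
#[1] 3.4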
Butyraldehyde answered 26/5, 2017 at 16:51 Comment(0)

I adapted Mhairi's code to avoid both data.frame and na.omit:

weighted_mean = function(x, w, ..., na.rm = FALSE){
  if(na.rm){
    # keep only the positions where both the value and the weight are non-missing
    keep = !is.na(x) & !is.na(w)
    w = w[keep]
    x = x[keep]
  }
  weighted.mean(x, w, ..., na.rm = FALSE)
}

It's really surprising that R's built-in weighted.mean with na.rm = TRUE doesn't handle NA weights. I just wasted a few hours discovering that.

EDIT: here is also a data.table way, in case someone wants to calculate grouped weighted means:

# mean of column a weighted by b, grouped by g1 and g2
DT[!is.na(b), .(wm = weighted.mean(a, b, na.rm = TRUE)), by = .(g1, g2)]
# groups in which b is NA on every row are dropped by the i filter;
# wm is NaN for a group if a is NA on every row that remains
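
A minimal, self-contained illustration (the table DT and the columns g1, g2, a, b here are made-up names matching the call above):

library(data.table)
DT = data.table(g1 = c("x", "x", "y", "y"),
                g2 = c(1, 1, 2, 2),
                a  = c(1, 2, NA, 4),
                b  = c(10, NA, 30, 40))
DT[!is.na(b), .(wm = weighted.mean(a, b, na.rm = TRUE)), by = .(g1, g2)]
# wm is 1 for group (x, 1) and 4 for group (y, 2)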
Greatest answered 30/10, 2019 at 12:6 Comment(0)

I made a simple modification to the weights w in weighted.mean, using coalesce, as follows:

a = c(1, 2, NA, 4)
b = c(10, NA, 30, 40)
weighted.mean(a, dplyr::coalesce(b, 0), na.rm = TRUE)

The idea is that missing weights are replaced by zeros, so the NA in b no longer propagates, while na.rm = TRUE still drops the NA in a. It returns 3.4. :)

Halsey answered 29/4, 2020 at 2:52 Comment(0)

Another option is to use collapse::fmean, which treats missing weights as 0. It also defaults to na.rm = TRUE and is very fast (see the benchmark below).

library(collapse)

fmean(a, w = b)
#[1] 3.4

Benchmark:

microbenchmark::microbenchmark(
  collapse = fmean(a, w = b),
  coalesce = weighted.mean(a, dplyr::coalesce(b, 0), na.rm = TRUE),
  webb = weighted_mean(a, b, na.rm = TRUE)
)

# Unit: microseconds
#      expr     min      lq      mean  median       uq     max neval
#  collapse   5.302   6.401   9.11210   8.301  11.2010  27.601   100
#  coalesce 261.201 274.052 288.82310 280.401 291.2515 528.500   100
#      webb   7.202   8.951  11.26096  11.501  13.3010  19.202   100
Cogswell answered 5/5, 2023 at 13:21 Comment(0)
