Calculating weighted mean and standard deviation
Asked Answered
I

6

40

I have a time series x_0 ... x_t. I would like to compute the exponentially weighted variance of the data. That is:

V = SUM{w_i*(x_i - x_bar)^2, i=1 to T} where SUM{w_i} = 1 and x_bar=SUM{w_i*x_i}

ref: http://en.wikipedia.org/wiki/Weighted_mean#Weighted_sample_variance

The goal is to basically weight observations that are further back in time less. This is very simple to implement but I would like to use as much built in funcitonality as possible. Does anyone know what this corresponds to in R?

Thanks

Iover answered 6/4, 2012 at 21:20 Comment(1)
I'm guessing that this is an incomplete specification and that what you really want delivered will require better specification of how w_i is constructed and more detail on the limits of summation.Guerrero
F
39

R provides weighted mean. In fact, ?weighted.mean shows this example:

 ## GPA from Siegel 1994
 wt <- c(5,  5,  4,  1)/15
 x <- c(3.7,3.3,3.5,2.8)
 xm <- weighted.mean(x, wt)

One more step:

v <- sum(wt * (x - xm)^2)
Frieze answered 7/4, 2012 at 2:23 Comment(8)
Hmisc, as it turned out, does just this.Iover
Note the last line in the answer. That is the weighted variance.Frieze
Just a hint... If you are thick skulled like me, 15 is the sum of the individual weights. Weights are then normalized in this case. I didn't catch that at first.Undry
you forgot to divide v by n or n-1Eudora
@Eudora The formula above represents the population variance for the set. Note that sum(wt) == 1 so multiplying by wt in the expression is that division.Frieze
it doesn't produce exactly the same value than the function wt.var(x, wt) from the library SMDtools, maybe they use some correction?Eudora
It seems this answer currently works best. I think more consistent would be to have v <- weighted.mean( (x-xm)^2, wt ) Since that also works when weights are not normalized.Darter
You should also consider whether your weights are frequency or reliability weights. See @Matifou below.Iover
E
36

The Hmisc package contains the functions you need.

Thus:

x <- c(3.7,3.3,3.5,2.8)

wt <- c(5,  5,  4,  1)/15

xm <- wtd.mean(x, wt)

var <- wtd.var(x, wt)

sd <- sqrt(var)

Unfortunately the author of the Hmisc package did not include an explicit wtd.sd function. You have to square root wtd.var.

Charles Kangai

Erythroblast answered 24/6, 2014 at 10:59 Comment(3)
wtd.mean works, but wtd.var in your example gives 'INF'. why is that?Dick
@Dick This is now fixed in the development version of Hmisc. github.com/harrelfe/Hmisc/issues/69Unicycle
sum(wt) need not be 1.Duodenary
K
8

I too get errors from Hmisc when using the wtd.var() function. Fortunately, SDMTools has comparable functionality, and even calculates SD directly for you, without needing to take sqrt of variance.

library(SDMTools)

x <- c(3.7,3.3,3.5,2.8)
wt <- c(5,  5,  4,  1)/15  ## Note: no actual need to normalize weights to sum to 1, this will be done automatically.

wt.mean(x, wt)
wt.sd(x,wt)

wt.var(x, wt)
Kalynkam answered 18/8, 2017 at 18:22 Comment(1)
SDMTools is not maintained anymore.Isopropyl
F
3

Package Hmisc has function wt.var(), as noted by others.

Note that you need to understand whether you want frequency or reliability weights. In your case, I believe you are interested in reliability weights, so will need to set explicitely normwt=TRUE. In that case you can give your weights in any format (sum to 1, sum to N, etc). If you were to use frequency weights, you would need to be careful about how you specify weights.

library(Hmisc)

n <- 3
x <- seq_len(n)
w <- c(0.1, 0.2, 0.6)
w2 <- w / min(w)
w3 <- w / sum(w)

## reliability weights?
wtd.var(x = x, weights = w, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w2, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w3, normwt=TRUE)
#> [1] 0.95

## frequency weights?
wtd.var(x = x, weights = w)
#> Warning in wtd.var(x = x, weights = w): only one effective observation; variance
#> estimate undefined
#> [1] -4.222222
wtd.var(x = x, weights = w2)
#> [1] 0.5277778
wtd.var(x = x, weights = w3)
#> Warning in wtd.var(x = x, weights = w3): only one effective observation;
#> variance estimate undefined
#> [1] Inf

Created on 2020-08-26 by the reprex package (v0.3.0)

Furmenty answered 26/8, 2020 at 23:2 Comment(1)
The difference between frequency/reliability weights isn't given enough attention. +1Iover
P
2

For the Variance and Standard deviation you have to distinguish between biased estimate and frequency and reliability / sampling weights.

x <- c(1, 4, 5)
wf <- c(1, 2, 3)        # Frequency counts
ws <- c(0.1, 0.2, 0.3)  # Sampling weights

## Weighted mean
mean(rep(x, wf))        # Works only for integer frequencys
#[1] 4
sum(x * wf) / sum(wf)
#[1] 4
sum(x * ws) / sum(ws)
#[1] 4
weighted.mean(x, wf)
#[1] 4
weighted.mean(x, ws)
#[1] 4

## Frequency counts
var(rep(x, wf))                        # Variance
#[1] 2.4
sd(rep(x, wf))                         # Standard deviation
#[1] 1.549193
sw <- sum(wf)
xm <- sum(x * wf) / sw
sum(wf * (x - xm)^2) / (sw - 1)        # Variance
#[1] 2.4
sqrt(sum(wf * (x - xm)^2) / (sw - 1))  # Standard deviation
#[1] 1.549193

## Sampling weights
xm <- weighted.mean(x, ws)
sum(ws *(x-xm)^2)*(sum(ws)/(sum(ws)^2-sum(ws^2)))  # Variance
#[1] 3.272727
cov.wt(matrix(x, ncol=1), ws)$cov                  # Variance
#[1,] 3.272727

## BIASED weighted sample variance
xm <- weighted.mean(x, ws)
sum(ws * (x - xm)^2) / sum(ws)  # Variance
#[1] 2
xm <- weighted.mean(x, wf)
sum(wf * (x - xm)^2) / sum(wf)  # Variance
#[1] 2

## Using Hmisc
Hmisc::wtd.var(x, wf)
#[1] 2.4
Hmisc::wtd.var(x, ws, normwt=TRUE)
#[1] 3.272727
Pericycle answered 22/8, 2023 at 7:16 Comment(0)
I
0

Hmisc package provides this functionality:

http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=Hmisc:wtd.stats

Iover answered 8/4, 2012 at 2:27 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.