Is there any command to find the standard error of the mean in R?
The standard error is just the standard deviation divided by the square root of the sample size. So you can easily make your own function:
> std <- function(x) sd(x)/sqrt(length(x))
> std(c(1,2,3,4))
[1] 0.6454972
The standard error (SE) is just the standard deviation of the sampling distribution. The variance of the sampling distribution is the variance of the data divided by N, and the SE is the square root of that. From that it follows that it is slightly more efficient to work with the variance directly: the sd function in R already takes one square root (its code is plain R, shown by just typing "sd"), so sd(x)/sqrt(length(x)) computes two square roots where one will do. Therefore, the following is more efficient:
se <- function(x) sqrt(var(x)/length(x))
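As a quick check, applied to the same small vector as in the answer above this gives the same value:
> se(c(1,2,3,4))
[1] 0.6454972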
stderr is a function name in base. – Wince
stderr does NOT calculate the standard error; it deals with display aspects of connections. – Brassiere
He wasn't saying that stderr calculates the standard error; he was warning that this name is already used in base, and John originally named his function stderr (check the edit history...). – Shrift
With na.rm = TRUE it gives incorrect results, as length(x) does not account for the removed NAs. – Florentinaflorentine
A version of John's answer above that removes the pesky NA's:
stderr <- function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(var(x)/length(x))
}
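For example (hypothetical input vector), with na.rm = TRUE the NAs are dropped by na.omit() before length(x) is taken, so the divisor is the number of non-missing values:
> stderr(c(1, 2, 3, 4, NA), na.rm = TRUE)
[1] 0.6454972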
There is already a stderr in the base package that does something else, so it might be better to choose another name for this one, e.g. se. – Ayacucho
As I keep coming back to this question every now and then and because it is old, I'm posting a benchmark of the most upvoted answers.
Note that for @Ian's and @John's answers I created another version: instead of length(x), it uses sum(!is.na(x)) to handle NAs.
I used a vector of 10^6 values, with 1,000 repetitions.
library(microbenchmark)
set.seed(123)
myVec <- rnorm(10^6)
IanStd <- function(x) sd(x)/sqrt(length(x))
JohnSe <- function(x) sqrt(var(x)/length(x))
IanStdisNA <- function(x) sd(x)/sqrt(sum(!is.na(x)))
JohnSeisNA <- function(x) sqrt(var(x)/sum(!is.na(x)))
AranStderr <- function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(var(x)/length(x))
}
mbm <- microbenchmark(
"plotrix" = {plotrix::std.error(myVec)},
"IanStd" = {IanStd(myVec)},
"JohnSe" = {JohnSe(myVec)},
"IanStdisNA" = {IanStdisNA(myVec)},
"JohnSeisNA" = {JohnSeisNA(myVec)},
"AranStderr" = {AranStderr(myVec)},
times = 1000)
mbm
Results:
Unit: milliseconds
expr min lq mean median uq max neval cld
plotrix 10.3033 10.89360 13.869947 11.36050 15.89165 125.8733 1000 c
IanStd 4.3132 4.41730 4.618690 4.47425 4.63185 8.4388 1000 a
JohnSe 4.3324 4.41875 4.640725 4.48330 4.64935 9.4435 1000 a
IanStdisNA 8.4976 8.99980 11.278352 9.34315 12.62075 120.8937 1000 b
JohnSeisNA 8.5138 8.96600 11.127796 9.35725 12.63630 118.4796 1000 b
AranStderr 4.3324 4.41995 4.634949 4.47440 4.62620 14.3511 1000 a
library(ggplot2)
autoplot(mbm)
Remembering that the mean can also be obtained by fitting a linear model that regresses the variable on an intercept only, you can also use lm(x ~ 1) for this!
Advantages are:
- You immediately obtain confidence intervals with confint()
- You can test various hypotheses about the mean, using for example car::linear.hypothesis()
- You can use more sophisticated estimates of the standard error in case you have heteroskedasticity, clustered data, spatial data etc.; see the sandwich package (a sketch follows after the example below)
## generate data
x <- rnorm(1000)
## estimate reg
reg <- lm(x~1)
coef(summary(reg))[,"Std. Error"]
#> [1] 0.03237811
## compare with simple formula
all.equal(sd(x)/sqrt(length(x)),
coef(summary(reg))[,"Std. Error"])
#> [1] TRUE
## extract confidence interval
confint(reg)
#> 2.5 % 97.5 %
#> (Intercept) -0.06457031 0.0625035
Created on 2020-10-06 by the reprex package (v0.3.0)
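As a rough sketch of the sandwich point above (this assumes the sandwich package is installed and simply picks the HC1 estimator as an example), a heteroskedasticity-robust standard error of the intercept can be pulled out of the robust covariance matrix:
library(sandwich)
## robust (heteroskedasticity-consistent) variance of the intercept;
## its square root is the robust standard error
sqrt(diag(vcovHC(reg, type = "HC1")))
With well-behaved i.i.d. data this should be very close to the plain sd(x)/sqrt(length(x)) value above; it only starts to differ when the usual homoskedasticity assumption breaks down.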
You can use the function stat.desc from the pastecs package; the standard error of the mean is reported in its SE.mean entry.
library(pastecs)
stat.desc(x, basic = TRUE, norm = TRUE)
You can find more about it here: https://www.rdocumentation.org/packages/pastecs/versions/1.3.21/topics/stat.desc
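If you only need that single number, a minimal sketch (assuming pastecs is installed; x is just some example data) is to pull the SE.mean entry out of the result by name:
library(pastecs)
x <- rnorm(1000)              # hypothetical example data
stat.desc(x)[["SE.mean"]]     # standard error of the mean only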