Bootstrapping sample means in R using boot Package, Creating the Statistic Function for boot() Function

I have a data set with 15 density calculations, each from a different transect. I would like to resample these with replacement, drawing 15 randomly selected samples from the 15 transects and then taking the mean of each resample. Each transect should have its own probability of being sampled during this process. This should be done 5000 times. I have code that does this without using the boot function, but to calculate the BCa 95% CI with the boot package the bootstrapping has to be done through the boot function first. I have been trying to write the statistic function but can't get one to work. I want the bootstrap to sample from a certain column (data$xs), and the probabilities to be used are in the column data$prob.

The function I thought might work was:

library(boot)
meanfun <- function (data, i){
    d<-data [i,]
    return (mean (d))   }
bo<-boot (data$xs, statistic=meanfun, R=5000)
#boot.ci (bo, conf=0.95, type="bca")  #obviously `bo` was not made

But this gave the error 'incorrect number of dimensions'.

I understand how to write a function in the normal sense, but the way the function works in boot seems strange. Since the function is given to boot only by name, with no specification of the arguments to pass to it, I seem limited to whatever boot itself passes in (for example, I am unable to pass data$xs in as the argument for data, or data$prob as an argument for the probabilities). This seems to really limit what can be done. Perhaps I am missing something, though?

Thanks for any and all help

Middling answered 13/10, 2016 at 15:28 Comment(1)
You should provide a reproducible example with sample input so we can run and test the function as well. Make the example as minimal as possible. This way we can run the code too and see what's going wrong. - Anet

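The reason for this error is that data$xs returns a vector, which has only one dimension, and you then try to subset it with two indices via data[i, ].

One way to solve this is to change the subsetting inside the function to data[i]; another is to pass data[, "xs", drop = FALSE] to boot() instead. The drop = FALSE prevents the single column from being simplified to a vector, i.e. it keeps it as a data.frame.

For example: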

data <- data.frame(xs = rnorm(15, 2))

library(boot)
meanfun <- function(data, i){
  d <- data[i, ]
  return(mean(d))   
}
bo <- boot(data[, "xs", drop = FALSE], statistic=meanfun, R=5000)
boot.ci(bo, conf=0.95, type="bca")

and obtain:

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = bo, conf = 0.95, type = "bca")

Intervals : 
Level       BCa          
95%   ( 1.555,  2.534 )  
Calculations and Intervals on Original Scale
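
The other option mentioned above, indexing a plain vector with data[i], can be sketched as follows. This is only an illustration on simulated data: the names xs and meanfun_vec and the trim argument are assumptions made for the example, not part of the question. It also shows that boot() hands any extra named arguments through to the statistic function.

```r
library(boot)

set.seed(1)
xs <- rnorm(15, 2)  # simulated stand-in for data$xs

# boot() calls statistic(data, indices, ...); with a vector, index with x[i]
meanfun_vec <- function(x, i, trim = 0) mean(x[i], trim = trim)

bo_vec <- boot(xs, statistic = meanfun_vec, R = 5000)
ci <- boot.ci(bo_vec, conf = 0.95, type = "bca")

# extra named arguments (here the illustrative trim) are passed to the statistic
bo_trim <- boot(xs, statistic = meanfun_vec, R = 5000, trim = 0.1)
```

So the statistic function is not limited to two arguments: the first two are supplied by boot(), and anything else can be given by name in the boot() call.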
Ballflower answered 13/10, 2016 at 15:50 Comment(2)
Ok thanks, I will give that a try. However, is there a way to specify a probability of each sample being selected? This is quite important, and from what I can tell there is no such option in the package... perhaps there is a way to build it into the function? - Middling
Why is the weights argument in boot() not adequate? I'm trying to understand what you want. - Ballflower
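
For the weights argument raised in the comment above, a minimal sketch might look like this. The data are simulated and the selection probabilities in prob are made up for illustration; in the question they would come from data$prob.

```r
library(boot)

set.seed(123)
# Hypothetical data: 15 transect densities and made-up selection probabilities
data <- data.frame(xs = rnorm(15, 2),
                   prob = seq_len(15) / sum(seq_len(15)))

# boot() calls statistic(data, indices); here data is a plain vector
meanfun <- function(x, i) mean(x[i])

# weights: one importance weight per observation (boot() normalizes them)
bo_w <- boot(data$xs, statistic = meanfun, R = 5000, weights = data$prob)

mean(bo_w$t)                       # average of the weighted resample means
weighted.mean(data$xs, data$prob)  # should be close to the line above
```

Because this is an importance resample rather than an ordinary bootstrap, it is worth checking whether a BCa interval from boot.ci() is meaningful for it; the boot package also provides imp.moments() and imp.quantile() for summarising weighted resamples.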

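One can use boot.array to recover all or a subset of the resampled sets. With indices = TRUE it returns a matrix with one row per replicate, where each row holds the indices of the observations drawn for that resample rather than the data values themselves. In this case:

bo.ci <- boot.ci(boot.out = bo, conf = 0.95, type = "bca")

# one row per bootstrap replicate; entries are indices into the original data
resampled.data <- boot.array(bo, indices = TRUE)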

To extract the index sets of the first and second resamples:

resample.1<-resampled.data[1,]
resample.2<-resampled.data[2,]

Then proceed to extract the individual statistic you want from any subset. For instance, if you assume normality you could run a Student's t-test on the first subset:

t.test(resample.1)

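Which for this example and particular seed value(s) gives the following (the values tested here are the indices 1 to 15 stored in resample.1, which is why the mean is close to 8):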

data: resample.1
t = 6.5216, df = 14, p-value = 1.353e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
5.234781 10.365219
sample estimates:
mean of x
7.8
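
Since the rows returned by boot.array are indices, the resampled values themselves have to be looked up in the original data. A minimal sketch, assuming the single-column data frame and meanfun from the earlier answer (the names idx and values.1 and the smaller R are only for illustration):

```r
library(boot)

set.seed(42)
data <- data.frame(xs = rnorm(15, 2))  # simulated stand-in data

meanfun <- function(data, i) mean(data[i, ])
bo <- boot(data[, "xs", drop = FALSE], statistic = meanfun, R = 200)

# indices = TRUE gives an R x n matrix of row indices, one row per replicate
idx <- boot.array(bo, indices = TRUE)

# map the first row of indices back to the actual density values
values.1 <- data$xs[idx[1, ]]

mean(values.1)  # should equal the first bootstrap replicate
bo$t[1]
```

Any further statistic, such as t.test(values.1), would then be computed on the resampled values rather than on their positions.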

Weswesa answered 12/5, 2020 at 20:10 Comment(0)
