R2WinBUGS - logistic regression with simulated data
Asked Answered
B

1

6

I am just wondering whether anyone has some R code that uses the package R2WinBUGS to run logistic regression - ideally with simulated data to generate the 'truth' and two continous co-variates.

Thanks.

Christian

PS:

Potential code to generate artificial data (one dimensional case) and run winbugs via r2winbugs (it does not work yet).

library(MASS)
library(R2WinBUGS)

setwd("d:/BayesianLogisticRegression")

n.site <- 150

X1<- sort(runif(n = n.site, min = -1, max =1))

xb <- 0.0 + 3.0*X1 

occ.prob <- 1/(1+exp(-xb))

plot(X1, occ.prob,xlab="X1",ylab="occ.prob")

true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)

plot(X1, true.presence,xlab="X1",ylab="true.presence")

# combine data as data frame and save
data <- data.frame(X1, true.presence)
write.matrix(data, file = "data.txt", sep = "\t")

sink("model.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.01)
 beta ~ dnorm(0,0.01)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N)        # Note p before N
    logit(p[i]) <- alpha + beta *X1[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(mass = X1, n = length(X1))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta")

# MCMC settings
nc <- 3 #Number of Chains
ni <- 1200 #Number of draws from posterior
nb <- 200 #Number of draws to discard as burn-in
nt <- 2 Thinning rate

# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, 
model.file="model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)
Bennet answered 24/11, 2011 at 17:30 Comment(3)
page 140 of books.google.ca/books?id=WpeZyTc6U94C gives you a partial answer. Googling "logistic regression WinBUGS" also gets a lot of hits -- haven't looked at them all but suspect there is probably code there. Can you post what you've tried so far? Also see the glmmBUGS package ...Naaman
I am looking especially for R code (package R2WinBUGS) in conjunction with artificial data generation.Bennet
Hi csetzkorn! You know Marc Kery? From the previous question it seems that you are using code from Marc Kery's book :-) He has many examples on this there...Spruik
S
5

Your script is exactly the way to do it. It is almost working, it just required one simple change to make it work:

win.data <- list(X1 = X1, n = length(X1), C = true.presence, N = 1)

Which defines which data go to WinBugs. The variable C must be filled with true.presence, N must be 1 according to the data you generated - note that this is a special case of binomial distribution for N = 1, which is called Bernoulli - a simple "coin flip".

Here is the output:

> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
 n.sims = 1500 iterations saved
            mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
alpha     -0.040 0.221  -0.465  -0.187  -0.037   0.114   0.390 1.001  1500
beta       3.177 0.478   2.302   2.845   3.159   3.481   4.165 1.000  1500
deviance 136.438 2.059 134.500 135.000 135.800 137.200 141.852 1.001  1500

as you see, the parameters correspond to the parameters used to generate the data. Also, if you compare with the frequentist solution, you see it corresponds.

EDIT: but the typical logistic (~ binomial) regression would measure some counts with some upper value N[i], and it would allow for different N[i] for each observation. For example say the proportion of juveniles to the whole population (N). This would require just to add index to N in your model:

C[i] ~ dbin(p[i], N[i])

The data generation would look something like:

N = rpois(n = n.site, lambda = 50) 
juveniles <- rbinom(n = n.site, size = N, prob = occ.prob)
win.data <- list(X1 = X1, n = length(X1), C = juveniles, N = N)

(end of edit)

For more examples from population ecology see books of Marc Kéry (Introduction to WinBUGS for ecologist, and especially Bayesian Population Analysis using WinBUGS: A hierarchical perspective, which is a great book).

The complete script I used - the corrected script of yours is listed here (comparison with frequentist solution at the end):

#library(MASS)
library(R2WinBUGS)

#setwd("d:/BayesianLogisticRegression")

n.site <- 150

X1<- sort(runif(n = n.site, min = -1, max =1))

xb <- 0.0 + 3.0*X1 

occ.prob <- 1/(1+exp(-xb)) # inverse logit

plot(X1, occ.prob,xlab="X1",ylab="occ.prob")

true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)

plot(X1, true.presence,xlab="X1",ylab="true.presence")

# combine data as data frame and save
data <- data.frame(X1, true.presence)
write.matrix(data, file = "data.txt", sep = "\t")

sink("tmp_bugs/model.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.01)
 beta ~ dnorm(0,0.01)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N)        # Note p before N
    logit(p[i]) <- alpha + beta *X1[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(X1 = X1, n = length(X1), C = true.presence, N = 1)

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta")

# MCMC settings
nc <- 3 #Number of Chains
ni <- 1200 #Number of draws from posterior
nb <- 200 #Number of draws to discard as burn-in
nt <- 2 #Thinning rate

# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, 
model.file="model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, 
working.directory = paste(getwd(), "/tmp_bugs/", sep = ""),
debug = TRUE)

print(out, dig = 3)

# Frequentist approach for comparison
m1 = glm(true.presence ~ X1, family = binomial)
summary(m1)

# normally, you should do it this way, but the above also works:
#m2 = glm(cbind(true.presence, 1 - true.presence) ~ X1, family = binomial)
Spruik answered 28/11, 2011 at 18:39 Comment(6)
As your example is not typical logistic regression,I've added a note on such typical regression. See edit.Spruik
Thanks Tomas T. This is exactly what I needed. I already have the book: Introduction to WinBUGS for ecologist. That’s where I took some code from. I have to admit that I do not fully understand the data generation yet. Usually I would have applied a threshold to the output of the link function (e.g. if probability >= 0.5 then 1 else 0 for true.presence). Does the binomial distribution introduce another layer of randomness?Bennet
BTW ultimately I would like to adjust this for the presence only case as discussed here: Data augmentation in bayesian modelling of presence-only data (can you access it?). I could post this as another question and would really appreciate if you could help me with this ... Thanks so far!Bennet
Hi Christian! Of course, the binomial distribution introduces randomness, this is absolutely needed! There are some factors which affect the probability, but the probability is only a parameter of a "coin flip", it is not result itself yet. One still needs to flip the coin to get the actual result :-) BTW, I've send you an email on Friday (on some address I found at your website), have you received it?Spruik
Tomas I have received your email. Sorry I did not reply yet. Shall we communicate via email reg. the presence-only data problem. Marc Kery has a very suitable example in chapter 20 but he samples several times from each location - I only have one sample per location )-:Bennet
Hi Christian, yes you are wellcome to post other problems on WinBUGS... :-)Spruik

© 2022 - 2024 — McMap. All rights reserved.