Modelling data with a Weibull link function in R

Asked 8/2, 2013 at 11:57 Answered 13/6, 2013 at 12:44

Solved r distribution curve-fitting weibull

I am trying to model some data that follows a sigmoid curve relationship. In my field of work (psychophysics), a Weibull function is usually used to model such relationships, rather than probit.

I am trying to create a model using R and am struggling with syntax. I know that I need to use the vglm() function from the VGAM package, but I am unable to get a sensible model out. Here's my data:

# Data frame example data
dframe1 <- structure(list(independent_variable = c(0.3, 0.24, 0.23, 0.16, 
0.14, 0.05, 0.01, -0.1, -0.2), dependent_variable = c(1, 1, 
1, 0.95, 0.93, 0.65, 0.55, 0.5, 0.5)), .Names = c("independent_variable", 
"dependent_variable"), class = "data.frame", row.names = c(NA, 
-9L))

Here is a plot of the data in dframe1:

library(ggplot2)

# Plot my original data
ggplot(dframe1, aes(independent_variable, dependent_variable)) + geom_point()

enter image description here

This should be able to be modelled by a Weibull function, since the data fit a sigmoid curve relationship. Here is my attempt to model the data and generate a representative plot:

library(VGAM)

# Generate model
my_model <- vglm(formula = dependent_variable ~ independent_variable, family = weibull, data = dframe1)

# Create a new dataframe based on the model, so that it can be plotted
model_dframe <- data.frame(dframe1$independent_variable, fitted(my_model))

# Plot my model fitted data
ggplot(model_dframe, aes(dframe1.independent_variable, fitted.my_model.)) + geom_point()

enter image description here

As you can see, this doesn't represent my original data at all. I'm either generating my model incorrectly, or I'm generating my plot of the model incorrectly. What am I doing wrong?

Note: I have edited this question to make it more understandable; previously I had been using the wrong function entirely (weibreg()). Hence, some of the comments below may not make sense. .....

Jarrod answered 8/2, 2013 at 11:57 Comment(6)

I originally pointed you to weibreg(), but it seems like this was a red herring. I am very sorry. weibreg() apparently only handles Weibull regression for survival models (which are commonly modeled with the Weibull) - but psychophysics seem to be unique in that they model non-survival data with a Weibull link function where everyone else would use a logit or probit. However, it looks like the vglm() function in the VGAM package may work: rss.acs.unt.edu/Rdoc/library/VGAM/html/weibull.html If you could add the output of dput(dframe) to your post, I will try to help more. – Librarianship 8/2, 2013 at 20:6

Thanks Stephan, this is a learning experience for me! I've added the 'dput()' to my question. Any advice on how to run the function would be appreciated. – Jarrod 9/2, 2013 at 14:31

Well, I sure hope you have more than three observations! I guess your p value comes from multiple observations, so I suggest you put them all in the data frame. Then I would fit the model using model <- vglm(p~size,family=weibull,data=dframe) (you will need to tell vglm() what is the dependent and what is the independent variable) and examine the result with summary(model). Your warning message means that the ML estimate yields an invalid shape parameter; it may disappear with more data. But I certainly won't say that I understand vglm deeply; perhaps someone else can help? – Librarianship 9/2, 2013 at 20:3

OK, I can see from your example that your independent variable plausibly follows a cumulative-Weibull shape. But: what are the statistical properties of the observed values? Are they normally distributed? Are they proportions, in which case they might be beta-distributed? Need to know this in order to fit the statistical model ... I looked at cornea.berkeley.edu/pubs/148.pdf , and it looks like your data are probably yes/no proportions? In order to do this properly we probably need the denominators (i.e., numbers of trials for each point). – Christine 19/2, 2013 at 15:48

It also seems funny that the lower asymptote is 0.5 rather than 1 ... can you explain? – Christine 19/2, 2013 at 15:48

Yes @Ben; they are '2AFC' yes/no psychophysical data. Each data point represents 40 trials. The asymptote is at 0.5 since this is the level at which the observer (of my psychophysical experiment) was making complete guesses, hence they get the answer right on 50% of trials, due to pure chance. – Jarrod 19/2, 2013 at 20:48

Here's my solution, with bbmle.

Data:

dframe1 <- structure(list(independent_variable = c(0.3, 0.24, 0.23, 0.16, 
0.14, 0.05, 0.01, -0.1, -0.2), dependent_variable = c(1, 1, 
1, 0.95, 0.93, 0.65, 0.55, 0.5, 0.5)), .Names = c("independent_variable", 
"dependent_variable"), class = "data.frame", row.names = c(NA, 
-9L))

Construct a cumulative Weibull that goes from 0.5 to 1.0 by definition:

wfun <- function(x,shape,scale) {
    (1+pweibull(x,shape,scale))/2.0
}

dframe2 <- transform(dframe1,y=round(40*dependent_variable),x=independent_variable)

Fit a Weibull (log-scale relevant parameters), with binomial variation:

library(bbmle)
m1 <- mle2(y~dbinom(prob=wfun(exp(a+b*x),shape=exp(logshape),scale=1),size=40),
     data=dframe2,start=list(a=0,b=0,logshape=0))

Generate predictions:

pframe <- data.frame(x=seq(-0.2,0.3,length=101))
pframe$y <- predict(m1,pframe)

png("wplot.png")
with(dframe2,plot(y/40~x))
with(pframe,lines(y/40~x,col=2))
dev.off()

enter image description here

Christine answered 20/2, 2013 at 4:7 Comment(3)

Thank you very much for this Ben. On some of my trials, I exceeded 40 presentations. I am faced with the option of a) ignoring data collected after the 40th, or b) modifying the calculation of 'm1' to take account for the trials that exceeded 40 presentations. Although it would probably make little difference to the outcome, I wonder if there is a way of incorporating these extra data? I have managed to incorporate a variable 'n_presentations' up as far as the last step, but don't know how to generate a p_frame that allows for different sample sizes in each datum. – Jarrod 21/2, 2013 at 12:41

You should certainly be able to account for differing sample sizes: just make sure y in the model above is the number of successes and size is the actual number of trials (it can be a vector, of course). Since you're trying to predict probabilities, I think you can put anything you want into n_presentations. Try a column of n_presentations=1 and see if that works. Otherwise it shouldn't be too hard to generate the predictions by hand. – Christine 22/2, 2013 at 14:8

Thanks. The problem seems to come when predicting 'y' values using the model generated in mle2. If I input a vector n_presentations as the size= parameter, the pframe$y <- predict(m1,pframe) line doesn't know how to handle it. Presumably, since this line tries to extrapolate 101 points from the nine input values, it doesn't know what 'size' to use for each point (this fails even if n_presentations is '40' for each datum)... Since there is no 'trend' in the number of trials for each point, it would surely be impossible for the model to know how to scale each value of y? – Jarrod 23/2, 2013 at 13:47

OK, I just came across this several months late, but you could also use the mafc.cloglog link from the psyphy package with glm. If the x follows the cloglog then log(x) will follow a weibull psychometric function. The catch as with the above responses is that you need the number of trials for the proportion correct. I just set it to 100 so it would give an integer number of trials but you should fix this to correspond to the numbers that you actually used. Here is the code to do it.

dframe1 <- structure(list(independent_variable = c(0.3, 0.24, 0.23, 0.16, 
0.14, 0.05, 0.01, -0.1, -0.2), dependent_variable = c(1, 1, 
1, 0.95, 0.93, 0.65, 0.55, 0.5, 0.5)), .Names = c("independent_variable", 
"dependent_variable"), class = "data.frame", row.names = c(NA, 
-9L))

library(psyphy)

plot(dependent_variable ~ independent_variable, dframe1)
fit <- glm(dependent_variable ~ exp(independent_variable), 
    binomial(mafc.cloglog(2)), 
    data = dframe1, 
    weights = rep(100, nrow(dframe1)))  # assuming 100 observations per point
xx <- seq(-0.2, 0.3, len = 100)
pred <- predict(fit, newdata = data.frame(independent_variable = xx), type = "response")
lines(xx, pred)

Fit to data

Juanajuanita answered 13/6, 2013 at 12:44 Comment(0)

You could also use the drc-package (dose-response-modelling).

I am actually a noob for this kind of models, but perhabs it helps somehow...

Here I fitted a four parameter Weibull, with fixed parameters for the asymptotes (otherwise the upper asymptote would be slightly greater 1, don't know if this is an issue for you). I also had to transform the independent variable (+0.2) so that its is >= 0, because of convergence problems.

require(drc)
# four-parameter Weibull with fixed parameters for the asymptote, added +0.2 to IV to overcome convergence problems
mod <- drm(dependent_variable ~ I(independent_variable+0.2), 
           data = dframe1, 
           fct = W1.4(fixed = c(NA, 0.5, 1, NA)))

# predicts
df2 <- data.frame(pred = predict(mod, newdata = data.frame(idenpendent_variable = seq(0, 0.5, length.out=100))), 
                  x = seq(0, 0.5, length.out=100))

ggplot() + 
  geom_point(data = dframe1, aes(x = independent_variable + 0.2, y = dependent_variable)) +
  geom_line(data = df2, aes(x = x, y = pred))

However I agree with Ben Bolker that other models may be better suited.

I only know these kind of models from ecotoxicology (dose-response-models, where one is interested in the concentration where we have 50% mortality [=EC50]).

enter image description here

Update A four-parameter log-logistic model fits also quite good (smaller AIC and RSE then weibull): Again I fixed here the asymptote parameter and transformed the IV.

# four-parameter log-logistic with fixed parameters for the asymptote, added +0.2 to IV to overcome convergence problems
mod1 <- drm(dependent_variable ~ I(independent_variable+0.2), 
           data = dframe1, 
           fct = LL2.4(fixed=c(NA, 0.5, 1, NA)))
summary(mod1)

# predicts
df2 <- data.frame(pred = predict(mod1, newdata = data.frame(idenpendent_variable = seq(0, 0.5, length.out=100))), 
                  x = seq(0, 0.5, length.out=100))

ggplot() + 
  geom_point(data = dframe1, aes(x = independent_variable + 0.2, y = dependent_variable)) +
  geom_line(data = df2, aes(x = x, y = pred))

enter image description here

Airtoair answered 19/2, 2013 at 21:41 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags