texreg
package author here. texreg
is based on generic functions, which means any user can add custom extract
methods for arbitrary models and make them work with texreg
. I will walk you through this below. I am hoping that this detailed exposition will help other people who have asked similar questions in the past to devise their own texreg
extensions.
See also Section 6 in the 2013 paper in the Journal of Statistical Software for another example. First, however, I will describe how the texreg
architecture works more generally to give you an idea what is possible.
texreg
and the generic extract
function
There are three functions: texreg
(for LaTeX output), htmlreg
(for HTML output, which can also be interpreted by Word or Markdown in most usage scenarios), and screenreg
(for ASCII text output in the console). These three functions serve to convert some cleaned-up information (coefficients, standard errors, confidence intervals, p-values, goodness-of-fit statistics, model labels etc.) into the respective output formats. This is not perfect and could be even more flexible, but I think it has quite a few arguments for customization at this point, including things like booktabs
and dcolumn
support. So the big challenge is rather to get cleaned-up model information in the first place.
This is done by providing a texreg
object to any of these three functions. A texreg
object is just a container for coefficients etc. and is formally defined using an S4 class. To create a texreg
object, you can either use the constructor function createTexreg
(as documented on the help pages), which accepts all the different pieces of information as arguments, such as the standard errors etc. Or (better) you can use the extract
function to extract those pieces of information from some estimated model and return a texreg
object for use with any of the three functions. The way you typically do this is by just handing over a list of several models to the texreg
or screenreg
etc. function, and this function will internally call extract
to create texreg
objects and then process the information from those objects.
However, it is equally valid to just save the output of a call of the extract
function to an object, possibly manipulate this texreg
object, and then call the texreg
function on the manipulated object to display it as a table. This permits some flexibility in tweaking the results.
Earlier on, I mentioned that the package uses generic functions. This means that the extract
function is generic in the sense that you can register methods for it that work with arbitrary classes of models. For example, if the extract
function does not know how to handle h2o
objects and how to extract the relevant information from such an object, you can just write a method to do that and register it with the extract
function. Below, I will walk you through this step by step in the hope that people will learn from this detailed exposition and start writing their own extensions. (Note: if somebody develops a useful method, please e-mail me and I can include it in the next texreg
release.) It is also worth pointing out that the source files of the package contain more than 70 examples of extract
methods which you can use as templates. These examples are stored in the file R/extract.R
.
Identifying the class label and setting up an extract
method
The first step is to identify the class name of the object. In your example, class(model.output.1)
returns the following class labels: "H2OBinomialModel" and "h2o". The first label is the more specific one, and the second label is the more general one. If all h2o
model objects are structured in a similar way, it will make sense to write an extension for h2o
objects and then decide within the method how to proceed with the specific model. As I know virtually nothing about the h2o
package, I prefer to start out with a more specific extract
method for H2OBinomialModel
objects in this case. It can be adjusted later on should we wish to do so.
extract
methods are structured as follows: you write a function called extract.xyz
(replace "xyz" by the class label), have at least one argument called model
, which accepts the model object (e.g., model.output.1
in your example), put some code in the body that extracts the relevant information from the model
object, create a texreg
object using the createTexreg
constructor, and return this object. Here is an empty container:
extract.H2OBinomialModel <- function(model, ...) {
s <- summary(model)
# extract information from model and summary object here
# then create and return a texreg object (replace NULL with actual values):
tr <- createTexreg(
coef.names = NULL, # character vector of coefficient labels
coef = NULL, # numeric vector with coefficients
se = NULL, # numeric vector with standard error values
pvalues = NULL, # numeric vector with p-values
gof.names = NULL, # character vector with goodness-of-fit labels
gof = NULL, # numeric vector of goodness-of-fit statistics
gof.decimal = NULL # logical vector: GOF statistic has decimal points?
)
return(tr)
}
Note that the function definition also contains the ...
argument, which can be used for custom arguments that should be handed over to function calls within the extract
method.
Note also that the first line in the body of the function definition saves the summary of the model in an object called s
. This is often useful because many package writers decide to store some of the information in a simpler version in the summary, so one should usually consider both, the model and its summary, as useful sources of information. In some cases, one may have to look at the actual definition of the summary method in the respective package to find out how the pieces of information displayed on the summary page are computed when the summary
command is called because not all summary
methods store the different elements displayed in the summary
object.
Locating the right information in an H2OBinomialModel
object
The next step is to examine the object and locate all the details that should be displayed in the final table. By looking at the output of model.output.1
, I would guess that the following part should constitute the GOF block at the bottom of the table:
MSE: 0.202947
R^2: 0.1562137
LogLoss: 0.5920097
Mean Per-Class Error: 0.3612191
AUC: 0.7185655
Gini: 0.4371311
Null Deviance: 512.2888
Residual Deviance: 449.9274
AIC: 457.9274
And the following part should probably constitute the coefficient block in the middle of the table:
Coefficients: glm coefficients
names coefficients standardized_coefficients
1 Intercept -1.835223 -0.336428
2 RACE -0.625222 -0.193052
3 DCAPS 1.314428 0.408336
4 PSA 0.046861 0.937107
In many cases, the summary contains the relevant information, but here printing the model yields what we need. We will need to locate all of this in the model.output.1
object (or its summary if applicable). To do that, there are several useful commands. Among them are str(model.output.1)
, names(summary(model.output.1))
, and similar commands.
Let's start with the coefficient block. Calling str(model)
reveals that there is a slot called model
in the S4 object. We can look at its contents by calling model.output.1@model
. The result is a list with several named elements, among them coefficients_table
. So we can access the coefficient table by calling model.output.1@model$coefficients_table
. The result is a data frame the columns of which we can access using the $
operator. In particular, we need the names and the coefficients. There are two types of coefficients here, standardized and unstandardized, and we can add an argument to our extract method later to let the user decide what he or she wants. Here is how we extract the coefficients and their labels:
coefnames <- model.output.1@model$coefficients_table$names
coefs <- model.output.1@model$coefficients_table$coefficients
coefs.std <- model.output.1@model$coefficients_table$standardized_coefficients
As far as I can see, there are no standard errors or p-values stored in the object. We could write some additional code to compute them should we wish to do that, but here we will focus on things that are readily provided as part of the model output.
It is important that we should not overwrite any existing function names in R
, such as names
or coef
. While doing this should technically work because the code is executed within a function later, this can easily lead to confusion while trying things out, so you should better avoid that.
Next, we need to locate the goodness-of-fit statistics. By examining the output of str(model.output.1)
carefully, we see that the goodness-of-fit statistics are contained in several slots under model@model$training_metrics@metrics
. Let's save them to some objects that are easier to access:
mse <- model.output.1@model$training_metrics@metrics$MSE
r2 <- model.output.1@model$training_metrics@metrics$r2
logloss <- model.output.1@model$training_metrics@metrics$logloss
mpce <- model.output.1@model$training_metrics@metrics$mean_per_class_error
auc <- model.output.1@model$training_metrics@metrics$AUC
gini <- model.output.1@model$training_metrics@metrics$Gini
nulldev <- model.output.1@model$training_metrics@metrics$null_deviance
resdev <- model.output.1@model$training_metrics@metrics$residual_deviance
aic <- model.output.1@model$training_metrics@metrics$AIC
In some cases, but not here, the author of a package writes methods for generic functions that can be used to extract some common information like the number of observations (nobs(model)
), AIC (AIC(model)
), BIC (BIC(model)
), deviance (deviance(model)
), or the log likelihood (logLik(model)[[1]]
). So these are things you may want to try first; but the h2o
package does not seem to offer such convenience methods.
Adding the information to the extract.H2OBinomialModel
function
Now that we have located all pieces of information we need, we can add them to the extract.H2OBinomialModel
function we defined above.
However, we would like to let the user decide if he or she prefers raw or standardized coefficients, and we would like to let the user decide which goodness-of-fit statistics should be reported, so we add various logical arguments to the function header and then use if-conditions inside the function to check whether we should embed the respective statistics in the resulting texreg
object.
We also remove the line s <- summary(model)
in this case because we don't actually need any kind of information from the summary since we found everything we need in the model object.
The complete function may look like this:
# extension for H2OBinomialModel objects (h2o package)
extract.H2OBinomialModel <- function(model, standardized = FALSE,
include.mse = TRUE, include.rsquared = TRUE, include.logloss = TRUE,
include.meanerror = TRUE, include.auc = TRUE, include.gini = TRUE,
include.deviance = TRUE, include.aic = TRUE, ...) {
# extract coefficient table from model:
coefnames <- model@model$coefficients_table$names
if (standardized == TRUE) {
coefs <- model@model$coefficients_table$standardized_coefficients
} else {
coefs <- model@model$coefficients_table$coefficients
}
# create empty GOF vectors and subsequently add GOF statistics from model:
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.mse == TRUE) {
mse <- model@model$training_metrics@metrics$MSE
gof <- c(gof, mse)
gof.names <- c(gof.names, "MSE")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.rsquared == TRUE) {
r2 <- model@model$training_metrics@metrics$r2
gof <- c(gof, r2)
gof.names <- c(gof.names, "R^2")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.logloss == TRUE) {
logloss <- model@model$training_metrics@metrics$logloss
gof <- c(gof, logloss)
gof.names <- c(gof.names, "LogLoss")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.meanerror == TRUE) {
mpce <- model@model$training_metrics@metrics$mean_per_class_error
gof <- c(gof, mpce)
gof.names <- c(gof.names, "Mean Per-Class Error")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.auc == TRUE) {
auc <- model@model$training_metrics@metrics$AUC
gof <- c(gof, auc)
gof.names <- c(gof.names, "AUC")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.gini == TRUE) {
gini <- model@model$training_metrics@metrics$Gini
gof <- c(gof, gini)
gof.names <- c(gof.names, "Gini")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.deviance == TRUE) {
nulldev <- model@model$training_metrics@metrics$null_deviance
resdev <- model@model$training_metrics@metrics$residual_deviance
gof <- c(gof, nulldev, resdev)
gof.names <- c(gof.names, "Null Deviance", "Residual Deviance")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.aic == TRUE) {
aic <- model@model$training_metrics@metrics$AIC
gof <- c(gof, aic)
gof.names <- c(gof.names, "AIC")
gof.decimal <- c(gof.decimal, TRUE)
}
# create texreg object:
tr <- createTexreg(
coef.names = coefnames,
coef = coefs,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
For the goodness-of-fit block, you can see that I first created empty vectors and then subsequently populated them with additional statistics provided that the respective statistic was switched on by the user using the respective argument.
The gof.decimal
logical vector indicates for each GOF statistic whether it has decimal places (TRUE
) or not (FALSE
, as in the number of observations, for example).
Finally, the new function needs to be registered as a method for the generic extract
function. This is done using a simple command:
setMethod("extract", signature = className("H2OBinomialModel", "h2o"),
definition = extract.H2OBinomialModel)
Here, the first argument of className
is the class label, and the second one is the package in which the class is defined.
Summing up, the only two things that need to be done in order to write a custom extension are 1) writing an extract method, and 2) registering the method. That is, this code can be executed at runtime and doesn't have to be inserted into any package.
However, for your convenience, I have added the H2OBinomialModel
method to texreg
version 1.36.13, which is available on CRAN.
Note that the solution presented here does not work with any previous version of texreg
because previous versions could not deal with models that had neither standard errors nor confidence intervals. This is a fairly specialized setup in my opinion, and I hadn't come across a package that merely provides the estimates without any uncertainty measures. I have now fixed this in texreg
.
Trying out the new extract
method
Once the function definition and the setMethod
command have been executed at runtime, the following command can be used to create a table:
screenreg(list(model.output.1, model.output.2), custom.note = "")
This is the output:
======================================
Model 1 Model 2
--------------------------------------
Intercept -1.84 -1.11
RACE -0.63 -0.62
DCAPS 1.31 1.31
PSA 0.05 0.05
AGE -0.01
--------------------------------------
MSE 0.20 0.20
R^2 0.16 0.16
LogLoss 0.59 0.59
Mean Per-Class Error 0.36 0.38
AUC 0.72 0.72
Gini 0.44 0.44
Null Deviance 512.29 512.29
Residual Deviance 449.93 449.51
AIC 457.93 459.51
======================================
The custom.note = ""
argument makes sense here because we don't want a significance legend since the models do not report any uncertainty measures.
It is also possible to suppress some of the GOF measures and/or use standardized coefficients:
screenreg(list(model.output.1, model.output.2), custom.note = "",
include.deviance = FALSE, include.auc = FALSE, standardized = TRUE)
The result:
======================================
Model 1 Model 2
--------------------------------------
Intercept -0.34 -0.34
RACE -0.19 -0.19
DCAPS 0.41 0.41
PSA 0.94 0.94
AGE -0.07
--------------------------------------
MSE 0.20 0.20
R^2 0.16 0.16
LogLoss 0.59 0.59
Mean Per-Class Error 0.36 0.38
Gini 0.44 0.44
AIC 457.93 459.51
======================================
Other slots that can be used with createTexreg
The createTexreg
constructor function is called within any extract
method. The example above shows some simple pieces of information. Besides the details contained in the example, the following slots are available in texreg
objects:
- se: standard errors
- pvalues: the p-values
- ci.low: lower bound of a confidence interval
- ci.up: upper bound of a confidence interval
- model.name: the title of the current model
Note that either confidence intervals or standard errors and p-values should be used, not both.
Manipulating texreg
objects
Besides handing over the models to the screenreg
, texreg
, or htmlreg
functions directly, it is possible to save the extracted information to a texreg
object first and manipulate it before the table is displayed or saved. The added value is that even complex changes to a table are easy to apply this way. For example, it is possible to rename coefficients or GOF statistics, add new rows, rename a model, modify the values, or change the order of coefficients or GOF statistics. Here is how you can do that: First, you call the extract
function to save the information to a texreg
object:
tr <- extract(model.output.1)
You can display the contents of the texreg
object by just calling tr
, which shows the following output:
No standard errors and p-values were defined for this texreg object.
coef.
Intercept -1.83522343
RACE -0.62522179
DCAPS 1.31442834
PSA 0.04686106
GOF dec. places
MSE 0.2029470 TRUE
R^2 0.1562137 TRUE
LogLoss 0.5920097 TRUE
Mean Per-Class Error 0.3612191 TRUE
AUC 0.7185655 TRUE
Gini 0.4371311 TRUE
Null Deviance 512.2888402 TRUE
Residual Deviance 449.9273825 TRUE
AIC 457.9273825 TRUE
Alternatively, this is the structure of the object as shown by str(tr)
:
Formal class 'texreg' [package "texreg"] with 10 slots
..@ coef.names : chr [1:4] "Intercept" "RACE" "DCAPS" "PSA"
..@ coef : num [1:4] -1.8352 -0.6252 1.3144 0.0469
..@ se : num(0)
..@ pvalues : num(0)
..@ ci.low : num(0)
..@ ci.up : num(0)
..@ gof.names : chr [1:9] "MSE" "R^2" "LogLoss" "Mean Per-Class Error" ...
..@ gof : num [1:9] 0.203 0.156 0.592 0.361 0.719 ...
..@ gof.decimal: logi [1:9] TRUE TRUE TRUE TRUE TRUE TRUE ...
..@ model.name : chr(0)
Now you can just manipulate this object in arbitrary ways, e.g., add a GOF statistic:
[email protected] <- c([email protected], "new statistic")
tr@gof <- c(tr@gof, 12)
[email protected] <- c([email protected], FALSE)
Or you can change the order of coefficients:
[email protected] <- [email protected][c(4, 1, 2, 3)]
tr@coef <- tr@coef[c(4, 1, 2, 3)]
When you are done with the manipulations, you can hand over the texreg
object instead of the original model when you call, for example, screenreg
:
screenreg(list(tr, model.output.2), custom.note = "")
The new result will look like this:
======================================
Model 1 Model 2
--------------------------------------
PSA 0.05 0.05
Intercept -1.84 -1.11
RACE -0.63 -0.62
DCAPS 1.31 1.31
AGE -0.01
--------------------------------------
MSE 0.20 0.20
R^2 0.16 0.16
LogLoss 0.59 0.59
Mean Per-Class Error 0.36 0.38
AUC 0.72 0.72
Gini 0.44 0.44
Null Deviance 512.29 512.29
Residual Deviance 449.93 449.51
AIC 457.93 459.51
new statistic 12
======================================
TL;DR
texreg
can be customized by users. Just write an extract method like the one shown above and register it using a setMethods
call. I have included the H2OBinomialModel
method in the latest texreg
version 1.36.13, along with a bugfix for using models without standard errors (such as this one).
broom
to compare models side by side? – Newspaperman