R: How to get a proper LaTeX regression table from a data frame?

Consider the following example:

inds <- c('var1','','var2','')
model1 <- c(10.2,0.00,0.02,0.3)
model2 <- c(11.2,0.01,0.02,0.023)

df <- data.frame(inds, model1, model2)
df
 inds model1 model2
 var1  10.20 11.200
        0.00  0.010
 var2   0.02  0.020
        0.30  0.023

Here you have the output of a custom regression model with coefficients and P-values (I could show other statistics instead if needed, such as the standard errors of the coefficients).

There are two variables, var1 and var2.

For instance, in model1, var1 comes with a coefficient of 10.2 and a P-value of 0.00 while var2 has a coefficient of 0.02 and a P-value of 0.30.

Is there a package that handles these (custom) tables automatically and can create a neat LaTeX table with stars for significance?

Thanks!

Unpin answered 21/10, 2016 at 12:23 Comment(4)
Up front: Don’t use Stargazer. It’s terrible. And it cannot be customised at all. – Jag
Thanks @KonradRudolph. Any suggestion is greatly appreciated. – Fariss
As for suggestions, I use the package ‹pander› in combination with ‹knitr› for all my table typesetting needs. However, I manually post-process the result to produce tables that follow the convention laid out in the ‹booktabs› documentation. No easy-to-use solution, I’m afraid. – Jag
Can't we use your pander solution in that case? Would the code be hard to write? – Fariss

Here is a solution using texreg.

Note that texreg >= 1.36.18 is required.

The information in your data frame (coefficients and p-values) could be arranged in arbitrary ways, so we need code that selects these values from the appropriate places in the data frame and uses them to create a texreg object. Since you are asking for a generic (and presumably reusable) solution, we should wrap that code in a function. I'll call it extractFromDataFrame. It extracts the information from the data frame and creates a list of texreg objects, one for each model:

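library(texreg)

# Convert a data frame of stacked coefficient/p-value rows into a list of
# texreg objects, one per model column.
extractFromDataFrame <- function(dataFrame) {
  # Odd rows hold the coefficients; even rows hold the matching p-values.
  coef.row.indices <- seq(1, nrow(dataFrame) - 1, 2)
  pval.row.indices <- seq(2, nrow(dataFrame), 2)
  texregObjects <- list()
  # Column 1 holds the variable labels; every other column is a model.
  for (i in 2:ncol(dataFrame)) {
    coefs <- dataFrame[coef.row.indices, i]
    coefnames <- as.character(dataFrame[coef.row.indices, 1])
    pvalues <- dataFrame[pval.row.indices, i]
    tr <- createTexreg(coef = coefs, coef.names = coefnames, pvalues = pvalues)
    texregObjects[[i - 1]] <- tr
  }
  return(texregObjects)
}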

In this function, we first define which rows of the data frame hold the coefficients and which hold the p-values. Then we create an empty list in which to store the texreg objects. We iterate through all columns but the first, as the first one contains only the labels. For each of these model columns, we save the coefficients, their names, and the p-values, and hand them over to the createTexreg constructor, which creates a texreg object from the data. We add the texreg object to the list and, in the end, return the list of texreg objects.

We can now apply the function to any data frame laid out like the one in the question, with any number of model columns (at least one). After applying the function to the df object, we can print the contents of the list to make sure that everything worked:

tr <- extractFromDataFrame(df)
tr

And indeed, the results contain the relevant data:

[[1]]

No standard errors were defined for this texreg object.
No decimal places were defined for the GOF statistics.

     coef.   p
var1 10.20 0.0
var2  0.02 0.3

No GOF block defined.

[[2]]

No standard errors were defined for this texreg object.
No decimal places were defined for the GOF statistics.

     coef.     p
var1 11.20 0.010
var2  0.02 0.023

No GOF block defined.

Now we can simply hand the list of texreg objects over to screenreg, e.g., screenreg(tr), with the following result:

========================
      Model 1    Model 2
------------------------
var1  10.20 ***  11.20 *
var2   0.02       0.02 *
========================
*** p < 0.001, ** p < 0.01, * p < 0.05

Or to htmlreg for creating an HTML table. Or, as requested in the original question, to texreg for creating a LaTeX table. The output of texreg(tr, single.row = TRUE) looks like this:

\begin{table}
\begin{center}
\begin{tabular}{l c c }
\hline
 & Model 1 & Model 2 \\
\hline
var1 & $10.20^{***}$ & $11.20^{*}$ \\
var2 & $0.02$        & $0.02^{*}$  \\
\hline
\multicolumn{3}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
\end{tabular}
\caption{Statistical models}
\label{table:coefficients}
\end{center}
\end{table}

This solution can be modified to accommodate standard errors, confidence intervals, or goodness-of-fit statistics.
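
As a rough sketch of such a modification, the variant below also passes standard errors and one goodness-of-fit row to createTexreg. The three-rows-per-variable layout, the names results, sampleSizes, and extractWithSE, and all of the numbers for the standard errors and sample sizes are assumptions made up for illustration; they are not part of the question.

```r
library(texreg)

# Hypothetical input: for each variable, three stacked rows hold the
# coefficient, its standard error, and its p-value (an assumed layout).
results <- data.frame(
  inds   = c("var1", "", "", "var2", "", ""),
  model1 = c(10.2, 1.50, 0.00, 0.02, 0.02, 0.30),
  model2 = c(11.2, 4.30, 0.01, 0.02, 0.009, 0.023)
)
# Made-up sample sizes, used only to fill the goodness-of-fit block.
sampleSizes <- c(model1 = 100, model2 = 120)

extractWithSE <- function(dataFrame, sampleSizes) {
  # Every third row starts a new variable.
  coef.rows <- seq(1, nrow(dataFrame), by = 3)
  modelColumns <- names(dataFrame)[-1]
  lapply(modelColumns, function(column) {
    createTexreg(
      coef.names  = as.character(dataFrame[coef.rows, 1]),
      coef        = dataFrame[coef.rows, column],
      se          = dataFrame[coef.rows + 1, column],
      pvalues     = dataFrame[coef.rows + 2, column],
      gof.names   = "Num. obs.",
      gof         = unname(sampleSizes[column]),
      gof.decimal = FALSE  # print the sample size without decimal places
    )
  })
}

trSE <- extractWithSE(results, sampleSizes)
screenreg(trSE)
```

Confidence intervals would follow the same pattern via the ci.low and ci.up arguments of createTexreg.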

Various texreg arguments can be used to customize the output, including the use of the booktabs package or decimal alignment via dcolumn, for example.
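
For instance, a call along the following lines switches on booktabs rules and dcolumn alignment. It is only a sketch: the two models are rebuilt by hand from the numbers in the question so that the snippet runs on its own, and the model names, caption, and label are placeholders chosen for illustration.

```r
library(texreg)

# Rebuild the two models from the question directly with createTexreg().
models <- list(
  createTexreg(coef.names = c("var1", "var2"),
               coef = c(10.2, 0.02), pvalues = c(0.00, 0.30)),
  createTexreg(coef.names = c("var1", "var2"),
               coef = c(11.2, 0.02), pvalues = c(0.01, 0.023))
)

texreg(
  models,
  single.row = TRUE,
  booktabs = TRUE,       # \toprule, \midrule, \bottomrule instead of \hline
  dcolumn = TRUE,        # align numbers at the decimal point
  use.packages = FALSE,  # do not emit \usepackage lines with the table
  custom.model.names = c("Baseline", "Extended"),  # placeholder names
  caption = "Regression results",                  # placeholder caption
  label = "tab:results"                            # placeholder label
)
```

With use.packages = FALSE, the LaTeX document itself has to load booktabs and dcolumn in its preamble.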

Please note that you should not call your data frame df, because that name already belongs to a function in the stats package (the density of the F distribution).
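
A quick way to see the clash, using base R only. The replacement name results is just a suggestion, not something from the question.

```r
# `df` is already a function in the stats package: the F density.
is.function(stats::df)
stats::df(1, df1 = 2, df2 = 10)

# Storing the data under a different name avoids masking that function.
results <- data.frame(
  inds   = c("var1", "", "var2", ""),
  model1 = c(10.2, 0.00, 0.02, 0.3),
  model2 = c(11.2, 0.01, 0.02, 0.023)
)
```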

Nethermost answered 22/10, 2016 at 23:15 Comment(1)
What can I say? Amazing!! – Fariss
