Format model display in texreg or stargazer R as scientific

I just ran a statistical model and I want to display its results as a table using stargazer. However, large numbers are displayed in full.

fit2<-lm(A~B,data=C)
stargazer(fit2,type="text")

This gives the following table:

===================================================
                      Dependent variable:      
                -------------------------------
                               A               
---------------------------------------------------
B                               -0.599             
                                (1.698)            
                                                   
Constant                  32,126,391.000         
                         (24,004,268.000)        

---------------------------------------------------
 Observations                       5               
R2                               0.040             
Adjusted R2                     -0.280             
Residual Std. Error   31,217,258.000 (df = 3e+00)  
F Statistic            0.124 (df = 1e+00; 3e+00)   
===================================================
Note:               *p<1e-01; **p<5e-02; ***p<1e-02

How do I get the large numbers displayed in scientific notation, e.g. 3.21e+07? I have tried:

options("scipen"=-20,"digits"=2)
fit1<-format(lm(A~B,data=C),scientific=T)

However, this distorts the model summary, which is then displayed as a single row. What is the best way to format the numbers while keeping the table structure?
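The attempt above goes wrong because `format()` is applied to the fitted `lm` object, which is a list (coefficients, residuals, the call, and so on), rather than to the numbers in the table. A minimal sketch on made-up data (the data frame `C` here is invented for illustration and is not the asker's data) shows that `format()` and `sprintf()` do produce the requested notation when they are given plain numbers:

```r
# Made-up large-scale data, only for illustration
set.seed(1)
C <- data.frame(B = rnorm(20))
C$A <- 3e7 + 1e6 * C$B + rnorm(20, sd = 1e6)
fit2 <- lm(A ~ B, data = C)

# format() works element-wise on numbers, so apply it to the coefficients,
# not to the lm object itself
sci_coef <- format(coef(fit2), scientific = TRUE, digits = 3)
print(sci_coef)

# sprintf() gives a fixed number of decimals in the mantissa
sprintf("%.2e", 32126391)   # "3.21e+07"
```

Getting such strings back into the stargazer layout still means post-processing its text output, which is what the answers below do.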

                   CO          NO2        SM
Dec 2004 2.750000e+18 1.985136e+15 0.2187433
Jan 2005 2.980000e+18 2.144211e+15 0.1855678
Feb 2005 2.810000e+18 1.586491e+15 0.1764805
Dec 2005 3.010000e+18 1.755409e+15 0.2307153
Jan 2006 3.370000e+18 2.205888e+15 0.2046671
Feb 2006 3.140000e+18 2.084682e+15 0.1834232
Dec 2006 2.940000e+18 1.824735e+15 0.1837391
Jan 2007 3.200000e+18 2.075785e+15 0.1350665
Feb 2007 3.060000e+18 1.786481e+15 0.1179924
Dec 2007 2.750000e+18 1.645800e+15 0.2037340
Jan 2008 3.030000e+18 1.973517e+15 0.1515871
Feb 2008 3.040000e+18 1.753803e+15 0.1289968
Dec 2008 2.800000e+18 1.649315e+15 0.1968024
Jan 2009 3.090000e+18 1.856762e+15 0.1630173
Feb 2009 2.880000e+18 1.610011e+15 0.1446938
Dec 2009 2.660000e+18 1.562971e+15 0.1986012
Jan 2010 2.864333e+18 1.733843e+15 0.1559205
Feb 2010 2.881474e+18 1.469982e+15 0.1397536
Dec 2010 2.730000e+18 1.652751e+15 0.2129476
Jan 2011 3.030000e+18 1.862774e+15 0.1681295
Feb 2011 2.850000e+18 1.658988e+15 0.1531579
Ot asked 5/3, 2015 at 8:55
Comments:
Here is a sample of the actual data. (Ot)
I can't believe there's no way to get stargazer to do scientific notation for regressions! (Flagitious)

To do this, you can write your own function to take the large numbers and put them into scientific notation.

First, load the stargazer package:

library(stargazer)

Then, create data with large numbers for the example:

set.seed(1)

C <- data.frame("A" = rnorm(10000, 30000, 10000),
                "B" = rnorm(10000, 7500, 2500))

Fit the model and store the stargazer results table in an object:

fit2 <- lm(A ~ B, data = C) 

myResults <- stargazer(fit2, type = "text")

Create a function that takes a stargazer table and converts large numbers into scientific notation. (It is not very flexible, but it can be extended with simple modifications. As written, it only handles numbers from 1,000 to 99,999, and only the first one on each line.)

fixNumbers <- function(stargazer.object){

  so <- stargazer.object
  # Lines containing a comma-grouped number such as 30,012.340
  rows <- grep("[0-9],[0-9]", so)
  for(row in rows){

    # Get the first such number (the lazy .*? keeps both thousands digits),
    # drop the comma and format it in scientific notation
    number <- as.numeric(sub("^.*?([0-9]{1,2}),([0-9]+\\.?[0-9]*).*$", "\\1\\2", so[row], perl = TRUE))
    formatted_num <- sprintf("%.2e", number)
    so[row] <- sub("^(.*?)[0-9]{1,2},[0-9]+\\.?[0-9]*(.*)$", paste0("\\1", formatted_num, "\\2"), so[row], perl = TRUE)
  }

  # Print result, one table line per output line
  writeLines(so)
  invisible(so)
}
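One subtlety in the number-extraction step: with a greedy leading `.*`, the regex engine backtracks only as far as it must, so `([0-9]{1,2})` captures just the single digit right before the comma, and a value such as 30,012.340 is read as 12.34. A lazy `.*?` avoids this. A quick check on a made-up table line (the string is invented for illustration):

```r
# A made-up line in the style of stargazer's text output
line <- "Constant        30,012.340***"

# Greedy ".*": backtracks just far enough to find "<digits>,", so the
# capture group only gets the "0" sitting right before the comma
greedy <- sub(".*([0-9]{1,2}),([0-9]+\\.?[0-9]*).*", "\\1\\2", line, perl = TRUE)

# Lazy ".*?": stops at the first digit, so both thousands digits are captured
lazy <- sub("^.*?([0-9]{1,2}),([0-9]+\\.?[0-9]*).*$", "\\1\\2", line, perl = TRUE)

greedy                               # "0012.340"
lazy                                 # "30012.340"
sprintf("%.2e", as.numeric(lazy))    # "3.00e+04"
```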

Pass your stargazer object to the new function (fixNumbers):

fixNumbers(myResults)

Here's all the code in one chunk:

library(stargazer)

set.seed(1)

C <- data.frame("A" = rnorm(10000, 30000, 10000),
                "B" = rnorm(10000, 7500, 2500))

fit2 <- lm(A ~ B, data = C) 

myResults <- stargazer(fit2, type = "text")

fixNumbers <- function(stargazer.object){

  so <- stargazer.object
  # Lines containing a comma-grouped number such as 30,012.340
  rows <- grep("[0-9],[0-9]", so)
  for(row in rows){

    # Get the first such number (the lazy .*? keeps both thousands digits),
    # drop the comma and format it in scientific notation
    number <- as.numeric(sub("^.*?([0-9]{1,2}),([0-9]+\\.?[0-9]*).*$", "\\1\\2", so[row], perl = TRUE))
    formatted_num <- sprintf("%.2e", number)
    so[row] <- sub("^(.*?)[0-9]{1,2},[0-9]+\\.?[0-9]*(.*)$", paste0("\\1", formatted_num, "\\2"), so[row], perl = TRUE)
  }

  # Print result, one table line per output line
  writeLines(so)
  invisible(so)
}

fixNumbers(myResults)
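The function above is limited to numbers between 1,000 and 99,999 and to one number per line, while the asker's table contains values like 32,126,391.000. As one possible generalisation, here is a sketch of a helper (`fix_big_numbers` is a hypothetical name, not a stargazer function) that replaces every comma-grouped number of any size, keeps the parentheses around standard errors, and left-pads each replacement to its original width so the columns stay aligned:

```r
# Hypothetical helper, not part of stargazer: rewrites every comma-grouped
# number (optionally in parentheses) in a character vector of table lines
fix_big_numbers <- function(table_lines, digits = 2) {
  pattern <- "\\(?[0-9]{1,3}(,[0-9]{3})+(\\.[0-9]+)?\\)?"
  m <- gregexpr(pattern, table_lines)
  regmatches(table_lines, m) <- lapply(regmatches(table_lines, m), function(x) {
    if (length(x) == 0) return(x)  # nothing to replace on this line
    value <- as.numeric(gsub("[(),]", "", x))
    sci <- sprintf(paste0("%.", digits, "e"), value)
    # Put the parentheses back around standard errors
    sci <- ifelse(startsWith(x, "("), paste0("(", sci, ")"), sci)
    # Left-pad so each replacement keeps the width of the text it replaces
    paste0(strrep(" ", pmax(0, nchar(x) - nchar(sci))), sci)
  })
  table_lines
}

# Made-up lines in the layout of the table from the question
tab <- c("Constant        32,126,391.000   ",
         "               (24,004,268.000)  ",
         "B                   -0.599       ")
out <- fix_big_numbers(tab)
writeLines(out)
```

With a real model you would pass it the character vector returned by `stargazer(fit2, type = "text")`. Padding to the original width is a design choice: it keeps the right edge of each value where stargazer put it, at the cost of some extra whitespace.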
Detradetract answered 3/7, 2019 at 4:49

Following Adam K's idea, but with a more optimized regex (and making use of vectorisation, which is a good idea in R):

fit2<-lm(CO~NO2,data=df)
test <- stargazer(fit2, type = "text")

It takes two lines. First, find the numbers with a regex: here, any run of at least five characters made up of digits, commas, and periods:

m <- gregexpr("([0-9\\.,]{5,})", test)
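To see what this pattern picks up, here it is applied to a few made-up lines in stargazer's text layout (the lines are invented for illustration). Any run of five or more such characters matches, so small values such as an R² of 0.883 are converted too, which is why R2 appears as 4.58e-01 in the output further down:

```r
# Made-up lines in the style of stargazer's text output
tab_lines <- c("Constant    32,126,391.000***",
               "            (24,004,268.000) ",
               "R2                0.883      ",
               "Note:   *p<0.1; **p<0.05     ")

m <- gregexpr("([0-9\\.,]{5,})", tab_lines)
matches <- regmatches(tab_lines, m)
str(matches)
```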

Then apply a transformation function to each match. Here it removes the commas, converts the string to a number, and formats it in scientific notation with 2 decimal places (formatC() is another option, with many formatting possibilities):

f = function(x){
  sprintf("%.2e",as.numeric( gsub(",","",x)))
}
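Applied to a few sample matches, `f` gives the strings below. The `formatC()` variant mentioned above (called `g` here, a name chosen just for this sketch) produces the same result with `format = "e"` and `digits = 2`:

```r
f <- function(x){
  sprintf("%.2e", as.numeric(gsub(",", "", x)))
}

# Same conversion with formatC(): format = "e" selects scientific notation
# and digits = 2 sets the number of digits after the decimal point
g <- function(x){
  formatC(as.numeric(gsub(",", "", x)), format = "e", digits = 2)
}

x <- c("32,126,391.000", "24,004,268.000", "0.883")
f(x)  # "3.21e+07" "2.40e+07" "8.83e-01"
g(x)  # same strings
```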

and apply it to the matches with the regmatches() replacement function:

regmatches(test, m) <- lapply(regmatches(test, m), f)
test


 [1] ""                                                           
 [2] "========================================================"   
 [3] "                            Dependent variable:         "   
 [4] "                    ------------------------------------"   
 [5] "                                     CO                 "   
 [6] "--------------------------------------------------------"   
 [7] "NO2                              6.26e+02**              "  
 [8] "                                 (2.41e+02)              "  
 [9] "                                                        "   
[10] "Constant              1.81e+18***  "                        
[11] "                       (4.62e+17)    "                      
[12] "                                                        "   
[13] "--------------------------------------------------------"   
[14] "Observations                         10                 "   
[15] "R2                                 4.58e-01                "
[16] "Adjusted R2                        3.90e-01                "
[17] "Residual Std. Error 1.57e+17 (df = 8)"                      
[18] "F Statistic                 6.76e+00** (df = 1; 8)         "
[19] "========================================================"   
[20] "Note:                        *p<0.1; **p<0.05; ***p<0.01"   

To print it as a table rather than as a character vector:

print(as.data.frame(test),quote = F,row.names = FALSE)
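Converting to a data frame makes `print()` right-align each line, which shifts stargazer's layout, as the output shows. If the goal is just to show the table the way stargazer drew it, `writeLines()` prints each element unchanged on its own line. A sketch with a short made-up vector standing in for `test`:

```r
# Made-up stand-in for the modified character vector returned by stargazer
test <- c("========================================",
          "NO2                 6.26e+02**         ",
          "                   (2.41e+02)          ",
          "========================================")

# writeLines() prints each element as-is, one per line,
# so the original column alignment is kept
writeLines(test)
```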



                                                       test

    ========================================================
                                Dependent variable:         
                        ------------------------------------
                                         CO                 
    --------------------------------------------------------
   NO2                              6.26e+02**              
                                    (2.41e+02)              

                         Constant              1.81e+18***  
                                              (4.62e+17)    

    --------------------------------------------------------
    Observations                         10                 
 R2                                 4.58e-01                
 Adjusted R2                        3.90e-01                
                       Residual Std. Error 1.57e+17 (df = 8)
 F Statistic                 6.76e+00** (df = 1; 8)         
    ========================================================
    Note:                        *p<0.1; **p<0.05; ***p<0.01

The data:

df <- read.table(text  = "
CO NO2 SM
 2.750000e+18 1.985136e+15 0.2187433
 2.980000e+18 2.144211e+15 0.1855678
 2.810000e+18 1.586491e+15 0.1764805
 3.010000e+18 1.755409e+15 0.2307153
 3.370000e+18 2.205888e+15 0.2046671
 3.140000e+18 2.084682e+15 0.1834232
 2.940000e+18 1.824735e+15 0.1837391
 3.200000e+18 2.075785e+15 0.1350665
 3.060000e+18 1.786481e+15 0.1179924
 2.750000e+18 1.645800e+15 0.2037340",header = T)
Lorusso answered 8/7, 2019 at 8:42

The problem is not that these packages cannot display scientific notation. Rather, your independent variables are on an extremely small scale. You should rescale them before using them in your model by multiplying the values by a constant. For example, if you measured people's heights in kilometers, you would rescale them to meters or centimeters. This makes the table much easier to read than displaying the results in scientific notation.

Consider the following example:

a <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
b <- c(0.00020, 0.00024, 0.00024, 0.00026, 0.00021, 0.00022, 0.00023, 
    0.00022, 0.00023, 0.00022)
model.1 <- lm(a ~ b)

Next, create your table with texreg:

library("texreg")
screenreg(model.1)

This yields the following table:

=========================
             Model 1     
-------------------------
(Intercept)     -2.27 *  
                (0.94)   
b            32168.58 ***
             (4147.00)   
-------------------------
R^2              0.88    
Adj. R^2         0.87    
Num. obs.       10       
=========================
*** p < 0.001, ** p < 0.01, * p < 0.05

So the coefficients are pretty large. Let's try the same thing with stargazer:

library("stargazer")
stargazer(model.1, type = "text")

The resulting table:

===============================================
                        Dependent variable:    
                    ---------------------------
                                 a             
-----------------------------------------------
b                          32,168.580***       
                            (4,146.999)        

Constant                     -2.270**          
                              (0.944)          

-----------------------------------------------
Observations                    10             
R2                             0.883           
Adjusted R2                    0.868           
Residual Std. Error       0.212 (df = 8)       
F Statistic            60.172*** (df = 1; 8)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

Same problem: large coefficients. Now rescale your original variable b and recompute the model:

b <- b * 10000
model.2 <- lm(a ~ b)

Try it again with texreg:

screenreg(model.2)

======================
             Model 1  
----------------------
(Intercept)  -2.27 *  
             (0.94)   
b             3.22 ***
             (0.41)   
----------------------
R^2           0.88    
Adj. R^2      0.87    
Num. obs.    10       
======================
*** p < 0.001, ** p < 0.01, * p < 0.05

And with stargazer:

stargazer(model.2, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                                 a             
-----------------------------------------------
b                            3.217***          
                              (0.415)          

Constant                     -2.270**          
                              (0.944)          

-----------------------------------------------
Observations                    10             
R2                             0.883           
Adjusted R2                    0.868           
Residual Std. Error       0.212 (df = 8)       
F Statistic            60.172*** (df = 1; 8)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

Now the coefficients look nicer and you do not need scientific notation.
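A quick way to confirm that rescaling only changes the units, not the fit: refitting with `b * 10000` (stored here as `b_rescaled`, a name introduced for this sketch) divides the slope and its standard error by 10,000, while the intercept and the t statistics stay the same. Using the data from this answer:

```r
a <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
b <- c(0.00020, 0.00024, 0.00024, 0.00026, 0.00021, 0.00022, 0.00023,
       0.00022, 0.00023, 0.00022)

model.1 <- lm(a ~ b)
b_rescaled <- b * 10000
model.2 <- lm(a ~ b_rescaled)

s1 <- summary(model.1)$coefficients
s2 <- summary(model.2)$coefficients

# Slope and its standard error shrink by the rescaling factor
# (both ratios are 10000 up to floating-point rounding)
s1[2, "Estimate"] / s2[2, "Estimate"]
s1[2, "Std. Error"] / s2[2, "Std. Error"]

# Intercept and t statistics are unchanged (row names differ, so ignore them)
all.equal(s1[, "t value"], s2[, "t value"], check.attributes = FALSE)
```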

Eyelid answered 6/3, 2015 at 12:17
Comments:
Hi spammerslammer, thanks for your answer. However, the two variables I am actually working with are atmospheric data on the order of 10^18 and 10^15. Is there a way to round them in LaTeX so that they are displayed as, e.g., 4 x 10^15 instead of the full number? (Ot)
So what prevents you from rescaling them? Why is it not feasible to divide them by 10^18 and change the interpretation accordingly? (Out of curiosity: what is the unit of measurement?) (Eyelid)
It's in molecules cm^-2. (Ot)
So how about expressing this in molecules per square angstrom, which would be your value * 10^-16 (i.e., divided by 10^16), I think (but not sure). (Eyelid)
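For reference, the conversion from molecules per square centimeter to molecules per square angstrom follows from 1 Å = 10^-8 cm:

$$1\,\text{cm} = 10^{8}\,\text{Å} \;\Rightarrow\; 1\,\text{cm}^{2} = 10^{16}\,\text{Å}^{2} \;\Rightarrow\; x\ \text{molecules cm}^{-2} = x \times 10^{-16}\ \text{molecules Å}^{-2}$$

So CO values of about 3 x 10^18 molecules cm^-2 in the sample data become about 300 molecules Å^-2, and NO2 values of about 2 x 10^15 molecules cm^-2 become about 0.2 molecules Å^-2.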
