Saving statsmodels Tukey HSD results into a Python pandas DataFrame

I am looking for a way to save the results of the Tukey HSD test into a pandas DataFrame. See below:

import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

# df is an existing DataFrame with 'Glucose' (values) and 'Date' (group labels) columns.
mcDate = multi.MultiComparison(df['Glucose'], df['Date'])
Results = mcDate.tukeyhsd()
print(Results)

    Multiple Comparison of Means - Tukey HSD,FWER=0.05
=============================================
group1 group2 meandiff  lower   upper  reject
---------------------------------------------
  A      B     20.35    7.388   33.312  True 
  A      C     -3.85   -16.812  9.112  False 
  B      C     -24.2   -37.162 -11.238  True 
---------------------------------------------
Outfitter answered 9/11, 2016 at 22:6 Comment(0)

I do not have access to your data, so I can't replicate the result. I used randomised data instead, just to show that this works. All you need to add to your code is the pandas import, and the last line creating the data frame.

import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
import pandas as pd
import numpy as np

# Random Data.
np.random.seed(0)
x = np.random.choice(['A','B','C'], 50)
y = np.random.rand(50)

# Tukey HSD multiple comparison: y holds the values, x the group labels.
mcDate = multi.MultiComparison(y, x)
Results = mcDate.tukeyhsd()
print(Results)

Produces the following table:

============================================
group1 group2 meandiff  lower  upper  reject
--------------------------------------------
  A      B     0.1506   -0.07  0.3712 False 
  A      C     0.1105  -0.1278 0.3487 False 
  B      C    -0.0401  -0.2865 0.2063 False 
--------------------------------------------

And, this is how you get the data frame:

df = pd.DataFrame(data=Results._results_table.data[1:], columns=Results._results_table.data[0])

print(df)

  group1 group2  meandiff   lower   upper  reject
0      A      B    0.1506 -0.0700  0.3712   False
1      A      C    0.1105 -0.1278  0.3487   False
2      B      C   -0.0401 -0.2865  0.2063   False
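
If relying on the private _results_table attribute is a concern, one possible variant goes through the public summary() method instead. This is only a minimal sketch: it assumes summary() returns a table whose .data list starts with the header row, as the snippets in this thread suggest, and tukey_to_frame is a hypothetical helper name, not part of statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.stats.multicomp as multi


def tukey_to_frame(results):
    """Convert a Tukey HSD results object into a pandas DataFrame.

    Hypothetical helper: assumes results.summary() returns a table whose
    .data attribute is a list of rows with the header row first.
    """
    table = results.summary().data
    return pd.DataFrame(data=table[1:], columns=table[0])


# Same random data as in the example above.
np.random.seed(0)
x = np.random.choice(['A', 'B', 'C'], 50)
y = np.random.rand(50)

results = multi.MultiComparison(y, x).tukeyhsd()
tukey_df = tukey_to_frame(results)
print(tukey_df)
```

The exact set of columns may differ between statsmodels versions, so it is safer to select columns by name than by position.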

I struggled with this for a while myself, and eventually found the solution by listing the attributes and methods of the object, like this:

dir(Results)
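
The output of dir() can be long, so it may help to filter out the underscore-prefixed names first. The sketch below illustrates the idea on a stand-in class; FakeResults and public_names are hypothetical names used only for illustration, and the real results object will expose different attributes.

```python
class FakeResults:
    """Hypothetical stand-in for a results object (not a statsmodels class)."""

    def __init__(self):
        self.reject = [False, False, False]
        self._results_table = "private table"

    def summary(self):
        return "summary table"


def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


names = public_names(FakeResults())
print(names)
```

Private names such as _results_table are hidden by this filter, so drop the startswith check to see them as well.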
Lunula answered 28/7, 2017 at 9:21 Comment(2)
This doesn't work for me. I get the error 'SimpleTable' object has no attribute '_results_table'. Is there a way to circumvent this? - Roumell
In statsmodels version 0.14: tukey_df = pd.DataFrame(data=Results.summary().data[1:], columns=Results.summary().data[0]) - Laius

As an update to @vander's answer, and to address @thentangler's comment, in statsmodels 0.12.1 the table's data is accessible as Results.data rather than Results._results_table.data.

The conversion of Results to a dataframe then becomes:
df = pd.DataFrame(data=Results.data[1:], columns=Results.data[0])

Gummous answered 8/10, 2021 at 12:51 Comment(0)
