I'm new to R. I have this data frame:
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170)) #create data.frame
names(df1) <- c('gender', 'age', 'height') #column names
I want df1's summary as a data frame that looks like this:
          count     mean      std      min      25%      50%      75%      max
age     30.0000   3.5000   1.7370   1.0000   2.0000   3.5000   5.0000   6.0000
gender  30.0000   1.6667   0.4795   1.0000   1.0000   2.0000   2.0000   2.0000
height  30.0000 155.5000   8.8034 141.0000 148.2500 155.5000 162.7500 170.0000
I've generated this in Python with df1.describe().T. How can I do this in R?
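For reference, here is a minimal pandas sketch that recomputes each row of df1.describe().T by hand, assuming df1 holds the same 30 rows the R code builds (data.frame() recycles the length-3 and length-6 vectors, because 30 is a multiple of both). It makes two details explicit that matter when porting to R: std is the sample standard deviation (denominator n - 1), and the percentiles use linear interpolation. These correspond to R's sd() and the default type = 7 of quantile(), so the same numbers should be reproducible in R. The helper name describe_by_hand is hypothetical.

```python
import pandas as pd

# Rebuild the R df1: c(2,1,2) is recycled 10 times and 1:6 five times to reach 30 rows
df1 = pd.DataFrame({
    "gender": [2, 1, 2] * 10,
    "age": list(range(1, 7)) * 5,
    "height": list(range(141, 171)),
})

def describe_by_hand(s: pd.Series) -> pd.Series:
    """Hypothetical helper: recompute the describe() statistics for one numeric column."""
    return pd.Series({
        "count": s.count(),          # number of non-missing values
        "mean": s.mean(),
        "std": s.std(ddof=1),        # sample standard deviation (n - 1 denominator)
        "min": s.min(),
        "25%": s.quantile(0.25),     # linear interpolation, the pandas default
        "50%": s.quantile(0.50),
        "75%": s.quantile(0.75),
        "max": s.max(),
    })

# apply() runs the helper on every column; .T puts one row per column, like describe().T
manual = df1.apply(describe_by_hand).T
print(manual.round(4))
```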
It would be a bonus if my summary data frame also contained the columns "dtype", "null" (number of missing values), "unique" (number of unique values) and "range" (max minus min), for a more comprehensive set of summary statistics:
          count     mean      std      min      25%      50%      75%      max  null unique range dtype
age     30.0000   3.5000   1.7370   1.0000   2.0000   3.5000   5.0000   6.0000     0      6     5 int64
gender  30.0000   1.6667   0.4795   1.0000   1.0000   2.0000   2.0000   2.0000     0      2     1 int64
height  30.0000 155.5000   8.8034 141.0000 148.2500 155.5000 162.7500 170.0000     0     30    29 int64
The Python code for the above result is:
import pandas as pd

# df1 is the pandas equivalent of the R data frame above
df1.describe().T.join(pd.DataFrame(df1.isnull().sum(), columns=['null']))\
    .join(pd.DataFrame.from_dict({i: df1[i].nunique() for i in df1.columns}, orient='index')\
        .rename(columns={0: 'unique'}))\
    .join(pd.DataFrame.from_dict({i: (df1[i].max() - df1[i].min()) for i in df1.columns}, orient='index')\
        .rename(columns={0: 'range'}))\
    .join(pd.DataFrame(df1.dtypes, columns=['dtype']))
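The chained join calls can also be written as plain column assignments, because each extra statistic is a Series indexed by column name and pandas aligns it to the rows of describe().T. This is a minimal sketch, assuming the same df1 (rebuilt here so the block runs on its own). It uses DataFrame.nunique() in place of the dict comprehension, and the variable name summary is only illustrative. The rows come out in the data frame's column order (gender, age, height), not in the alphabetical order of the tables shown earlier.

```python
import pandas as pd

# Same 30-row data as the R df1 (shorter vectors recycled to 30 rows)
df1 = pd.DataFrame({
    "gender": [2, 1, 2] * 10,
    "age": list(range(1, 7)) * 5,
    "height": list(range(141, 171)),
})

summary = df1.describe().T                # count, mean, std, min, 25%, 50%, 75%, max
summary["null"] = df1.isnull().sum()      # missing values per column
summary["unique"] = df1.nunique()         # distinct values per column
summary["range"] = df1.max() - df1.min()  # max minus min
summary["dtype"] = df1.dtypes             # column data type

print(summary)
```

Each assignment is matched to the right row by index label, so the order of the statements does not matter.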
Thank you!