For a certain Kaggle dataset (the rules prohibit me from sharing the data here, but it is readily accessible here),
import pandas
df_train = pandas.read_csv(
"01 - Data/act_train.csv.zip"
)
df_train.describe()
I get:
>>> df_train.describe()
outcome
count 2.197291e+06
mean 4.439544e-01
std 4.968491e-01
min 0.000000e+00
25% 0.000000e+00
50% 0.000000e+00
75% 1.000000e+00
max 1.000000e+00
whereas for the same dataset df_train.columns gives me:
>>> df_train.columns
Index(['people_id', 'activity_id', 'date', 'activity_category', 'char_1',
'char_2', 'char_3', 'char_4', 'char_5', 'char_6', 'char_7', 'char_8',
'char_9', 'char_10', 'outcome'],
dtype='object')
and df_train.dtypes gives me:
>>> df_train.dtypes
people_id object
activity_id object
date object
activity_category object
char_1 object
char_2 object
char_3 object
char_4 object
char_5 object
char_6 object
char_7 object
char_8 object
char_9 object
char_10 object
outcome int64
dtype: object
Am I missing some reason why pandas only describes one column in the dataset?
include='all' is the default if all the columns in the dataset are objects (strings)? – Sholokhov
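For what it's worth, here is a minimal sketch of the behavior being discussed, using a small made-up frame rather than the Kaggle data (the rows below are invented for illustration; only the column names are borrowed from the question). As far as I can tell from the documentation, describe() with no arguments summarizes only the numeric columns, and it falls back to summarizing every column when the frame has no numeric columns at all, which is similar to, but not literally the same as, include='all' being the default.

```python
import pandas as pd

# Hypothetical stand-in for the real data: two string columns, one integer column.
df = pd.DataFrame({
    "people_id": ["ppl_1", "ppl_2", "ppl_2"],
    "char_1": ["type 1", "type 2", "type 2"],
    "outcome": [0, 1, 1],
})

# Default call: only numeric columns are summarized, so only 'outcome' appears.
default_summary = df.describe()

# include='all' summarizes every column; statistics that do not apply
# to a given column (e.g. 'mean' for strings) are filled with NaN.
all_summary = df.describe(include="all")

# With no numeric columns present, the default call describes the
# string columns instead (count, unique, top, freq).
strings_only = df[["people_id", "char_1"]].describe()

print(default_summary)
print(all_summary)
print(strings_only)
```

So in the question's case, df_train.describe(include='all') should report on the object columns as well as outcome.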