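groupby and value_counts are different functions that serve different purposes. In older versions of pandas, value_counts could not be called on a DataFrame at all; since pandas 1.1 there is a DataFrame.value_counts, but it counts unique rows rather than the values of a single column.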
value_counts on a Series works on a single column, and its sole purpose is to return a Series holding the frequency of each value.
groupby returns a GroupBy object, so you can run statistical computations over it. When you call df.groupby(col).count(), it returns the number of non-null values in each of the remaining columns, for each group defined by the columns passed to groupby.
When should value_counts be used, and when should groupby.count be used?
Let's take an example:
df = pd.DataFrame({'id': [1, 2, 3, 4, 2, 2, 4], 'color': ["r","r","b","b","g","g","r"], 'size': [1,2,1,2,1,3,4]})
Groupby count:
df.groupby('color').count()
       id  size
color
b       2     2
g       2     2
r       3     3
Groupby count is generally used to get the number of valid (non-null) values in every other column, with respect to the one or more columns specified as group keys. NaN (not a number) values are therefore excluded.
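To make the NaN behaviour concrete, here is a minimal sketch. It reuses the example frame but replaces one 'size' entry with NaN, which is an assumption added for illustration and not part of the original data. It also contrasts count() with size(), which counts rows per group regardless of missing values.

```python
import numpy as np
import pandas as pd

# Same frame as the example, except one 'size' value is missing
# (the NaN is a hypothetical change made only for this illustration).
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 2, 2, 4],
    'color': ["r", "r", "b", "b", "g", "g", "r"],
    'size': [1, 2, np.nan, 2, 1, 3, 4],
})

# count() reports non-null values per column within each group.
counts = df.groupby('color').count()

# size() reports the number of rows per group, NaN or not.
sizes = df.groupby('color').size()

print(counts)
print(sizes)
```

Here group 'b' has two rows, but only one of them has a non-null 'size', so count() and size() disagree for that group.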
To find the frequency using groupby, you need to aggregate against the specified column itself, as @jez did. (Perhaps value_counts was implemented to avoid this and make developers' lives easier.)
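As a rough sketch of what "aggregating against the column itself" can look like (this may not be exactly what the referenced answer does), the grouped column is selected and counted, and the result is compared with value_counts on the same frame:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 2, 2, 4],
    'color': ["r", "r", "b", "b", "g", "g", "r"],
    'size': [1, 2, 1, 2, 1, 3, 4],
})

# Group by 'color' and count the 'color' column itself.
by_group = df.groupby('color')['color'].count()

# The same frequencies from value_counts.
by_vc = df['color'].value_counts()

# Both hold the same numbers; only the ordering differs:
# groupby sorts by group key, value_counts sorts by frequency.
print(by_group.to_dict())
print(by_vc.to_dict())
```

Both results contain the same frequencies; the groupby version is ordered by the group key, while value_counts is ordered by descending frequency.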
Value Counts:
df['color'].value_counts()
r 3
g 2
b 2
Name: color, dtype: int64
value_counts is generally used to find the frequency of the values present in one particular column.
In conclusion:
.groupby(col).count() should be used when you want the number of valid (non-null) values in each column with respect to the specified col.
.value_counts() should be used to find the frequencies of the values in a Series.