Both pandas' crosstab and pivot_table functions seem to provide exactly the same functionality. Are there any differences?
The main difference between the two is that pivot_table expects your input data to already be a DataFrame: you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With crosstab, you don't necessarily need a DataFrame going in, as you just pass array-like objects for index/columns/values.
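A minimal sketch of that difference, assuming made-up shop/product/sales data: crosstab takes the arrays directly, while pivot_table needs them in a DataFrame first and then refers to the columns by name. The rownames/colnames arguments only give crosstab's axes the same labels so the two results line up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data held in plain arrays (no DataFrame yet)
shop = np.array(["A", "A", "B", "B", "B"])
product = np.array(["x", "y", "x", "x", "y"])
sales = np.array([10, 20, 30, 40, 50])

# crosstab: pass the array-like objects themselves
ct = pd.crosstab(shop, product, values=sales, aggfunc="sum",
                 rownames=["shop"], colnames=["product"])

# pivot_table: build a DataFrame first, then name its columns as strings
df = pd.DataFrame({"shop": shop, "product": product, "sales": sales})
pt = pd.pivot_table(df, index="shop", columns="product",
                    values="sales", aggfunc="sum")

print(ct)
print(pt)
```

Both tables hold the same sums per shop/product pair; only the way the inputs are supplied differs.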
Looking at the source code for crosstab, it essentially takes the array-like objects you pass, creates a DataFrame, then calls pivot_table as appropriate.
In general, use pivot_table if you already have a DataFrame, so you don't have the additional overhead of creating the same DataFrame again. If you're starting from array-like objects and are only concerned with the pivoted data, use crosstab. In most cases, I don't think it will really make a difference which function you decide to use.
It is the same if, in pivot_table, you use aggfunc=len and fill_value=0:
pd.crosstab(df['Col X'], df['Col Y'])
pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
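A quick check of that equivalence on made-up data. One assumption here: the frame has an extra column (Col Z) that is passed as values=. Without values=, pivot_table applies len to every remaining column and keeps that column name as an extra column level, so the two outputs no longer have the same shape.

```python
import pandas as pd

# Hypothetical sample data; "Col Z" is just the column whose rows len() counts
df = pd.DataFrame({
    "Col X": ["a", "a", "b", "b", "b"],
    "Col Y": ["u", "v", "u", "u", "v"],
    "Col Z": [1, 2, 3, 4, 5],
})

ct = pd.crosstab(df["Col X"], df["Col Y"])

# values= pins the counted column, so the result has plain "Col Y" columns
pt = pd.pivot_table(df, index="Col X", columns="Col Y", values="Col Z",
                    aggfunc=len, fill_value=0)

print(ct)
print(pt)
```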
EDIT: There are more differences:
The default aggfunc values differ: pivot_table uses np.mean, while crosstab uses len (i.e. it counts rows).
The margins_name parameter was originally only in pivot_table; crosstab has also accepted it since pandas 0.21.
In pivot_table you can use a Grouper for the index and columns keywords.
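A short sketch of the first and third points, assuming a small made-up sales table: pivot_table without aggfunc averages the values column, crosstab without values counts rows, and pivot_table accepts a pd.Grouper (here bucketing a date column by month start, freq="MS") as its index.

```python
import pandas as pd

# Hypothetical sample data with a date column
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"]),
    "region": ["north", "south", "north", "north"],
    "amount": [10.0, 30.0, 20.0, 40.0],
})

# pivot_table with no aggfunc: averages "amount" per region
mean_tbl = pd.pivot_table(df, index="region", values="amount")

# crosstab with no values/aggfunc: counts rows per region/month
count_tbl = pd.crosstab(df["region"], df["date"].dt.month)

# pivot_table with a Grouper: sums "amount" per calendar month
monthly = pd.pivot_table(
    df,
    index=pd.Grouper(key="date", freq="MS"),
    columns="region",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)

print(mean_tbl, count_tbl, monthly, sep="\n\n")
```

With crosstab you would derive the bucket yourself and pass it as an array, as done with df["date"].dt.month above.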
I think if you simply need a frequency table, the crosstab function is better.
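For the plain frequency-table case, a single crosstab call is enough. A sketch with made-up survey data; margins=True adds row and column totals, and in current pandas versions crosstab also accepts margins_name to relabel them.

```python
import pandas as pd

# Hypothetical survey data
df = pd.DataFrame({
    "gender": ["f", "m", "f", "f", "m"],
    "answer": ["yes", "no", "no", "yes", "yes"],
})

# Counts per gender/answer pair, plus totals labelled "Total" instead of "All"
freq = pd.crosstab(df["gender"], df["answer"], margins=True, margins_name="Total")
print(freq)
```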
aggfunc for the crosstab function. – Milreis
crosstab first creates a DataFrame and then calls pivot_table. – Oxa
pivot_table does not have the normalize argument, unfortunately.
In crosstab, the normalize argument calculates proportions by dividing each cell by a sum of cells, as described below:
normalize='index' divides each cell by the sum of its row
normalize='columns' divides each cell by the sum of its column
normalize=True divides each cell by the total of all cells in the table
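A sketch of the three normalize modes on made-up survey data, plus one possible workaround for pivot_table, which lacks normalize: compute the counts and divide each row by its sum. The helper column n is a hypothetical addition that only gives pivot_table something to sum. The results are fractions; multiply by 100 if you want percentages.

```python
import pandas as pd

# Hypothetical survey data
df = pd.DataFrame({
    "gender": ["f", "m", "f", "f", "m"],
    "answer": ["yes", "no", "no", "yes", "yes"],
})

by_row = pd.crosstab(df["gender"], df["answer"], normalize="index")     # rows sum to 1
by_col = pd.crosstab(df["gender"], df["answer"], normalize="columns")   # columns sum to 1
overall = pd.crosstab(df["gender"], df["answer"], normalize=True)       # whole table sums to 1

# pivot_table workaround: count with a helper column, then divide by row sums
counts = pd.pivot_table(df.assign(n=1), index="gender", columns="answer",
                        values="n", aggfunc="sum", fill_value=0)
by_row_pt = counts.div(counts.sum(axis=1), axis=0)

print(by_row, by_col, overall, by_row_pt, sep="\n\n")
```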
pivot_table. – Guidebook
pivot_table shows (aggregated) values from the data, whereas crosstab, by default, shows the frequency of the data.
crosstab and pivot_table. That way, your answer will be much clearer. – Rattrap
crosstab uses a count aggregation by default to fill the values, while pivot_table can use other aggregations, such as sum().
© 2022 - 2025 — McMap. All rights reserved.
crosstab vs pivot_table. Both are applications of groupby. – Outstay