How is a Pandas crosstab different from a Pandas pivot_table?
Asked Answered
M

5

39

Both pandas' crosstab and pivot_table functions seem to provide the exact same functionality. Are there any differences?

Milreis answered 28/3, 2016 at 17:44 Comment(1)
Check out this answer. It’s a long one but it’s Section 1 covers crosstab vs pivot_table. Both are applications of groupby.Outstay
J
52

The main difference between the two is the pivot_table expects your input data to already be a DataFrame; you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With cross_tab, you don't necessarily need to have a DataFrame going in, as you just pass array-like objects for index/columns/values.

Looking at the source code for crosstab, it essentially takes the array-like objects you pass, creates a DataFrame, then calls pivot_table as appropriate.

In general, use pivot_table if you already have a DataFrame, so you don't have the additional overhead of creating the same DataFrame again. If you're starting from array-like objects and are only concerned with the pivoted data, use crosstab. In most cases, I don't think it will really make a difference which function you decide to use.

Joeljoela answered 28/3, 2016 at 18:15 Comment(2)
I timed a few options, and turns out pivot_table is one order of magnitude slower than crosstab, and even that is slower than a simple but clunky groupby approach, hereKrall
@Mpa this makes no sense. crosstab calls pivot_table, how is crosstab going to be faster? crosstab adds overhead. This article shows crosstab is the slowest between groupby, pivot_table and crosstab ramiro.org/notebook/pandas-crosstab-groupby-pivotGlycol
A
27

Is it the same, if in pivot_table use aggfunc=len and fill_value=0:

pd.crosstab(df['Col X'], df['Col Y'])
pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)

EDIT: There is more difference:

Default aggfunc are different: pivot_table - np.mean, crosstab - len.

Parameter margins_name is only in pivot_table.

In pivot_table you can use Grouper for index and columns keywords.


I think if you need simply frequency table, crosstab function is better.

Afterheat answered 28/3, 2016 at 17:46 Comment(3)
But it seems you can also have a aggfunc for the crosstab function.Milreis
pivot_table has a bit more 'analytical' flexibility, such as filling NA values, adding subtotals, etc., but absolutely at the cost of more overhead, as mentioned above.Bonnybonnyclabber
regarding overhead it is just the other way round (if there is any performance difference at all), as crosstab first creates a dataframe and then calls pivot_table.Oxa
V
11

The pivot_table does not have the normalize argument, unfortunately.

In crosstab, the normalize argument calculates percentages by dividing each cell by the sum of cells, as described below:

  • normalize = 'index' divides each cell by the sum of its row
  • normalize = 'columns' divides each cell by the sum of its column
  • normalize = True divides each cell by the total of all cells in the table
Venita answered 29/8, 2019 at 12:48 Comment(1)
For me this seems the most notorius difference. I think it is very strange that this functionality is not included in pivot_table.Guidebook
K
1

Pivot table shows the values from data. Crosstab represent frequency of the data .

Karolkarola answered 6/12, 2022 at 18:51 Comment(2)
Please provide some example to explain your answer using crosstab and pivot_table. That way, the your answer will be much clearer.Rattrap
You can do multiple types of aggregation with crosstabs not just counts by specifying the aggfunc argument.Haland
A
1

Crosstab utilized count() aggregation to fill the values while pivot_table would use any other aggregation such as sum().

Athenaathenaeum answered 23/6, 2023 at 18:5 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.