How is a Pandas crosstab different from a Pandas pivot_table?

Asked 28/3, 2016 at 17:44 Answered 23/6, 2023 at 18:5

Both pandas' crosstab and pivot_table functions seem to provide the exact same functionality. Are there any differences?

Milreis answered 28/3, 2016 at 17:44 Comment(1)

Check out this answer. It’s a long one but it’s Section 1 covers crosstab vs pivot_table. Both are applications of groupby. – Outstay 3/11, 2023 at 18:50

The main difference between the two is the pivot_table expects your input data to already be a DataFrame; you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With cross_tab, you don't necessarily need to have a DataFrame going in, as you just pass array-like objects for index/columns/values.

Looking at the source code for crosstab, it essentially takes the array-like objects you pass, creates a DataFrame, then calls pivot_table as appropriate.

In general, use pivot_table if you already have a DataFrame, so you don't have the additional overhead of creating the same DataFrame again. If you're starting from array-like objects and are only concerned with the pivoted data, use crosstab. In most cases, I don't think it will really make a difference which function you decide to use.

Joeljoela answered 28/3, 2016 at 18:15 Comment(2)

I timed a few options, and turns out pivot_table is one order of magnitude slower than crosstab, and even that is slower than a simple but clunky groupby approach, here – Krall 31/10, 2018 at 5:9

@Mpa this makes no sense. crosstab calls pivot_table, how is crosstab going to be faster? crosstab adds overhead. This article shows crosstab is the slowest between groupby, pivot_table and crosstab ramiro.org/notebook/pandas-crosstab-groupby-pivot – Glycol 13/8, 2021 at 13:39

Is it the same, if in pivot_table use aggfunc=len and fill_value=0:

pd.crosstab(df['Col X'], df['Col Y'])
pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)

EDIT: There is more difference:

Default aggfunc are different: pivot_table - np.mean, crosstab - len.

Parameter margins_name is only in pivot_table.

In pivot_table you can use Grouper for index and columns keywords.

I think if you need simply frequency table, crosstab function is better.

Afterheat answered 28/3, 2016 at 17:46 Comment(3)

But it seems you can also have a aggfunc for the crosstab function. – Milreis 28/3, 2016 at 17:47

pivot_table has a bit more 'analytical' flexibility, such as filling NA values, adding subtotals, etc., but absolutely at the cost of more overhead, as mentioned above. – Bonnybonnyclabber 20/2, 2019 at 20:33

regarding overhead it is just the other way round (if there is any performance difference at all), as crosstab first creates a dataframe and then calls pivot_table. – Oxa 7/8, 2019 at 14:54

The pivot_table does not have the normalize argument, unfortunately.

In crosstab, the normalize argument calculates percentages by dividing each cell by the sum of cells, as described below:

normalize = 'index' divides each cell by the sum of its row
normalize = 'columns' divides each cell by the sum of its column
normalize = True divides each cell by the total of all cells in the table

Venita answered 29/8, 2019 at 12:48 Comment(1)

For me this seems the most notorius difference. I think it is very strange that this functionality is not included in pivot_table. – Guidebook 24/4, 2022 at 3:4

Pivot table shows the values from data. Crosstab represent frequency of the data .

Karolkarola answered 6/12, 2022 at 18:51 Comment(2)

Please provide some example to explain your answer using crosstab and pivot_table. That way, the your answer will be much clearer. – Rattrap 9/12, 2022 at 5:9

You can do multiple types of aggregation with crosstabs not just counts by specifying the aggfunc argument. – Haland 10/1, 2023 at 14:11

Crosstab utilized count() aggregation to fill the values while pivot_table would use any other aggregation such as sum().

Athenaathenaeum answered 23/6, 2023 at 18:5 Comment(0)

Recommended topics

Hot tags