How to create mosaic plot from Pandas dataframe with Statsmodels library?

Using Python 3.4, Pandas 0.15 and Statsmodels 0.6.0, I am trying to create a mosaic plot from a dataframe as described in the Statsmodels documentation. However, I don't understand how the input passed to the mosaic() function has to be formatted.

Given a simple dataframe:

In:
import pandas as pd
myDataframe = pd.DataFrame({'size' : ['small', 'large', 'large', 'small', 'large', 'small'], 'length' : ['long', 'short', 'short', 'long', 'long', 'short']})

Out:
  length   size
0   long  small
1  short  large
2  short  large
3   long  small
4   long  large
5  short  small

Trying to create a mosaic plot of this data with

from statsmodels.graphics.mosaicplot import mosaic
mosaic(data=myDataframe, title='Mosaic Plot')

raises ValueError: cannot label index with a null key

As mosaic plots are a visualization of contingency tables, I first tried to create one with

In:
myCrosstable = pd.crosstab(myDataframe['size'], myDataframe['length'])

Out:
length  long  short
size               
large      1      2
small      2      1

Still, passing myCrosstable as the data argument gives the same error.

How does the dataframe have to be formatted to be accepted by the mosaic() function? The documentation describes the data argument as follows:

data : dict, pandas.Series, np.ndarray, pandas.DataFrame

The contingency table that contains the data. Each category should contain a non-negative number with a tuple as index.

Isn't that what the pd.crosstab function returns? If not, how can I convert the dataframe accordingly?

Forfeit answered 1/12, 2014 at 9:49 Comment(0)
T
17

I used your data and this code:

mosaic(myDataframe, ['size', 'length'])

and got a chart like this:

[image: mosaic chart]
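For reference, here is a self-contained sketch of the call above, using the data from the question. The Agg backend line and the loop that prints the rectangles are additions for illustration only. It relies on mosaic() returning the figure together with a dict that maps each category combination to its rectangle.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

myDataframe = pd.DataFrame({
    'size': ['small', 'large', 'large', 'small', 'large', 'small'],
    'length': ['long', 'short', 'short', 'long', 'long', 'short'],
})

# The second argument lists the categorical columns to cross-tabulate.
fig, rects = mosaic(myDataframe, ['size', 'length'], title='Mosaic Plot')

# rects maps each (size, length) combination to its (x, y, width, height).
for key in sorted(rects):
    print(key, rects[key])
```

In an interactive session, drop the backend line and call matplotlib.pyplot.show() to display the figure.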

Talkie answered 1/12, 2014 at 11:8 Comment(2)
Hi, how would I make it so that the boxes are based on another column's values? - Bingen
@Bingen - The second argument to mosaic is the list of columns you want to use to size the boxes. If this doesn't help, perhaps ask a separate question. - Talkie
D
4

You can also call the stack() method on the crosstab to avoid recomputing the contingency table.

With your code, mosaic(myCrosstable.stack()) works.
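A minimal sketch of this approach, again with the data from the question. The Agg backend line and the print call are additions for illustration. Stacking turns the two-dimensional table into a Series whose index entries are (size, length) tuples, which appears to be the "number with a tuple as index" layout the documentation asks for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

myDataframe = pd.DataFrame({
    'size': ['small', 'large', 'large', 'small', 'large', 'small'],
    'length': ['long', 'short', 'short', 'long', 'long', 'short'],
})
myCrosstable = pd.crosstab(myDataframe['size'], myDataframe['length'])

# stack() yields a Series indexed by (size, length) tuples holding the counts.
stacked = myCrosstable.stack()
print(stacked)

# The stacked Series can be passed directly, without a column list.
fig, rects = mosaic(stacked, title='Mosaic Plot')
```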

Deandre answered 23/2, 2016 at 10:28 Comment(0)
