Statsmodels mosaic plot ValueError: cannot convert float NaN to integer

I have a simple pandas DataFrame, for which I would like to create a mosaic plot. Here is my code:

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic 

mydata = pd.DataFrame({'id2': {64: 'Angelica', 
                               65: 'DXW_UID', 66: 'casuid01', 
                               67: 'casuid01', 68: 'EC93_uid', 
                               69: 'EC93_uid', 70: 'EC93_uid', 
                               60: 'DXW_UID',  61: 'AtmosFox', 
                               62: 'DXW_UID', 63: 'DXW_UID'}, 
                       'id1': {64: 'TGP', 
                               65: 'Retention01', 66: 'default',
                               67: 'default', 68: 'Musa_EC_9_3', 
                               69: 'Musa_EC_9_3', 70: 'Musa_EC_9_3', 
                               60: 'default', 61: 'default', 
                               62: 'default', 63: 'default'}})

mydata
            id1       id2
60      default   DXW_UID
61      default  AtmosFox
62      default   DXW_UID
63      default   DXW_UID
64          TGP  Angelica
65  Retention01   DXW_UID
66      default  casuid01
67      default  casuid01
68  Musa_EC_9_3  EC93_uid
69  Musa_EC_9_3  EC93_uid
70  Musa_EC_9_3  EC93_uid

[11 rows x 2 columns]

I can create a mosaic plot just fine when I exclude row 64.

mosaic(mydata[mydata.id1!='TGP'], ['id1','id2'])
(<matplotlib.figure.Figure object at 0x11E0D3B0>, OrderedDict([(('default', 'DXW_UID'), (0.0, 0.0, 0.594059405940594, 0.49504950495049505)), (('default', 'AtmosFox'), (0.0, 0.49834983498349833, 0.594059405940594, 0.16501650165016499)), (('default', 'casuid01'), (0.0, 0.66666666666666663, 0.594059405940594, 0.33003300330033009)), (('default', 'EC93_uid'), (0.0, 1.0, 0.594059405940594, 0.0)), (('Retention01', 'DXW_UID'), (0.599009900990099, 0.0, 0.09900990099009899, 0.99009900990099009)), (('Retention01', 'AtmosFox'), (0.599009900990099, 0.99339933993399343, 0.09900990099009899, 0.0)), (('Retention01', 'casuid01'), (0.599009900990099, 0.99669966996699666, 0.09900990099009899, 0.0)), (('Retention01', 'EC93_uid'), (0.599009900990099, 1.0, 0.09900990099009899, 0.0)), (('Musa_EC_9_3', 'DXW_UID'), (0.7029702970297029, 0.0, 0.29702970297029707, 0.0)), (('Musa_EC_9_3', 'AtmosFox'), (0.7029702970297029, 0.0033003300330033004, 0.29702970297029707, 0.0)), (('Musa_EC_9_3', 'casuid01'), (0.7029702970297029, 0.0066006600660066007, 0.29702970297029707, 0.0)), (('Musa_EC_9_3', 'EC93_uid'), (0.7029702970297029, 0.0099009900990099011, 0.29702970297029707, 0.99009900990099009))]))

The plot comes out fine (with the exception of some of the labels looking a little funny--but that's not the issue).

The errors occur when I include row 64. My questions are, why does this row cause this error, and how can I fix it? I can see that the error occurs when trying to draw the image, but it is not at all obvious where the NaN is coming from, especially since the plot before worked just fine.

mosaic(mydata, ['id1','id2'])
(<matplotlib.figure.Figure object at 0x11D13ED0>, OrderedDict([(('default', 'DXW_UID'), (0.0, 0.0, 0.5373936408419167, 0.49342105263157893)), (('default', 'AtmosFox'), (0.0, 0.49671052631578938, 0.5373936408419167, 0.16447368421052627)), (('default', 'casuid01'), (0.0, 0.66447368421052622, 0.5373936408419167, 0.32894736842105265)), (('default', 'Angelica'), (0.0, 0.99671052631578938, 0.5373936408419167, 0.0)), (('default', 'EC93_uid'), (0.0, 1.0, 0.5373936408419167, 0.0)), (('TGP', 'DXW_UID'), (0.5423197492163009, 0.0, 0.08956560680698614, 0.0)), (('TGP', 'AtmosFox'), (0.5423197492163009, 0.0032894736842105261, 0.08956560680698614, 0.0)), (('TGP', 'casuid01'), (0.5423197492163009, 0.0065789473684210523, 0.08956560680698614, 0.0)), (('TGP', 'Angelica'), (0.5423197492163009, 0.0098684210526315784, 0.08956560680698614, 0.98684210526315785)), (('TGP', 'EC93_uid'), (0.5423197492163009, 1.0, 0.08956560680698614, 0.0)), (('Retention01', 'DXW_UID'), (0.6368114643976712, 0.0, 0.08956560680698614, 0.98684210526315785)), (('Retention01', 'AtmosFox'), (0.6368114643976712, 0.99013157894736836, 0.08956560680698614, 0.0)), (('Retention01', 'casuid01'), (0.6368114643976712, 0.99342105263157876, 0.08956560680698614, 0.0)), (('Retention01', 'Angelica'), (0.6368114643976712, 0.99671052631578938, 0.08956560680698614, 0.0)), (('Retention01', 'EC93_uid'), (0.6368114643976712, 1.0, 0.08956560680698614, 0.0)), (('Musa_EC_9_3', 'DXW_UID'), (0.7313031795790416, 0.0, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'AtmosFox'), (0.7313031795790416, 0.0032894736842105261, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'casuid01'), (0.7313031795790416, 0.0065789473684210523, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'Angelica'), (0.7313031795790416, 0.0098684210526315784, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'EC93_uid'), (0.7313031795790416, 0.013157894736842105, 0.2686968204209583, 0.98684210526315785))]))

When I run the above, I get this traceback:

  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4.py", line 374, in idle_draw
    self.draw()
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4agg.py", line 154, in draw
    FigureCanvasAgg.draw(self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 451, in draw
    self.figure.draw(self.renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\figure.py", line 1034, in draw
    func(*args)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 2086, in draw
    a.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 1096, in draw
    tick.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 241, in draw
    self.label1.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\text.py", line 598, in draw
    ismath=ismath, mtext=self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 188, in draw_text
    font.get_image(), np.round(x - xd), np.round(y + yd) + 1, angle, gc)
ValueError: cannot convert float NaN to integer
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4.py", line 299, in resizeEvent
    self.draw()
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4agg.py", line 154, in draw
    FigureCanvasAgg.draw(self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 451, in draw
    self.figure.draw(self.renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\figure.py", line 1034, in draw
    func(*args)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 2086, in draw
    a.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 1096, in draw
    tick.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 241, in draw
    self.label1.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\text.py", line 598, in draw
    ismath=ismath, mtext=self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 188, in draw_text
    font.get_image(), np.round(x - xd), np.round(y + yd) + 1, angle, gc)
ValueError: cannot convert float NaN to integer

I ran the above code in the Spyder IDE, with default settings.

A similar issue was addressed here, and numerical underflow was the culprit. However, if that is the case here, it is not at all obvious why.

Hazard answered 24/6, 2015 at 16:16 Comment(6)
This is very interesting. It's a problem with matplotlib rendering the labels, nothing to do with the data. To see this, try mosaic(mydata.replace({'Angelica':'Angelico'}), ['id1', 'id2']) which should work fine. - Warden
I need some help debugging this: in matplotlib/text.py, there's a Text object whose self._transform has some NaN values in one of its Bboxes. I'm out of my depth. - Warden
The problem seems to be caused by the axes labels, because setting axes_label=False "solves" the problem: mosaic(mydata, ['id1','id2'], axes_label=False) - Omsk
I get some strange behavior. If I copy the example I get the exception. After replacing the last a in 'Angelica' by an a (same letter), I don't get an exception. I'm using Python 3.4. Aside: there is a RuntimeWarning for the labels because of the dimension that should be fixed soon. - Shupe
If I run this repeatedly, then I get the exception every once in a while. My guess is now that it's a floating point issue in calculating the label coordinates (or something like that). - Shupe
When I use the PR with the correction, then I don't get the exception in all my tries. github.com/statsmodels/statsmodels/pull/2286 - Shupe

According to the docs the first parameter should be a contingency table. The fact that your way of doing things works at all seems to be an undocumented feature.

The behaviour you're seeing (including your "funny" looking labels) is because many of the entries in your contingency table are zero, and something in the labelling code of mosaic is having a hard time with that.

To see this, convert your DataFrame to a contingency table:

In [161]: pd.crosstab(mydata.id1, mydata.id2)
Out[161]: 
id2          Angelica  AtmosFox  DXW_UID  EC93_uid  casuid01
id1                                                         
Musa_EC_9_3         0         0        0         3         0
Retention01         0         0        1         0         0
TGP                 1         0        0         0         0
default             0         1        3         0         2

And add a "little bit" to all those zeros. The mosaic then works fine.

In [165]: ct = pd.crosstab(mydata.id1, mydata.id2)
In [166]: ctplus = ct + 1
In [167]: mosaic(ctplus.unstack())

Which results in the rather beautiful:

[image: Beautiful mosaic plot]

The tiny downside is that it's wrong! Adding 1 to every cell distorts the proportions of the tiles. But you can remedy that by doing

ctplus = ct + 1e-8

to just add a tiny bit to all those zeros. The plot still works (but looks ugly because the labels on all those zero tiles of the mosaic are all on top of each other):

[image: A much uglier mosaic plot]
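The overlapping labels on the zero tiles can probably be avoided by passing a custom labelizer to mosaic. What follows is only a minimal sketch, assuming a reasonably recent statsmodels and the non-interactive Agg backend. The names counts, padded and label_nonzero are made up for this illustration and are not part of the original answer. The idea is to keep the tiny padding for the geometry, but to look up the label text in the unpadded counts and return an empty string for empty cells.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no GUI window is needed

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

# Same data as in the question, written as plain lists.
mydata = pd.DataFrame({
    "id1": ["default", "default", "default", "default", "TGP", "Retention01",
            "default", "default", "Musa_EC_9_3", "Musa_EC_9_3", "Musa_EC_9_3"],
    "id2": ["DXW_UID", "AtmosFox", "DXW_UID", "DXW_UID", "Angelica", "DXW_UID",
            "casuid01", "casuid01", "EC93_uid", "EC93_uid", "EC93_uid"],
})

# Contingency table, flattened to a Series indexed by (id2, id1).
counts = pd.crosstab(mydata.id1, mydata.id2).unstack()

# Pad the zeros with a negligible amount so every tile has a nonzero size.
padded = counts + 1e-8

def label_nonzero(key):
    """Return the true count as text, or nothing for an empty cell."""
    n = counts.loc[key]
    return str(n) if n > 0 else ""

fig, rects = mosaic(padded, labelizer=label_nonzero)
fig.canvas.draw()  # force rendering, which is where the original error surfaced
```

To keep the picture, fig.savefig can be called with a file name of your choice. If the axis tick labels are still cluttered, axes_label=False (mentioned in the comments on the question) can be passed as well.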

Warden answered 24/6, 2015 at 16:26 Comment(2)
Thanks LondonRob! I'm still not sure why 'Angelica' was being so rude :), but the contingency table way solved the problem for me. I enjoyed seeing your way of tweaking the aesthetics too. The docs do suggest support for using DataFrames (see the very last paragraph in your link): "Using a DataFrame as source, specifying the name of the columns of interest >>> gender = ['male', 'male', 'male', 'female', 'female', 'female'] >>> pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat'] >>> data = pandas.DataFrame({'gender': gender, 'pet': pet}) >>> mosaic(data, ['pet', 'gender']) >>> pylab.show()". - Hazard
@LondonRob, very useful answer indeed. Is there some way to remove the text overlays? - Laryssa
