As @ryankdwyer pointed out, this was a bug in the underlying statsmodels implementation that no longer exists in the 0.8.0 release.
Since Kaggle does not allow internet access from a kernel/script, upgrading the package is not an option. That leaves you with two alternatives:
- Use `sns.distplot(myseries, bins=50, kde=False)`. This will of course not draw the KDE.
- Manually patch the `statsmodels` implementation with the code from version 0.8.0. Admittedly, this is a bit hacky, but you will get the KDE plot.
Here is an example (and a proof on Kaggle):

    import numpy as np

    def _revrt(X, m=None):
        """
        Inverse of forrt. Equivalent to Munro (1976) REVRT routine.
        """
        if m is None:
            m = len(X)
        i = int(m // 2 + 1)
        y = X[:i] + np.r_[0, X[i:], 0] * 1j
        return np.fft.irfft(y) * m

    from statsmodels.nonparametric import kdetools

    # replace the implementation with new method.
    kdetools.revrt = _revrt

    # import seaborn AFTER replacing the method.
    import seaborn as sns

    # draw the distplot with the kde function
    sns.distplot(myseries, bins=50, kde=True)

Why does it work? Well, it relates to the way Python loads modules. From the Python docs:
> 5.3.1. The module cache
>
> The first place checked during import search is `sys.modules`. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths. So if `foo.bar.baz` was previously imported, `sys.modules` will contain entries for `foo`, `foo.bar`, and `foo.bar.baz`. Each key will have as its value the corresponding module object.
Therefore, once `from statsmodels.nonparametric import kdetools` has run, the `kdetools` module sits in this module cache. The next time seaborn imports it, Python's import system returns the cached module instead of loading a fresh copy. Since this cached module is the very object we modified, our patched `revrt` function is used. By the way, this practice is known as monkey patching; it is very handy when writing unit tests, where it is the basis of mocking.
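
To see the caching behaviour in isolation, here is a minimal sketch that needs neither statsmodels nor seaborn. The module name `fake_kdetools` and its `revrt` attribute are hypothetical stand-ins made up for this illustration; only `sys.modules`, `importlib.import_module` and `unittest.mock.patch.object` are real standard-library pieces.

```python
import importlib
import sys
import types
from unittest import mock

# Hypothetical stand-in for statsmodels.nonparametric.kdetools.
fake = types.ModuleType("fake_kdetools")
fake.revrt = lambda x: "original"
sys.modules["fake_kdetools"] = fake  # register it in the module cache

# This import is served from sys.modules, so we get the same object back.
import fake_kdetools

# Monkey patch: replace the attribute on the cached module object.
fake_kdetools.revrt = lambda x: "patched"

# A later import elsewhere (simulated here with importlib) hits the cache too,
# so it sees the patched function.
later = importlib.import_module("fake_kdetools")
same_object = later is fake_kdetools
patched_result = later.revrt(None)

# In unit tests, mock.patch.object does the same replacement temporarily
# and restores the previous attribute when the block exits.
with mock.patch.object(fake_kdetools, "revrt", return_value="mocked"):
    mocked_result = later.revrt(None)
restored_result = later.revrt(None)
```

Both names refer to the same module object, so a replacement made through one is visible through the other. Note that this only helps code that looks the function up on the module after the patch is applied, which is presumably why seaborn has to be imported after the replacement in the example above.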