ValueError: A value in x_new is below the interpolation range

This is a scikit-learn error that I get when I do

my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)

Note that if I decrease max_n_alphas from 1e5 down to 1e4, I no longer get this error.

Does anyone have an idea of what's going on?

The error happens when I call

my_estimator.fit(x, y)

I have 40k data points in 40 dimensions.

The full stack trace looks like this:

  File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit
    axis=0)(all_alphas)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
    y = self._evaluate(x)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
    out_of_bounds = self._check_bounds(x_new)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds
    raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
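
For context, the frame that actually raises is scipy's bounds check in interp1d: LassoLarsCV builds an interpolator over the alphas collected from the cross-validated LARS paths and then evaluates it, so presumably some queried alpha falls below the range the interpolator was built on. Here is a minimal standalone sketch (synthetic numbers, nothing to do with my data) that triggers the same ValueError:

import numpy as np
from scipy.interpolate import interp1d

# the interpolator is built over x values 1..3; by default
# bounds_error=True, so querying a point below 1 raises the same
# error that appears in the traceback above
f = interp1d(np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0, 30.0]))
print(f(2.5))   # fine: 2.5 lies inside [1, 3]
print(f(0.5))   # ValueError: A value in x_new is below the interpolation range.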
Luciana answered 30/3/2016 at 22:22 (9 comments)
When I run from sklearn.linear_model import LassoLarsCV followed by your line of code, I get no error. Please provide enough code to reproduce the error you are getting, as well as the full traceback message. – Nenitanenney
The error does not occur on that line, but when I call .fit(). Unfortunately, it is hard to reproduce here; my data set has 40k points. – Luciana
The interpolators in scipy often require that the x values are monotonically increasing. Is x monotonically increasing for your dataset? If not, try sorting the dataset with x as the key and try again. If it works, let me know and I'll add a proper answer for the bounty :) – Olwen
Hmm, looking into this: whilst that might be the case at the point where the code fails, it doesn't really make sense from where you call fit, as I'm guessing x is a 40000 x 40 matrix? – Olwen
@BaronYugovich: Could you please upload your data somewhere? – Grolier
If there wasn't a bounty, I'd vote to close this as lacking a minimal reproducible example. – Yanina
Well, apologies for the "ridiculous suggestion", but you'll note that the bit that's actually throwing the error is interpolate.py in the scipy package, which does have those requirements. However, I'm not really minded to track it further if you won't put up data to reproduce it and think it's a good idea to suggest that people offering free help are being ridiculous. – Olwen
In addition, to ping people you need to omit the space from their user name, and your assertion that the problem is not data related does not seem to be backed by any evidence. I agree the 1e4 vs 1e5 difference is interesting, but we need a dataset to replicate and therefore track this down; it doesn't happen with all data (as the existing answer shows). – Olwen
Same here: using LassoLarsCV gives me the same error. My data set is smaller, but it's the same issue. Did you find a solution to your problem? Is it a problem with the scipy library? link – Flank

There must be something particular to your data. LassoLarsCV() seems to be working correctly with this synthetic example of fairly well-behaved data:

import numpy
import sklearn.linear_model

# create 40000 x 40 sample data from linear model with a bit of noise
npoints = 40000
ndims = 40
numpy.random.seed(1)
X = numpy.random.random((npoints, ndims))
w = numpy.random.random(ndims)
y = X.dot(w) + numpy.random.random(npoints) * 0.1

clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e6)
clf.fit(X, y)

# coefficients are almost exactly recovered, this prints 0.00377
print(max(abs(clf.coef_ - w)))

# number of alphas actually used is 41, i.e. ndims + 1
print(clf.alphas_.shape)

This is with sklearn 0.16; I don't have the positive=True option there.

I'm not sure why you would want to use a very large max_n_alphas anyway. While I don't know why 1e+4 works and 1e+5 doesn't in your case, I suspect the paths you get from max_n_alphas=ndims+1 and max_n_alphas=1e+4 would be identical for well-behaved data. The optimal alpha estimated by cross-validation in clf.alpha_ is also going to be identical (see the sketch below). Check out the Lasso path using LARS example for what alpha is trying to do.
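As a quick sanity check, here is a sketch reusing the synthetic X, y, and ndims from the code above (not verified on your data); the cross-validated alpha should come out the same, or very nearly so, once max_n_alphas is at least ndims + 1:

clf_small = sklearn.linear_model.LassoLarsCV(
    fit_intercept=False, normalize=False, max_n_alphas=ndims + 1)
clf_small.fit(X, y)

clf_large = sklearn.linear_model.LassoLarsCV(
    fit_intercept=False, normalize=False, max_n_alphas=int(1e4))
clf_large.fit(X, y)

# expected: (nearly) identical cross-validated alphas, since the
# LARS path itself has at most ndims + 1 breakpoints for this data
print(clf_small.alpha_, clf_large.alpha_)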

Also, from the LassoLars documentation:

alphas_ array, shape (n_alphas + 1,)

Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.

So it makes sense that we end up with an alphas_ of size ndims + 1 (i.e. n_features + 1) above.

P.S. Tested with sklearn 0.17.1 and positive=True as well; also tested with some positive and negative coefficients. Same result: alphas_ is ndims + 1 or less.

Grolier answered 4/4/2016 at 10:41 (4 comments)
It has nothing to do with the data. On the same data set, when decreasing max_n_alphas as described above, the problem disappears. The error happens when generating the alphas, not when dealing with the problem set. – Luciana
@BaronYugovich You can see in the code above that, with a different data set of the same dimensions and a huge max_n_alphas, there is no problem. Why do you think the problem is not data related? Please post a complete runnable example that reproduces your problem. Thanks :) – Grolier
Makes sense. Out of curiosity, in your experiment with random data, what do you get with orthogonal matching pursuit? #36287545 – Luciana
@BaronYugovich Does this address your question? I believe what you have found is indeed a sklearn bug, but it is very hard to reproduce without your data. Most importantly, it makes no difference to the results you get: use any max_n_alphas > 40 and you'll get the same results, as long as it doesn't crash. If you are satisfied, please remember to award the bounty (and accept the answer). – Grolier
