`ValueError: too many values to unpack (expected 4)` with `scipy.stats.linregress`
I know that this error message (ValueError: too many values to unpack (expected 4)) appears when an unpacking assignment receives more values than there are variables on the left-hand side.

scipy.stats.linregress returns 5 values according to the scipy documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html).

Here is a short, reproducible example of a working call, and then a failed call, to linregress:

What could account for the difference, and why is the second call wrong?

from scipy import stats
import numpy as np

if __name__ == '__main__':
    x = np.random.random(10)
    y = np.random.random(10)
    print(x,y)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)


    # The code above works.
    # The code below fails.

    X = np.asarray([[-15.93675813],
 [-29.15297922],
 [ 36.18954863],
 [ 37.49218733],
 [-48.05882945],
 [ -8.94145794],
 [ 15.30779289],
 [-34.70626581],
 [  1.38915437],
 [-44.38375985],
 [  7.01350208],
 [ 22.76274892]])

    Y = np.asarray( [[  2.13431051],
 [  1.17325668],
 [ 34.35910918],
 [ 36.83795516],
 [  2.80896507],
 [  2.12107248],
 [ 14.71026831],
 [  2.61418439],
 [  3.74017167],
 [  3.73169131],
 [  7.62765885],
 [ 22.7524283 ]])

    print(X,Y) # The array initialization succeeds; both arrays print correctly


    for i in range(1,len(X)):
        slope, intercept, r_value, p_value, std_err = (stats.linregress(X[0:i,:], y = Y[0:i,:]))
Truncheon answered 2/8, 2016 at 15:43 Comment(4)
Can you post the complete error message and stack trace? - Balaam
The shape of your X and Y is (12, 1), but what you need is (12,). - Iotacism
Also, which value of i causes the issue? - Balaam
For pandas users with the same error: use df.pop('value') to get a one-dimensional Series of shape (R,) to pass to linregress. The call then returns the five values slope, intercept, r_value, p_value, std_err described in the docs and in this question. - Supraliminal
I
9

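Your problem originates from how you slice the X and Y arrays: X[0:i,:] keeps the second axis, so linregress still receives two-dimensional arrays. If you only want a single regression over all the data, you also do not need the for loop. Use the following instead and it should work.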

slope, intercept, r_value, p_value, std_err = stats.linregress(X[:,0], Y[:,0])
Iotacism answered 2/8, 2016 at 15:56 Comment(6)
They may still want the for loop (hard to say). But the solution is to change [...,:] to [...,0]. - Immodest
I assume s/he used the for loop to take care of the arrays' shape. If the correct slicing is used, the for loop will not be needed. - Iotacism
I thought they were trying to get the results of n different regressions, each considering one additional element in X/Y. - Immodest
@Immodest Yes, you're right. It is my intention to get the results of n different regressions. What in the documentation indicates that the solution is to change [...,:] to [...,0], however? - Truncheon
@Immodest From the docs: "x, y : array_like. Two sets of measurements. Both arrays should have the same length. If only x is given (and y=None), then it must be a two-dimensional array where one dimension has length 2. The two sets of measurements are then found by splitting the array along the length-2 dimension." - Truncheon
@Muno, because you are creating your arrays from lists where each element is itself a list. This results in X and Y being two-dimensional arrays, each with .shape = (12,1). But the function doesn't expect two-dimensional arrays; it expects one-dimensional arrays (e.g. .shape = (12,)). So you need to extract a single column using [...,0]. When you use [...,:], you're extracting all columns, so the inputs are still two-dimensional (12,1) arrays. - Immodest
I
2

The issue stems from the fact that the inputs to np.asarray are lists of single-element lists.

Thus, X and Y both have shape of (12,1):

print(X.shape)  # (12, 1)   [or (12L, 1L), depending on version]
print(Y.shape)  # (12, 1)

Note that these are each two-dimensional arrays. Even though one of the dimensions is 1, they're still considered two-dimensional.
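
To make the point about slicing concrete, here is a minimal sketch using a small made-up column vector (the name col and its values are illustrative only, not taken from the question). It shows that a slice on the second axis keeps the array two-dimensional, while an integer index removes that axis:

```python
import numpy as np

# Small illustrative column vector (values are made up for this sketch).
col = np.asarray([[1.0], [2.0], [3.0], [4.0]])

kept_2d = col[0:3, :]   # slice on both axes: still two-dimensional
dropped = col[0:3, 0]   # integer index on axis 1: that axis is removed

print(col.shape)      # (4, 1)
print(kept_2d.shape)  # (3, 1)
print(dropped.shape)  # (3,)
```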

Now consider this way of creating an array:

x = np.asarray([1,2,3,4,5])
print(x.shape)  # (5,)

Note in this case, since we passed a list of integers to asarray, we got a one-dimensional array.

linregress, when called with two arguments, needs each of them to be a one-dimensional array. So you can create the arrays as one-dimensional from the start.

For example, by hand:

X = np.asarray([-15.93675813,
                -29.15297922,
                 36.18954863,
                 37.49218733,
                -48.05882945,
                 -8.94145794,
                 15.30779289,
                -34.70626581,
                  1.38915437,
                -44.38375985,
                  7.01350208,
                 22.76274892])

Or by list comprehension:

y_data = [[  2.13431051],
          [  1.17325668],
          [ 34.35910918],
          [ 36.83795516],
          [  2.80896507],
          [  2.12107248],
          [ 14.71026831],
          [  2.61418439],
          [  3.74017167],
          [  3.73169131],
          [  7.62765885],
          [ 22.7524283 ]]
Y = np.asarray([e[0] for e in y_data])

Or by slicing:

Y = np.asarray([[  2.13431051],
                [  1.17325668],
                [ 34.35910918],
                [ 36.83795516],
                [  2.80896507],
                [  2.12107248],
                [ 14.71026831],
                [  2.61418439],
                [  3.74017167],
                [  3.73169131],
                [  7.62765885],
                [ 22.7524283 ]])
Y = Y[:,0]

All three methods would result in you having X and Y of shape (12,) (one-dimensional):

print(X.shape)  # (12,)
print(Y.shape)  # (12,)
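
As a side note, NumPy also offers ravel() and reshape(-1) for flattening an existing array, which should give the same result as the slicing approach here. A minimal sketch, using only the first three Y values from the question (the names Y2d, flat_a, flat_b and flat_c are illustrative):

```python
import numpy as np

# First three Y values from the question, kept as a (3, 1) column.
Y2d = np.asarray([[2.13431051], [1.17325668], [34.35910918]])

flat_a = Y2d.ravel()       # flattens to one dimension
flat_b = Y2d.reshape(-1)   # -1 lets NumPy infer the length
flat_c = Y2d[:, 0]         # select the single column by integer index

print(flat_a.shape, flat_b.shape, flat_c.shape)  # (3,) (3,) (3,)
```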

Then, you could use your loop as:

for i in range(3,len(X)):
    slope, intercept, r_value, p_value, std_err = stats.linregress(X[0:i], y = Y[0:i])
    print(slope)

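Note that I started the loop at 3, the first value that "makes sense": a single point cannot define a line, and two points always give a perfect fit. Also note that range(3, len(X)) stops at i = 11, so the regression over all 12 points is never run; use range(3, len(X) + 1) to include it.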

Or, you could keep your arrays unmodified as two-dimensional, and just fix the slicing syntax inside your loop:

for i in range(3,len(X)):
    slope, intercept, r_value, p_value, std_err = stats.linregress(X[0:i,0], y = Y[0:i,0])
    print(slope)

This is the approach suggested in the answer I commented on.
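
If the goal is to keep the results of all n regressions rather than overwrite them on each pass, one possible sketch is below. It assumes one-dimensional X and Y and a SciPy version whose linregress result exposes named fields such as slope and intercept; the results list and its tuple layout are my own illustration, not part of the question.

```python
import numpy as np
from scipy import stats

X = np.asarray([-15.93675813, -29.15297922, 36.18954863, 37.49218733,
                -48.05882945, -8.94145794, 15.30779289, -34.70626581,
                1.38915437, -44.38375985, 7.01350208, 22.76274892])
Y = np.asarray([2.13431051, 1.17325668, 34.35910918, 36.83795516,
                2.80896507, 2.12107248, 14.71026831, 2.61418439,
                3.74017167, 3.73169131, 7.62765885, 22.7524283])

# Hypothetical container: one (n_points, slope, intercept) tuple per fit.
results = []
for i in range(3, len(X) + 1):  # + 1 so the last fit uses all 12 points
    fit = stats.linregress(X[:i], Y[:i])
    results.append((i, fit.slope, fit.intercept))

for n_points, slope, intercept in results:
    print(n_points, slope, intercept)
```

Each entry records how many points were used, so the fits can be compared afterwards.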

Immodest answered 2/8, 2016 at 20:18 Comment(0)
S
0

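After several attempts, the following worked for me. Note that X[:] returns a view with the same shape as X, so this only works when X and Y are already one-dimensional: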

slope, intercept, r_value, p_value, std_err = stats.linregress(X[:], Y[:])
Sherard answered 25/3, 2020 at 12:17 Comment(0)
