ValueError: If using all scalar values, you must pass an index
Asked Answered
G

1

7

Take the following code:

import MySQLdb as mdb
import pandas as pd

con = mdb.connect(db_host, db_user, db_pass, db_name)

query = """SELECT `TIME`.`BID-CLOSE`
          FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
          WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
          ORDER BY TIME ASC;"""

# Create a pandas dataframe from the SQL query
eurusd = pd.read_sql_query(query, con=con, index_col='TIME')
idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')
eurusd.reindex(idx, fill_value=None)

This gives an output of

                     BID-CLOSE
2006-12-17 22:00:00    1.30971
2006-12-18 22:00:00    1.31971
2006-12-19 22:00:00    1.31721
2006-12-20 22:00:00    1.31771
2006-12-21 22:00:00    1.31411
2006-12-22 22:00:00        NaN
2006-12-23 22:00:00        NaN
2006-12-24 22:00:00        NaN
2006-12-25 22:00:00    1.30971
2006-12-26 22:00:00    1.31131
2006-12-27 22:00:00    1.31491
2006-12-28 22:00:00    1.32021
2006-12-29 22:00:00        NaN
2006-12-30 22:00:00        NaN
2006-12-31 22:00:00    1.32731
2007-01-01 22:00:00    1.32731
2007-01-02 22:00:00    1.31701
2007-01-03 22:00:00    1.30831

Re-Index the data

eurusd = eurusd.reindex(idx, fill_value=None)

List of interpolate types

methods = ['linear', 'quadratic', 'cubic']

Next line throws an Exception...

pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
ValueError: If using all scalar values, you must pass an index

Following the Interpolation section of this guide http://pandas.pydata.org/pandas-docs/stable/missing_data.html How do I correctly 'pass an index' in this situation?

Update 1

The output of eurusd.interpolate('linear')

                     BID-CLOSE
2006-12-17 22:00:00   1.309710
2006-12-18 22:00:00   1.319710
2006-12-19 22:00:00   1.317210
2006-12-20 22:00:00   1.317710
2006-12-21 22:00:00   1.314110
2006-12-22 22:00:00   1.313010
2006-12-23 22:00:00   1.311910
2006-12-24 22:00:00   1.310810
2006-12-25 22:00:00   1.309710
2006-12-26 22:00:00   1.311310
2006-12-27 22:00:00   1.314910
2006-12-28 22:00:00   1.320210
2006-12-29 22:00:00   1.322577
2006-12-30 22:00:00   1.324943
2006-12-31 22:00:00   1.327310
2007-01-01 22:00:00   1.327310
2007-01-02 22:00:00   1.317010
2007-01-03 22:00:00   1.308310

Update 2

In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Out[9]: 
                        cubic    linear  quadratic
2006-12-17 22:00:00  1.309710  1.309710   1.309710
2006-12-18 22:00:00  1.319710  1.319710   1.319710
2006-12-19 22:00:00  1.317210  1.317210   1.317210
2006-12-20 22:00:00  1.317710  1.317710   1.317710
2006-12-21 22:00:00  1.314110  1.314110   1.314110
2006-12-22 22:00:00  1.310762  1.313010   1.307947
2006-12-23 22:00:00  1.309191  1.311910   1.305159
2006-12-24 22:00:00  1.308980  1.310810   1.305747
2006-12-25 22:00:00  1.309710  1.309710   1.309710
2006-12-26 22:00:00  1.311310  1.311310   1.311310
2006-12-27 22:00:00  1.314910  1.314910   1.314910
2006-12-28 22:00:00  1.320210  1.320210   1.320210
2006-12-29 22:00:00  1.323674  1.322577   1.321632
2006-12-30 22:00:00  1.325553  1.324943   1.323998
2006-12-31 22:00:00  1.327310  1.327310   1.327310
2007-01-01 22:00:00  1.327310  1.327310   1.327310
2007-01-02 22:00:00  1.317010  1.317010   1.317010
2007-01-03 22:00:00  1.308310  1.308310   1.308310
Grouper answered 10/10, 2016 at 20:50 Comment(3)
is EURUSD the same thing as eurusd???Dispread
In other words, what is the output of EURUSD.interpolate('linear'), for example...?Dispread
@Dispread sorry my bad I have edited the postGrouper
D
7

The problem is that when you use the DataFrame constructor:

pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})

the value for each m is a DataFrame, which will be interpreted as a scalar value, which is admittedly confusing. This constructer expects some sort of sequence or Series. The following should solve the problem:

pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})

Since subsetting on a column returns a Series. So, for example instead of:

In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    222                                  dtype=dtype, copy=copy)
    223         elif isinstance(data, dict):
--> 224             mgr = self._init_dict(data, index, columns, dtype=dtype)
    225         elif isinstance(data, ma.MaskedArray):
    226             import numpy.ma.mrecords as mrecords

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    358             arrays = [data[k] for k in keys]
    359 
--> 360         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    361 
    362     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5229     # figure out the index, if necessary
   5230     if index is None:
-> 5231         index = extract_index(arrays)
   5232     else:
   5233         index = _ensure_index(index)

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
   5268 
   5269         if not indexes and not raw_lengths:
-> 5270             raise ValueError('If using all scalar values, you must pass'
   5271                              ' an index')
   5272 

ValueError: If using all scalar values, you must pass an index

Use this instead:

In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]: 
                       linear
timestamp                    
2016-10-10 22:00:00  1.309710
2016-10-10 22:00:00  1.319710
2016-10-10 22:00:00  1.317210
2016-10-10 22:00:00  1.317710
2016-10-10 22:00:00  1.314110
2016-10-10 22:00:00  1.313010
2016-10-10 22:00:00  1.311910
2016-10-10 22:00:00  1.310810
2016-10-10 22:00:00  1.309710
2016-10-10 22:00:00  1.311310
2016-10-10 22:00:00  1.314910
2016-10-10 22:00:00  1.320210
2016-10-10 22:00:00  1.322577
2016-10-10 22:00:00  1.324943
2016-10-10 22:00:00  1.327310
2016-10-10 22:00:00  1.327310
2016-10-10 22:00:00  1.317010
2016-10-10 22:00:00  1.308310

Fair warning, though, I am getting a LinAlgError: singular matrix error when I try 'quadratic' and 'cubic' interpolation on your data. Not sure why though.

Dispread answered 10/10, 2016 at 22:6 Comment(7)
I dont receive any errors and the data output as per above Update 2.Grouper
@James did this solution help?Dispread
@ juanpa.arrivillaga Yes this has worked but I have a problem when I try and plot the on a chart. Sorry I didnt include this on my original post. I am happy to mark this one solved and ask another question if you wish? If you look at the the link pandas.pydata.org/pandas-docs/stable/missing_data.html and scroll down, you will see a chart plotted with all three in driffent colours. When I try and plot using 'eurusd.plot()' the chart is missing the filled in parts.Grouper
Edited post to remove other questionGrouper
@James It is best to keep distinct questions as separate. Plotting doesn't really relate to the crux of your question. But really quickly, it seems like you are plotting the original DataFrame. You have to assign the result of what you did to a new data frame or to the original and plot that.Dispread
agreed. Thanks againGrouper
You was correct in your last message. "it seems like you are plotting the original DataFrame" Thank youGrouper

© 2022 - 2024 — McMap. All rights reserved.