Pandas reset index on series to remove multiindex

Asked 4/9, 2013 at 21:23 Answered 25/5, 2023 at 7:15

I have a Series that looks like this:

1999-03-31  SOLD_PRICE     NaN
1999-06-30  SOLD_PRICE     NaN
1999-09-30  SOLD_PRICE     NaN
1999-12-31  SOLD_PRICE    3.00
2000-03-31  SOLD_PRICE    3.00

with an index that looks like:

MultiIndex
[(1999-03-31 00:00:00, u'SOLD_PRICE'), (1999-06-30 00:00:00, u'SOLD_PRICE'), 
 (1999-09-30 00:00:00, u'SOLD_PRICE'), (1999-12-31 00:00:00, u'SOLD_PRICE'),...]

I don't want the second column as an index. Ideally I'd have a DataFrame with column 1 as "Date" and column 2 as "Sales" (dropping the second level of the index). I don't quite see how to reconfigure the index.

Previdi answered 4/9, 2013 at 21:23 Comment(1)

Just a general comment: naming your variables with numbers indicating the order in which you're using them is a recipe for disaster. Consider the pathological case where pandas didn't have DataFrame or Series and they were just called DataStructure1 and DataStructure2 and all of their methods were just called method1, method2, and so on. Giving your variables meaningful names will make your code legible to others reading it for the first time. Giving meaningful names to your variables also allows you to come back to your code in a month and get back up to speed quickly. – Insurgence 4/9, 2013 at 21:42

Just call reset_index():

In [130]: s
Out[130]:
0           1
1999-03-31  SOLD_PRICE   NaN
1999-06-30  SOLD_PRICE   NaN
1999-09-30  SOLD_PRICE   NaN
1999-12-31  SOLD_PRICE     3
2000-03-31  SOLD_PRICE     3
Name: 2, dtype: float64

In [131]: s.reset_index()
Out[131]:
            0           1   2
0  1999-03-31  SOLD_PRICE NaN
1  1999-06-30  SOLD_PRICE NaN
2  1999-09-30  SOLD_PRICE NaN
3  1999-12-31  SOLD_PRICE   3
4  2000-03-31  SOLD_PRICE   3

There are many ways to drop columns:

Call reset_index() twice and specify a column:

In [136]: s.reset_index(0).reset_index(drop=True)
Out[136]:
            0   2
0  1999-03-31 NaN
1  1999-06-30 NaN
2  1999-09-30 NaN
3  1999-12-31   3
4  2000-03-31   3

Delete the column after resetting the index:

In [137]: df = s.reset_index()

In [138]: df
Out[138]:
            0           1   2
0  1999-03-31  SOLD_PRICE NaN
1  1999-06-30  SOLD_PRICE NaN
2  1999-09-30  SOLD_PRICE NaN
3  1999-12-31  SOLD_PRICE   3
4  2000-03-31  SOLD_PRICE   3

In [139]: del df[1]

In [140]: df
Out[140]:
            0   2
0  1999-03-31 NaN
1  1999-06-30 NaN
2  1999-09-30 NaN
3  1999-12-31   3
4  2000-03-31   3

Call drop() after resetting:

In [144]: s.reset_index().drop(1, axis=1)
Out[144]:
            0   2
0  1999-03-31 NaN
1  1999-06-30 NaN
2  1999-09-30 NaN
3  1999-12-31   3
4  2000-03-31   3

Then, after you've reset your index, just rename the columns

In [146]: df.columns = ['Date', 'Sales']

In [147]: df
Out[147]:
         Date  Sales
0  1999-03-31    NaN
1  1999-06-30    NaN
2  1999-09-30    NaN
3  1999-12-31      3
4  2000-03-31      3

Insurgence answered 4/9, 2013 at 21:25 Comment(0)

When you use double brackets, such as

H3 = H2[['SOLD_PRICE']]

H3 becomes a DataFrame. If you use single brackets,

H3 = H2['SOLD_PRICE']

then H3 becomes a Series. If H3 is a Series, then the result you desire follows naturally:

import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=72, freq='M')
H2 = pd.DataFrame(np.arange(len(rng)), index=rng, columns=['SOLD_PRICE'])
H3 = H2['SOLD_PRICE']
H5 = H3.resample('Q', how='count')
H6 = pd.rolling_mean(H5,4)
print(H6.head())

yields

2011-03-31   NaN
2011-06-30   NaN
2011-09-30   NaN
2011-12-31     3
2012-03-31     3
dtype: float64

Meniscus answered 4/9, 2013 at 21:46 Comment(0)

To remove a specific level from a MultiIndex, use droplevel. For example, to remove the second level:

H6 = H6.droplevel(1)

Then to convert the Series into a dataframe, use to_frame.

H6 = H6.to_frame('Sales')

If you want to convert the Series into a dataframe with 2 columns, then droplevel() + reset_index() or two reset_index() calls with different arguments will do the job.

H6 = H6.droplevel(1).rename_axis('Date').reset_index(name='Sales')
# or 
H6 = H6.reset_index(level=0, name='Sales').reset_index(drop=True)

Stamford answered 15/2, 2023 at 5:32 Comment(0)

USING `numpy.squeeze()`

Having a multi-index dataframe :

import pandas as pd 

df = pd.DataFrame([[1,2], [2,3]], columns=[['col1', 'col2']])
print(df.columns)

MultiIndex([('col1',), ('col2',)], )

Then "squeeze" values with np.squeeze():

import numpy as np

val = np.squeeze(df.values).T
df_ = pd.DataFrame(val, [x[0] for x in df.columns]).T
print(df_.columns)

Index(['col1', 'col2'], dtype='object')

Cabretta answered 25/5, 2023 at 7:15 Comment(0)

USING `numpy.squeeze()`

Recommended topics

Hot tags

USING numpy.squeeze()

Recommended topics

Hot tags

USING `numpy.squeeze()`