Pandas - write Multiindex rows with to_csv
Asked Answered
H

1

15

I am using to_csv to write a Multiindex DataFrame to csv files. The csv file has one column that contains the multiindexes in tuples, like:

('a', 'x')
('a', 'y')
('a', 'z')
('b', 'x')
('b', 'y')
('b', 'z')

However, I want to be able to output the Multiindex to two columns instead of one column of tuples, such as:

a, x
 , y
 , z
b, x
 , y
 , z

It looks like tupleize_cols can achieve this for columns, but there is no such option for the rows. Is there a way to achieve this?

Hazelton answered 27/6, 2013 at 17:25 Comment(1)
tupleize_cols is for a multi-index on columns (its coming in 0.12); prob just reset_index().set_index(['idxa','idxb']).to_csv() is your best bet (the specify index_col=['idxa','idxb'] on read-backMoslemism
M
19

I think this will do it

In [3]: df = DataFrame(dict(A = 'foo', B = 'bar', value = 1),index=range(5)).set_index(['A','B'])

In [4]: df
Out[4]: 
         value
A   B         
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1

In [5]: df.to_csv('test.csv')

In [6]: !cat test.csv
A,B,value
foo,bar,1
foo,bar,1
foo,bar,1
foo,bar,1
foo,bar,1

In [7]: pd.read_csv('test.csv',index_col=[0,1])
Out[7]: 
         value
A   B         
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1

To write with the index duplication (kind of a hack though)

In [27]: x = df.reset_index()

In [28]: mask = df.index.to_series().duplicated()

In [29]: mask
Out[29]: 
A    B  
foo  bar    False
     bar     True
     bar     True
     bar     True
     bar     True
dtype: bool

In [30]: x.loc[mask.values,['A','B']] = ''

In [31]: x
Out[31]: 
     A    B  value
0  foo  bar      1
1                1
2                1
3                1
4                1

In [32]: x.to_csv('test.csv')

In [33]: !cat test.csv
,A,B,value
0,foo,bar,1
1,,,1
2,,,1
3,,,1
4,,,1

Read back is a bit tricky actually

In [37]: pd.read_csv('test.csv',index_col=0).ffill().set_index(['A','B'])
Out[37]: 
         value
A   B         
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1
Moslemism answered 27/6, 2013 at 17:36 Comment(6)
Sorry maybe I didn't explain clearly. What I hope to achieve is to remove duplicate entries in the csv file. In your example, I hope to have 'foo' to appear only once in the csv file, so that when you cat test.csv, you would see one 'foo' in the first row and blank in the subsequent rows.Hazelton
updated...though this is pretty non-standard and not sure this will generalizeMoslemism
let me ask you, is this format of csv useful/common? e.g. non-duplicating indexes?Moslemism
I don't know if it's common, but in my case it's useful/convenient as a visual representation. Imagine if I need to print out the csv table as some kind of report. Suppose there are 100 rows and the first level of multiindex has 3 groups. The non-duplicating indexes will make it very easy to visually identify the 3 groups.Hazelton
ok....added this solution to the cookbook in any event (will be updated later today): pandas.pydata.org/pandas-docs/dev/cookbook.html#csvMoslemism
If the original dataframe had an actual null value in the value column, the last example, on reading back the dataframe, would wrongly duplicate a previous value on top of itAntistrophe

© 2022 - 2024 — McMap. All rights reserved.