Convert row to column header for Pandas DataFrame,
Asked Answered
P

6

197

The data I have to work with is a bit messy.. It has header names inside of its data. How can I choose a row from an existing pandas dataframe and make it (rename it to) a column header?

I want to do something like:

header = df[df['old_header_name1'] == 'new_header_name1']

df.columns = header
Patriciapatrician answered 1/10, 2014 at 17:33 Comment(0)
P
304
In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])

In [22]: df
Out[22]: 
     0    1    2
0    1    2    3
1  foo  bar  baz
2    4    5    6

Set the column labels to equal the values in the 2nd row (index location 1):

In [23]: df.columns = df.iloc[1]

If the index has unique labels, you can drop the 2nd row using:

In [24]: df.drop(df.index[1])
Out[24]: 
1 foo bar baz
0   1   2   3
2   4   5   6

If the index is not unique, you could use:

In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]: 
1 foo bar baz
0   1   2   3
2   4   5   6

Using df.drop(df.index[1]) removes all rows with the same label as the second row. Because non-unique indexes can lead to stumbling blocks (or potential bugs) like this, it's often better to take care that the index is unique (even though Pandas does not require it).

Prelude answered 1/10, 2014 at 17:42 Comment(3)
Thank you so much for your quick response! How can I choose a row by value in stead of index location to make it header? So for your example something like.. df.columns = df[df[0] == 'foo']Patriciapatrician
The problem with that is there could be more than one row which has the value "foo". One way around that problem is to explicitly choose the first such row: df.columns = df.iloc[np.where(df[0] == 'foo')[0][0]].Prelude
Ah I see why you did that way. For my case, I know there is only one row that has the value "foo". So it is ok. I just did this way I guess it is the same as the one you gave me above. idx_loc = df[df[0] == 'foo'].index.tolist()[0] df.columns = df.iloc[idx_loc]Patriciapatrician
P
121

This works (pandas v'0.19.2'):

df.rename(columns=df.iloc[0])
Poppo answered 15/3, 2017 at 22:22 Comment(3)
You can remove the "header" row by adding .drop(df.index[0])Kevel
I like this better than the actual accepted answer. I love the short oneline solutions.Helminthiasis
Please keep in mind that after dropping the first row, index would start from 1, so you probably would like to add .reset_index(drop=True).Rident
A
38

It would be easier to recreate the data frame. This would also interpret the columns types from scratch.

headers = df.iloc[0]
new_df  = pd.DataFrame(df.values[1:], columns=headers)
Austenite answered 14/12, 2018 at 8:4 Comment(1)
Simple and easy. Nice!Fernandina
Q
18

To rename the header without reassign df:

df.rename(columns=df.iloc[0], inplace = True)

To drop the row without reassign df:

df.drop(df.index[0], inplace = True)
Queenstown answered 4/9, 2020 at 17:52 Comment(0)
L
6

You can specify the row index in the read_csv or read_html constructors via the header parameter which represents Row number(s) to use as the column names, and the start of the data. This has the advantage of automatically dropping all the preceding rows which supposedly are junk.

import pandas as pd
from io import StringIO

In[1]
    csv = '''junk1, junk2, junk3, junk4, junk5
    junk1, junk2, junk3, junk4, junk5
    pears, apples, lemons, plums, other
    40, 50, 61, 72, 85
    '''

    df = pd.read_csv(StringIO(csv), header=2)
    print(df)

Out[1]
       pears   apples   lemons   plums   other
    0     40       50       61      72      85
Lignify answered 13/8, 2018 at 12:45 Comment(2)
This does not address the question itself, which is asking about an already existing DataFrame.Helotry
some of the users who found this question (possibly the majority) would have a more generic use case than the OP; this answer is for that groupLignify
B
0

Keeping it Python simple

Padas DataFrames have columns attribute why not use it with standard Python, it is much clearer what you are doing:

table = [['name', 'Rf', 'Rg', 'Rf,skin', 'CRI'],
 ['testsala.cxf', '86', '95', '92', '87'],
 ['testsala.cxf: 727037 lm', '86', '95', '92', '87'],
 ['630.cxf', '18', '8', '11', '18'],
 ['Huawei stk-lx1.cxf', '86', '96', '88', '83'],
 ['dedo uv no filtro.cxf', '52', '93', '48', '58']]

import pandas as pd
data = pd.DataFrame(table[1:],columns=table[0])

or in the case is not the first row, but the 10th for instance:

columns = table.pop(10)
data = pd.DataFrame(table,columns=columns)
Brenza answered 26/8, 2022 at 10:38 Comment(2)
Tested for performance, although we know that the creation of a new DataFrame is "time-consuming" Anyhow this approach took 40X more timeVilhelmina
@Vilhelmina thanks for you comment! If you want edit the answerBrenza

© 2022 - 2024 — McMap. All rights reserved.