Reading Excel into a Python DataFrame starting from row 5 and including headers

I have an Excel workbook that runs some VBA on opening, which refreshes a pivot table and does some other processing.

I then want to import the results of the pivot table refresh into a DataFrame in Python for further analysis.

import xlrd

# raw string, so that backslashes such as \U are not treated as escape sequences
wb = xlrd.open_workbook(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')

Refreshing and opening the file works fine. But how do I select the data from the first sheet, starting from, say, row 5 (including the header) down to the last record n?

Milkandwater answered 9/7, 2013 at 12:52

You can use pandas' ExcelFile parse method to read Excel sheets, see io docs:

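import pandas as pd

# raw string, so that backslashes such as \U are not treated as escape sequences
xls = pd.ExcelFile(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')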

df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])

skiprows=4 skips the first 4 rows, so reading starts at row 5 (zero-based index 4), which becomes the header row. parse accepts several other options as well.
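
To make the effect of skiprows=4 concrete, here is a minimal sketch. It uses pd.read_csv on in-memory text instead of a real workbook, so it runs without an Excel file; the assumption is that an integer skiprows means the same thing there as in parse and read_excel (skip that many lines from the top). The sample contents and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sheet contents: four lines of preamble, header on line 5.
raw = "\n".join([
    "report title,,",
    "generated by macro,,",
    "filler,,",
    "filler,,",
    "name,qty,price",
    "apple,3,1.5",
    "pear,NA,2.0",
])

# Skip the first 4 lines; line 5 is then used as the header row.
df = pd.read_csv(io.StringIO(raw), skiprows=4, na_values=["NA"])

print(df.columns.tolist())
print(len(df))
```

The header comes from the fifth line and every line after it is read as data, down to the last record.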

Inexpressible answered 9/7, 2013 at 13:2

The accepted answer is old (as discussed in comments of the accepted answer). Now the preferred option is using pd.read_excel(). For example:

import pandas
df = pandas.read_excel(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm', skiprows=[0, 1, 2, 3])  # skip rows 1-4 so row 5 becomes the header
Ringtailed answered 28/4, 2017 at 18:1

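The other answers skip the first rows of the sheet, so a header sitting in the first row is skipped along with them. To keep that header, skiprows should skip over it: pass a range that starts at 1, so row 0 (the header) is kept and the next four rows are dropped.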

import pandas as pd
df = pd.read_excel('Book1.xlsx', skiprows=range(1, 5))  # keep row 0 as header, drop rows 1-4

or

with pd.ExcelFile('Book1.xlsx') as f:
    df = f.parse('Sheet1', skiprows=range(1,5))

should do the job.
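
The difference between an integer skiprows and a range can be sketched as follows. As an assumption for the sake of a file-free example, this uses pd.read_csv on in-memory text, where skiprows accepts the same int and list-like forms as read_excel; the data and column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical contents: header on line 1, four unwanted lines, then data.
raw = "name,qty\nskip1,0\nskip2,0\nskip3,0\nskip4,0\napple,3\npear,5\n"

# range(1, 5) skips 0-indexed lines 1..4 and keeps line 0 as the header.
kept = pd.read_csv(io.StringIO(raw), skiprows=range(1, 5))

# An integer skips from the top, so the original header is lost and
# the fifth line is taken as the header instead.
lost = pd.read_csv(io.StringIO(raw), skiprows=4)

print(kept.columns.tolist())
print(lost.columns.tolist())
```

Which form is right depends on where the header sits: use the integer when the header is on row 5, and the range when the header is on row 1 and the rows after it should be dropped.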

Lipfert answered 18/2, 2023 at 1:41
