how to open xlsx file with python 3

Asked 25/5, 2016 at 10:21 Answered 5/2 at 2:16

I have an xlsx file with 1 sheet. I am trying to open it using python 3 (xlrd lib), but I get an empty file!

I use this code:

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = xlrd.open_workbook(file_errors_location)

and I have no errors, but when I type:

workbook_errors.nsheets

I get "0", even the file has some sheets... when I type:

workbook_errors

I get:

xlrd.book.Book object at 0x2..

any help? thanks

Correction answered 25/5, 2016 at 10:21 Comment(1)

What happens when you call 'workbook_errors.sheets()' and then 'workbook_errors.nsheets'? – Surd 25/8, 2017 at 17:54

You can use Pandas pandas.read_excel just like pandas.read_csv:

import pandas as pd
file_errors_location = 'C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx'
df = pd.read_excel(file_errors_location)
print(df)

Cosh answered 19/2, 2018 at 16:2 Comment(3)

Note that you need to have the library xlrd installed for this to work. – Preconize 18/4, 2019 at 9:23

@Preconize is right. You can install xlrd with pip install xlrd – Conqueror 5/11, 2019 at 15:53

vary elegant! you can iterate over the rows using: for index, row in df.iterrows(): – Prunella 15/2, 2020 at 18:23

There are two modules for reading xls file : openpyxl and xlrd

This script allow you to transform a excel data to list of dictionnaries using xlrd

import xlrd

workbook = xlrd.open_workbook('C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx')
workbook = xlrd.open_workbook('C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx', on_demand = True)
worksheet = workbook.sheet_by_index(0)
first_row = [] # The row where we stock the name of the column
for col in range(worksheet.ncols):
    first_row.append( worksheet.cell_value(0,col) )
# tronsform the workbook to a list of dictionnary
data =[]
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]]=worksheet.cell_value(row,col)
    data.append(elm)
print data

Abuttal answered 25/5, 2016 at 10:26 Comment(4)

when I type: "worksheet = workbook.sheet_by_index(0)" I get the error: "list index out of range...." because if I type worksheet .nsheets , I get 0 !! – Correction 25/5, 2016 at 10:30

no... I tried now to open a new file, it worked well.. it seems the file is locked or somthing – Correction 25/5, 2016 at 10:34

not sure.. because I still need to open that "locked" file. – Correction 25/5, 2016 at 10:56

The rows are indexed starting at zero, according to the documentation. You are missing the first row. – Pickard 14/11, 2018 at 20:36

Unfortunately, the python engine 'xlrd' that is required to read the Excel docs has explicitly removed support for anything other than xls files.

So here's how you can do it now -

Install openpyxl:

https://openpyxl.readthedocs.io/en/stable/
Change your pandas code to:

pandas.read_excel('cat.xlsx', engine='openpyxl')

Note: This worked for me with the latest version of Pandas (i.e. 1.1.5). Previously, I was using version 0.24.0 and it didn't work so I had to update to latest version.

Evacuate answered 15/2, 2021 at 6:20 Comment(2)

Just used pandas version 1.3.2, it asked me for dependency of openpyxl, installed it and pandas.read_excel worked without specifying engine parameter – Cholecystitis 1/9, 2021 at 21:40

@FlorentRoques when you do not explicitly specify the engine parameter, it defaults to None. In such case, there is a series of logic that determines the most appropriate engine for your document. Please check this link for more details - pandas.pydata.org/docs/reference/api/pandas.read_excel.html – Evacuate 4/9, 2021 at 21:15

Another way to do it:

import openpyxl 
workbook_errors = openpyxl.Workbook()
workbook_errors = openpyxl.load_workbook(file_errors_location)

Atlante answered 7/10, 2020 at 19:44 Comment(0)

For those who don't want to use openpyxl due to:

have openpyxl running super slow;
find openpyxl too complicated;
just want to load a worksheet as a dataframe;

xlwings is recommended (install by pip install xlwings), below is a sample:

import xlwings as xw
import pandas as pd

wb_path = ".\\***.xlsx" # the .xlsx file is in the same directory as the .py script
ws_name = "***" # the name of the worksheet to load data

wb = xw.Book(wb_path)
ws = wb.sheets[ws_name]

MAX_ROW = 100 # the rows to read, this can be larger, the empty rows can be dropped in Pandas dataframe
MAX_COL = 40 # the columns to read

data = ws[:MAX_ROW,:MAX_COL].value
df = pd.DataFrame(data)

By above codes, the worksheet can be loaded into a Pandas dataframe way faster than openpyxl does.

Voguish answered 5/2 at 2:16 Comment(1)

When I run this code, I encounter the error Local application 'Microsoft Excel.app' not found. If I need Excel installed on my machine to use this library, then I'd rather use pandas.read_excel() – Herzegovina 3/5 at 10:24

Recommended topics

Hot tags