how to open xlsx file with python 3
Asked Answered
C

5

36

I have an xlsx file with 1 sheet. I am trying to open it using python 3 (xlrd lib), but I get an empty file!

I use this code:

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = xlrd.open_workbook(file_errors_location)

and I have no errors, but when I type:

workbook_errors.nsheets

I get "0", even the file has some sheets... when I type:

workbook_errors 

I get:

xlrd.book.Book object at 0x2..

any help? thanks

Correction answered 25/5, 2016 at 10:21 Comment(1)
What happens when you call 'workbook_errors.sheets()' and then 'workbook_errors.nsheets'?Surd
C
37

You can use Pandas pandas.read_excel just like pandas.read_csv:

import pandas as pd
file_errors_location = 'C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx'
df = pd.read_excel(file_errors_location)
print(df)
Cosh answered 19/2, 2018 at 16:2 Comment(3)
Note that you need to have the library xlrd installed for this to work.Preconize
@Preconize is right. You can install xlrd with pip install xlrdConqueror
vary elegant! you can iterate over the rows using: for index, row in df.iterrows():Prunella
A
19

There are two modules for reading xls file : openpyxl and xlrd

This script allow you to transform a excel data to list of dictionnaries using xlrd

import xlrd

workbook = xlrd.open_workbook('C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx')
workbook = xlrd.open_workbook('C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx', on_demand = True)
worksheet = workbook.sheet_by_index(0)
first_row = [] # The row where we stock the name of the column
for col in range(worksheet.ncols):
    first_row.append( worksheet.cell_value(0,col) )
# tronsform the workbook to a list of dictionnary
data =[]
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]]=worksheet.cell_value(row,col)
    data.append(elm)
print data
Abuttal answered 25/5, 2016 at 10:26 Comment(4)
when I type: "worksheet = workbook.sheet_by_index(0)" I get the error: "list index out of range...." because if I type worksheet .nsheets , I get 0 !!Correction
no... I tried now to open a new file, it worked well.. it seems the file is locked or somthingCorrection
not sure.. because I still need to open that "locked" file.Correction
The rows are indexed starting at zero, according to the documentation. You are missing the first row.Pickard
E
9

Unfortunately, the python engine 'xlrd' that is required to read the Excel docs has explicitly removed support for anything other than xls files.

So here's how you can do it now -

Note: This worked for me with the latest version of Pandas (i.e. 1.1.5). Previously, I was using version 0.24.0 and it didn't work so I had to update to latest version.

Evacuate answered 15/2, 2021 at 6:20 Comment(2)
Just used pandas version 1.3.2, it asked me for dependency of openpyxl, installed it and pandas.read_excel worked without specifying engine parameterCholecystitis
@FlorentRoques when you do not explicitly specify the engine parameter, it defaults to None. In such case, there is a series of logic that determines the most appropriate engine for your document. Please check this link for more details - pandas.pydata.org/docs/reference/api/pandas.read_excel.htmlEvacuate
A
5

Another way to do it:

import openpyxl 
workbook_errors = openpyxl.Workbook()
workbook_errors = openpyxl.load_workbook(file_errors_location)
Atlante answered 7/10, 2020 at 19:44 Comment(0)
V
2

For those who don't want to use openpyxl due to:

  • have openpyxl running super slow;
  • find openpyxl too complicated;
  • just want to load a worksheet as a dataframe;

xlwings is recommended (install by pip install xlwings), below is a sample:

import xlwings as xw
import pandas as pd

wb_path = ".\\***.xlsx" # the .xlsx file is in the same directory as the .py script
ws_name = "***" # the name of the worksheet to load data

wb = xw.Book(wb_path)
ws = wb.sheets[ws_name]

MAX_ROW = 100 # the rows to read, this can be larger, the empty rows can be dropped in Pandas dataframe
MAX_COL = 40 # the columns to read

data = ws[:MAX_ROW,:MAX_COL].value
df = pd.DataFrame(data)

By above codes, the worksheet can be loaded into a Pandas dataframe way faster than openpyxl does.

Voguish answered 5/2 at 2:16 Comment(1)
When I run this code, I encounter the error Local application 'Microsoft Excel.app' not found. If I need Excel installed on my machine to use this library, then I'd rather use pandas.read_excel()Herzegovina

© 2022 - 2024 — McMap. All rights reserved.