Get column names of Excel worksheet with OpenPyXL in readonly mode
Asked Answered
L

5

15

How could I retrieve

  1. the column names (values of the cells in the first row) in an openpyxl Read-only worksheet?
    • City, Population, Country in the below example worksheet
  2. all column names in an openpyxl Read-only workbook?
    • City, Population, Country, frames from worksheet 1 and the other column names from all other worksheets

Example Excel worksheet:

| City       | Population  |    Country   |
| -----------|------------ | ------------ |
| Madison    |   252,551   |     USA      |
| Bengaluru  | 10,178,000  |    India     |
| ...        |       ...   |     ...      |

Example code:

from openpyxl import load_workbook

wb = load_workbook(filename=large_file.xlsx, read_only=True)
sheet = wb.worksheets[0]

... (not sure where to go from here)

Notes:

  • I have to use readonly because the Excel file has over 1 million rows (don't ask)
  • I'd like the column names so I can eventually infer the column types and import the excel data into a PostgreSQL database
Lewes answered 22/8, 2018 at 22:17 Comment(7)
Your use of the terms "headers" is ambiguous.Candide
Good point, edited the question.Lewes
You're still talking about print_titles which are something different. As are headers and footers.Candide
Edited the question again.Lewes
So, what's the question now? [c.value for c in ws.iter_rows(min_row=1, max_row=1)] not sufficient?Candide
not sure if I'm using this correctly, but print([c.value for c in ws.iter_rows(min_row=1, max_row=1)]) gave me AttributeError: 'tuple' object has no attribute 'valueLewes
Try [c.value for c in next(ws.iter_rows(min_row=1, max_row=1))]Candide
R
15

This will print every thing from row 1;

list_with_values=[]
for cell in ws[1]:
    list_with_values.append(cell.value)

If for some reason you want to get a list of the column letters that are filled in you can just:

column_list = [cell.column for cell in ws[1]]

For your 2nd question; Assuming you have stored the header values in a list called : "list_with_values"

from openpyxl import Workbook
wb = Workbook()
ws = wb['Sheet']
#Sheet is the default sheet name, you can rename it or create additional ones with wb.create_sheet()
ws.append(list_with_values)
wb.save('OutPut.xlsx')
Rico answered 24/8, 2018 at 7:41 Comment(3)
Thank you, the for loop does take care of my first question. I ended up creating a list and appending those cell values to the list.Lewes
Glad to hear that, if my answer has fulfilled your question, please select it as the answer.Rico
this takes care of the first part of my question, but do you have a solution for the second part (returning all column names in an openpyxl Read-only workbook)?Lewes
C
6

Read-only mode provides fast access to any row or set of rows in a worksheet. Use the method iter_rows() to restric the selection. So to get the first row of the worksheet:

rows = ws.iter_rows(min_row=1, max_row=1) # returns a generator of rows
first_row = next(rows) # get the first row
headings = [c.value for c in first_row] # extract the values from the cells
Candide answered 23/8, 2018 at 8:56 Comment(5)
Please see the edited question. I'm interested in getting the list of column names (assumably the first row) from an Excel document, not the titles. Sorry for the confusion.Lewes
By column name you mean you want to get the Letters of the column?Rico
I mean the words that label the columns in an Excel worksheet. (City, Population, Country in the updated example worksheet in the question).Lewes
Those are just the values of the cells in the first row.Candide
Ok, thanks for clarifying, @Charlie Clark. I added that to the question to hopefully clear things up.Lewes
K
5

Charlie Clarks answer compacted down to a one liner with list comprehension

    headers = [c.value for c in next(wb['sheet_name'].iter_rows(min_row=1, max_row=1))]
Kibe answered 18/1, 2021 at 20:34 Comment(0)
J
4

This is how I handled this

from openpyxl.utils import get_column_letter

def get_columns_from_worksheet(ws):
  return {
      cell.value: {
          'letter': get_column_letter(cell.column),
          'number': cell.column - 1
      } for cell in ws[1] if cell.value
  }

An Example of this being used would be

from openpyxl import load_workbook

wb = load_workbook(filename='my_file.xlsx')
ws = wb['MySheet']

COLUMNS = get_columns_from_worksheet(ws)

for cell in ws[COLUMNS['MY Named Column']['letter']]:
    print(cell.value)

The main reason for capturing both the letter and number code is because different functions and patterns within openpyxl use either the number or the letter so having reference to both is invaluable

Jablonski answered 4/8, 2021 at 21:11 Comment(0)
B
0

How could I retrieve all column names in an openpyxl Read-only workbook

Assuming, that the filename is 'large_file.xlsx' and the data is on first sheet (index = 0) I would do something like this (tested in Python 3.12 and openpyxl 3.1.2):

You can either iterate through first row cells (sheet.iter_rows()):

from openpyxl import load_workbook

workbook = load_workbook(filename='large_file.xlsx', data_only=True)
sheet = workbook.worksheets[0]
headers = [cell.value for cell in next(sheet.iter_rows(min_row=1, max_row=1))]
print(headers)

or simply select first row (sheet[int]):

workbook = load_workbook(filename='large_file.xlsx', data_only=True)
sheet = workbook.worksheets[0]
headers = [cell.value for cell in sheet[1]]
print(headers)
Bronson answered 19/3, 2024 at 10:30 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.