Get column names of Excel worksheet with OpenPyXL in readonly mode

Asked 22/8, 2018 at 22:17 Answered 19/3, 2024 at 10:30

How could I retrieve

the column names (values of the cells in the first row) in an openpyxl Read-only worksheet?
- City, Population, Country in the below example worksheet
all column names in an openpyxl Read-only workbook?
- City, Population, Country, frames from worksheet 1 and the other column names from all other worksheets

Example Excel worksheet:

| City       | Population  |    Country   |
| -----------|------------ | ------------ |
| Madison    |   252,551   |     USA      |
| Bengaluru  | 10,178,000  |    India     |
| ...        |       ...   |     ...      |

Example code:

from openpyxl import load_workbook

wb = load_workbook(filename=large_file.xlsx, read_only=True)
sheet = wb.worksheets[0]

... (not sure where to go from here)

Notes:

I have to use readonly because the Excel file has over 1 million rows (don't ask)
I'd like the column names so I can eventually infer the column types and import the excel data into a PostgreSQL database

Lewes answered 22/8, 2018 at 22:17 Comment(7)

Your use of the terms "headers" is ambiguous. – Candide 23/8, 2018 at 8:54

Good point, edited the question. – Lewes 23/8, 2018 at 14:0

You're still talking about print_titles which are something different. As are headers and footers. – Candide 23/8, 2018 at 15:55

Edited the question again. – Lewes 23/8, 2018 at 17:4

So, what's the question now? [c.value for c in ws.iter_rows(min_row=1, max_row=1)] not sufficient? – Candide 24/8, 2018 at 8:40

not sure if I'm using this correctly, but print([c.value for c in ws.iter_rows(min_row=1, max_row=1)]) gave me AttributeError: 'tuple' object has no attribute 'value – Lewes 24/8, 2018 at 14:39

Try [c.value for c in next(ws.iter_rows(min_row=1, max_row=1))] – Candide 24/8, 2018 at 16:14

This will print every thing from row 1;

list_with_values=[]
for cell in ws[1]:
    list_with_values.append(cell.value)

If for some reason you want to get a list of the column letters that are filled in you can just:

column_list = [cell.column for cell in ws[1]]

For your 2nd question; Assuming you have stored the header values in a list called : "list_with_values"

from openpyxl import Workbook
wb = Workbook()
ws = wb['Sheet']
#Sheet is the default sheet name, you can rename it or create additional ones with wb.create_sheet()
ws.append(list_with_values)
wb.save('OutPut.xlsx')

Rico answered 24/8, 2018 at 7:41 Comment(3)

Thank you, the for loop does take care of my first question. I ended up creating a list and appending those cell values to the list. – Lewes 24/8, 2018 at 14:43

Glad to hear that, if my answer has fulfilled your question, please select it as the answer. – Rico 26/8, 2018 at 17:47

this takes care of the first part of my question, but do you have a solution for the second part (returning all column names in an openpyxl Read-only workbook)? – Lewes 27/8, 2018 at 13:17

Read-only mode provides fast access to any row or set of rows in a worksheet. Use the method iter_rows() to restric the selection. So to get the first row of the worksheet:

rows = ws.iter_rows(min_row=1, max_row=1) # returns a generator of rows
first_row = next(rows) # get the first row
headings = [c.value for c in first_row] # extract the values from the cells

Candide answered 23/8, 2018 at 8:56 Comment(5)

Please see the edited question. I'm interested in getting the list of column names (assumably the first row) from an Excel document, not the titles. Sorry for the confusion. – Lewes 23/8, 2018 at 17:5

By column name you mean you want to get the Letters of the column? – Rico 24/8, 2018 at 7:48

I mean the words that label the columns in an Excel worksheet. (City, Population, Country in the updated example worksheet in the question). – Lewes 24/8, 2018 at 14:42

Those are just the values of the cells in the first row. – Candide 24/8, 2018 at 16:14

Ok, thanks for clarifying, @Charlie Clark. I added that to the question to hopefully clear things up. – Lewes 24/8, 2018 at 16:42

Charlie Clarks answer compacted down to a one liner with list comprehension

    headers = [c.value for c in next(wb['sheet_name'].iter_rows(min_row=1, max_row=1))]

Kibe answered 18/1, 2021 at 20:34 Comment(0)

This is how I handled this

from openpyxl.utils import get_column_letter

def get_columns_from_worksheet(ws):
  return {
      cell.value: {
          'letter': get_column_letter(cell.column),
          'number': cell.column - 1
      } for cell in ws[1] if cell.value
  }

An Example of this being used would be

from openpyxl import load_workbook

wb = load_workbook(filename='my_file.xlsx')
ws = wb['MySheet']

COLUMNS = get_columns_from_worksheet(ws)

for cell in ws[COLUMNS['MY Named Column']['letter']]:
    print(cell.value)

The main reason for capturing both the letter and number code is because different functions and patterns within openpyxl use either the number or the letter so having reference to both is invaluable

Jablonski answered 4/8, 2021 at 21:11 Comment(0)

How could I retrieve all column names in an openpyxl Read-only workbook

Assuming, that the filename is 'large_file.xlsx' and the data is on first sheet (index = 0) I would do something like this (tested in Python 3.12 and openpyxl 3.1.2):

You can either iterate through first row cells (sheet.iter_rows()):

from openpyxl import load_workbook

workbook = load_workbook(filename='large_file.xlsx', data_only=True)
sheet = workbook.worksheets[0]
headers = [cell.value for cell in next(sheet.iter_rows(min_row=1, max_row=1))]
print(headers)

or simply select first row (sheet[int]):

workbook = load_workbook(filename='large_file.xlsx', data_only=True)
sheet = workbook.worksheets[0]
headers = [cell.value for cell in sheet[1]]
print(headers)

Bronson answered 19/3, 2024 at 10:30 Comment(0)

Recommended topics

Hot tags