Read cell format from Google sheet using Python (API v4)
Asked Answered
A

1

5

I'm looking for a way to read a cell's formatting from a Google sheet in Python, specifically, its background colour.

Two popular packages I've found to be able to read a sheet are gspread (fork) and pygsheets. I tried both and they work well in reading my sheet data, however from what I can see neither supports reading cell formats, only setting them. This open issue on pygsheets' GitHub page describes the kind of functionality I need.

Essentially, where each row is a record that has timestamp, username, comments etc., I want to find all the rows by a particular username and only those that don't have a red background, sort of like this:

if ("Username" in record.values()) and (matching_cells.background != red):
        # Do something

Thanks!

Acne answered 12/7, 2018 at 9:52 Comment(4)
Use the Sheets REST API directly? Or with Google's client library?Cur
@Cur I'm looking into it but I haven't been able to come to grips with the documentation as well as the other two -- is this possible with Google's client library? Would you be able to point me in the right direction?Acne
spreadsheets.get, request the appropriate fields. Note that the Python API Explorer can be extremely helpful: developers.google.com/apis-explorer/#p/sheets/v4/… (it has an interactive and explanatory fields selector!)Cur
That looks very interesting, I’ll definitely check it out. Thank you @tehhowch!Acne
C
6

Using google-api-python-client to obtain your authorized sheets API client, you can request the spreadsheet data with the service.spreadsheets().get method:

def get_sheet_colors(service, wbId: str, ranges: list):
    params = {'spreadsheetId': wbId,
              'ranges': ranges,
              'fields': 'sheets(data(rowData(values(effectiveFormat/backgroundColor,formattedValue)),startColumn,startRow),properties(sheetId,title))'}
    return service.spreadsheets().get(**params).execute()

desiredA1NotationRanges = ['\'Some Arbitrary Sheet 1\'!A1:K', '\'Some Other Arbitary Sheet\'!B2:D4']
all_data = get_sheet_colors(get_authed_service_somehow(), mySpreadsheetId, desiredA1NotationRanges))
# all_data is a dict with keys determined by the fields in the request
# (i.e. "sheets") and the output of the API method used (aka consult your API reference)

The code below uses the API response from above and makes two arrays, one with background colors and one with the cell values, and ensures that row & column indices are portable by prefixing rows and columns to ensure the data starts from cell A1, even if you requested a range like "C3:J5". This is provided as an example for converting the REST resource into more familiar types, and is not intended to be useful in a generic sense.

dataset = []
default_bg = {'red': 1, 'green': 1, 'blue': 1}
# all_data['sheets'] is a list of sheet resources (per the API spec.)
for sheet in all_data['sheets']:
    # The sheet resource is a dict with keys determined by what we requested in fields
    # (i.e. properties (->sheetId, ->title), data)
    print('Sheet name is {title} with grid id {sheetId}'.format_map(sheet["properties"]))
    # each range in data will only contain startRow and/or startColumn if they are not 0
    # (i.e. if you grab A1:___, you won't have startRow or startColumn)
    for range in sheet['data']:
        rowData = range.get('rowData', [])
        if not rowData:
            continue
        offsets = {'row': range.get('startRow', 0),
                   'col': range.get('startColumn', 0)}
        rangeBGs = [default_bg] * offsets['row']
        rangeValues = [''] * offsets['row']
        for row in rowData:
            colData = row['values']
            newBGs = [default_bg] * offsets['col']
            newVals = [''] * offsets['col']
            for col in colData:
                try:
                    newBGs.append(col['effectiveFormat']['backgroundColor'])
                except KeyError:
                    newBGs.append(default_bg) # Shouldn't get called (all cells have a background)
                try:
                    newVals.append(col['formattedValue']) # Always a string if present.
                except KeyError:
                    newVals.append('') # Not all cells have a value.
            rangeBGs.append(newBGs)
            rangeValues.append(newVals)
        dataset.append({'sheetId': sheet['properties']['sheetId'],
                        'sheetName': sheet['properties']['title'],
                        'backgrounds': rangeBGs,
                        'values': rangeValues})
# dataset is now a list with elements that correspond to the requested ranges,
# and contain 0-base row and column indexed arrays of the backgrounds and values.
# One could add logic to pop elements from the ranges if the entire row has no values.
# Color in A1 of 1st range:
r1 = dataset[0]
print(f'Cell A1 color is {r1["backgrounds"][0][0]} and has value {r1["values"][0][0]}')
print(f'Cell D2 color is {r1["backgrounds"][3][1]} and has value {r1["values"][3][1]}')

References:

Cur answered 12/7, 2018 at 20:0 Comment(5)
Formulating the fields piece is the hardest...Spherics
Which of those ref-doc links, did you use to formulate the fields, anyway? I'm not finding it anywhere.Us
To give a partial-answer to my own question, by using this question Retrieving Multiple Fonts, and doing "fields=sheets", I got back the entire set of values being used in the cell-range, and so could find the ones I wanted. It would still be nice to know where it is in the official docs, granted.Us
@JohnC developers.google.com/tasks/performance?hl=en#fields-syntaxCur
Under "improving performance"? That was going to be my third guess... :) Anyway, thanks. Although it mainly has the syntax, it does mention that wildcards can be used, so presumably "fields=*" would have given me everything, similar to "fields=sheets".Us

© 2022 - 2024 — McMap. All rights reserved.