How can I obtain suitable credentials in a cloud composer environment to make calls to the google sheets API?
Asked Answered
E

1

6

I would like to be able to access data on a google sheet when running python code via cloud composer; this is something I know how to do in several ways when running code locally, but moving to the cloud is proving challenging. In particular I wish to authenticate as the composer service account rather than stashing the contents of a client_secret.json file somewhere (be that the source code or some cloud location).

For essentially the same question but instead accessing google cloud platform services, this has been relatively easy (even when running through composer) thanks to the google-cloud_* libraries. For instance, I have verified that I can push data to bigquery:

from google.cloud import bigquery
client = bigquery.Client()

client.project='test project'
dataset_id = 'test dataset'
table_id = 'test table'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
table = client.get_table(table_ref)

rows_to_insert = [{'some_column':'test string'}]
errors = client.insert_rows(table,rows_to_insert)

and the success or failure of this can be managed through sharing (or not) 'test dataset' with the composer service account.

Similarly, getting data from a cloud storage bucket works fine:

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('test bucket')
name = 'test.txt'
data_blob = bucket.get_blob(name)
data_pre = data_blob.download_as_string()

and once again I have the ability to control access through IAM.

However, for working with google sheets it seems I must resort to the Google APIs python client, and here I run into difficulties. Most documentation on this (which seems to be a moving target!) assumes local code execution, starting with the creation and storage of a client_secret.json file example 1, example 2, which I understand locally but doesn't make sense for a shared cloud environment with source control. So, a couple of approaches I've tried instead:

Trying to build credentials using discovery and oauth2

from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client.contrib import gce

SAMPLE_SPREADSHEET_ID = 'key for test sheet'
SAMPLE_RANGE_NAME = 'test range'

creds = gce.AppAssertionCredentials(scope='https://www.googleapis.com/auth/spreadsheets')
service = build('sheets', 'v4', http = creds.authorize(Http()))

sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SAMPLE_SPREADSHEET_ID,
                            range=SAMPLE_RANGE_NAME).execute()
values = result.get('values', [])

Caveat: I know nothing about working with scopes to create credential objects via Http. But this seems closest to working: I get an HTTP403 error of

'Request had insufficient authentication scopes.'

However, I don't know if that means I successfully presented myself as the service account, which was then deemed unsuitable for access (so I need to mess around with permissions some more); or didn't actually get that far (and need to fix this credentials creation process).

Getting a credential object with google.auth and passing to gspread

My (limited) understanding is that oauth2client is being deprecated and google.auth is now the way to go. This yields credentials objects in a similarly simple way to my successful examples above for cloud platform services, that I hoped I could just pass to gspread:

import gspread
from google.auth import compute_engine

credentials = compute_engine.Credentials()
client = gspread.authorize(credentials)

Sadly, gspread doesn't work with these objects, because they don't have the attributes it expects:

AttributeError: 'Credentials' object has no attribute 'access_token'

This is presumably because gspread expects oauth2 credentials and those chucked out by google.auth aren't sufficiently compatible. The gspread docs also go down the 'just get a client_secret file'... but presumably if I can get the previous (oauth/http-based) approach to work, I could then use gspread for data retrieval. For now, though, a hybrid of these two approaches stumbles in the same way: a permission denied response due to insufficient authentication scopes.

So, whether using google.auth, oauth2 (assuming that'll stick around for a while) or some other cloud-friendly approach (i.e. not one based on storing the secret key), how can I obtain suitable credentials in a cloud composer environment to make calls to the google sheets API? Bonus marks for a way that is compatible with gspread (and hence gspread_dataframe), but this is not essential. Also happy to hear that this is a PEBCAK error and I just need to configure IAM permissions differently for my current approach to work.

Echovirus answered 2/1, 2019 at 18:52 Comment(3)
try #19767412 to obtain the necessary access token, and then use the raw HTTP REST Spreadsheet API so you aren't in deprecated library hell.Cladoceran
This still requires embedding credentials in the source, risking privilege escalation by anyone with access to it. (Whereas by the route I'd like to take I can restrict who has composer permissions for the relevant project, and the same code in another project would present as a different service account which won't have permissions to the sheet.)Echovirus
there is no requirement for credentials to be hardcoded in the source. They can be stored in a secure database. My example only embeds the creds for simplicity to illustrate the concept.Cladoceran
S
3

It looks like your Composer environment oauthScopes config wasn't set up properly. If left unspecified, the default cloud-platform doesn't allow you to access Google sheets API. You may want to create a new Composer environment with oauthScopes = [ "https://www.googleapis.com/auth/spreadsheets", "https://www.googleapis.com/auth/cloud-platform"].

Google sheets API reference: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/create.

Stile answered 6/1, 2019 at 20:18 Comment(3)
Great, this got me where I needed to be (and I never would have looked into the environment config). For the benefit of anyone else, the formatting is quite fussy; although the docs use the list format above that seems to be for config via json. On cloud shell it's --oauth-scopes 'googleapis.com/Auth/spreadsheets','https://…' i.e. no square brackets.Echovirus
@GrayTaylor so you end up using the google.auth module after adding this oauth scope?Giovanna
@DanielLavedoniodeLima with the scopes set, the code in "Trying to build credentials using discovery and oauth2" worked. This would suffice: from oauth2client.contrib import gce creds = gce.AppAssertionCredentials(scope='googleapis.com/auth/spreadsheets') client = gspread.authorize(creds)Echovirus

© 2022 - 2024 — McMap. All rights reserved.