I would like to be able to access data on a google sheet when running python code via cloud composer; this is something I know how to do in several ways when running code locally, but moving to the cloud is proving challenging. In particular I wish to authenticate as the composer service account rather than stashing the contents of a client_secret.json file somewhere (be that the source code or some cloud location).
For essentially the same question but instead accessing google cloud platform services, this has been relatively easy (even when running through composer) thanks to the google-cloud_* libraries. For instance, I have verified that I can push data to bigquery:
from google.cloud import bigquery
client = bigquery.Client()
client.project='test project'
dataset_id = 'test dataset'
table_id = 'test table'
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
table = client.get_table(table_ref)
rows_to_insert = [{'some_column':'test string'}]
errors = client.insert_rows(table,rows_to_insert)
and the success or failure of this can be managed through sharing (or not) 'test dataset' with the composer service account.
Similarly, getting data from a cloud storage bucket works fine:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('test bucket')
name = 'test.txt'
data_blob = bucket.get_blob(name)
data_pre = data_blob.download_as_string()
and once again I have the ability to control access through IAM.
However, for working with google sheets it seems I must resort to the Google APIs python client, and here I run into difficulties. Most documentation on this (which seems to be a moving target!) assumes local code execution, starting with the creation and storage of a client_secret.json file example 1, example 2, which I understand locally but doesn't make sense for a shared cloud environment with source control. So, a couple of approaches I've tried instead:
Trying to build credentials using discovery and oauth2
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client.contrib import gce
SAMPLE_SPREADSHEET_ID = 'key for test sheet'
SAMPLE_RANGE_NAME = 'test range'
creds = gce.AppAssertionCredentials(scope='https://www.googleapis.com/auth/spreadsheets')
service = build('sheets', 'v4', http = creds.authorize(Http()))
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SAMPLE_SPREADSHEET_ID,
range=SAMPLE_RANGE_NAME).execute()
values = result.get('values', [])
Caveat: I know nothing about working with scopes to create credential objects via Http. But this seems closest to working: I get an HTTP403 error of
'Request had insufficient authentication scopes.'
However, I don't know if that means I successfully presented myself as the service account, which was then deemed unsuitable for access (so I need to mess around with permissions some more); or didn't actually get that far (and need to fix this credentials creation process).
Getting a credential object with google.auth and passing to gspread
My (limited) understanding is that oauth2client is being deprecated and google.auth is now the way to go. This yields credentials objects in a similarly simple way to my successful examples above for cloud platform services, that I hoped I could just pass to gspread:
import gspread
from google.auth import compute_engine
credentials = compute_engine.Credentials()
client = gspread.authorize(credentials)
Sadly, gspread doesn't work with these objects, because they don't have the attributes it expects:
AttributeError: 'Credentials' object has no attribute 'access_token'
This is presumably because gspread expects oauth2 credentials and those chucked out by google.auth aren't sufficiently compatible. The gspread docs also go down the 'just get a client_secret file'... but presumably if I can get the previous (oauth/http-based) approach to work, I could then use gspread for data retrieval. For now, though, a hybrid of these two approaches stumbles in the same way: a permission denied response due to insufficient authentication scopes.
So, whether using google.auth, oauth2 (assuming that'll stick around for a while) or some other cloud-friendly approach (i.e. not one based on storing the secret key), how can I obtain suitable credentials in a cloud composer environment to make calls to the google sheets API? Bonus marks for a way that is compatible with gspread (and hence gspread_dataframe), but this is not essential. Also happy to hear that this is a PEBCAK error and I just need to configure IAM permissions differently for my current approach to work.