Download a spreadsheet from Google Drive / Workspace using Python
Asked Answered
W

13

40

Can you produce a Python example of how to download a Google Sheets spreadsheet given its key and worksheet ID (gid)? I can't.

I've scoured versions 1, 2 and 3 of the API. I'm having no luck, I can't figure out their compilcated ATOM-like feeds API, the gdata.docs.service.DocsService._DownloadFile private method says that I'm unauthorized, and I don't want to write an entire Google Login authentication system myself. I'm about to stab myself in the face due to frustration.

I have a few spreadsheets and I want to access them like so:

username = '[email protected]'
password = getpass.getpass()

def get_spreadsheet(key, gid=0):
    ... (help!) ...

for row in get_spreadsheet('5a3c7f7dcee4b4f'):
    cell1, cell2, cell3 = row
    ...

Please save my face.


Update 1: I've tried the following, but no combination of Download() or Export() seems to work. (Docs for DocsService here)

import gdata.docs.service
import getpass
import os
import tempfile
import csv

def get_csv(file_path):
  return csv.reader(file(file_path).readlines())

def get_spreadsheet(key, gid=0):
  gd_client = gdata.docs.service.DocsService()
  gd_client.email = '[email protected]'
  gd_client.password = getpass.getpass()
  gd_client.ssl = False
  gd_client.source = "My Fancy Spreadsheet Downloader"
  gd_client.ProgrammaticLogin()

  file_path = tempfile.mktemp(suffix='.csv')
  uri = 'http://docs.google.com/feeds/documents/private/full/%s' % key
  try:
    entry = gd_client.GetDocumentListEntry(uri)

    # XXXX - The following dies with RequestError "Unauthorized"
    gd_client.Download(entry, file_path)

    return get_csv(file_path)
  finally:
    try:
      os.remove(file_path)
    except OSError:
      pass
Warehouseman answered 20/7, 2010 at 6:52 Comment(2)
Could this be your problem?Hanghangar
Tried it -- thanks! Seems to help get me past login problems.Warehouseman
E
20

In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:

#!/usr/bin/python

import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="csv"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    import getpass
    import csv

    email = "" # (your email here)
    password = getpass.getpass()
    spreadsheet_id = "" # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    csv_file = gs.download(ss)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
Eastnortheast answered 25/1, 2012 at 16:22 Comment(7)
awesome—a nice reusable example!Schilit
This should be the chosen solution. Removing the gdata dependency is a huge plus and you even packaged it in a way that it could be used as both a console app and a library. I added exception handling around the get_auth_token request but only so I could output a meaningful message when an incorrect password is encountered.Blackington
Is there a reason to loop over the csv_file object? Why not just print csv_file.read() - csv in, csv out!Haehaecceity
@Haehaecceity the csv.reader bit is included mainly as an example of how to parse the spreadsheet once it's downloaded.Eastnortheast
The ClientLogin method has been withdrawn now hasn't it? So any solution would have to be OAth2 - is there any way of avoiding that complexity?Imparadise
developers.google.com/identity/protocols/… as Francis said, this doesn't work anymore.Rustproof
This example will NO LONGER WORK, as the "ClientLogin" interface has been disabled. developers.google.com/identity/protocols/AuthForInstalledAppsLewanna
L
38

The https://github.com/burnash/gspread library is a newer, simpler way to interact with Google Spreadsheets, rather than the old answers to this that suggest the gdata library which is not only too low-level, but is also overly-complicated.

You will also need to create and download (in JSON format) a Service Account key: https://console.developers.google.com/apis/credentials/serviceaccountkey

Here's an example of how to use it:

import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)

docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"

client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
    filename = docid + '-worksheet' + str(i) + '.csv'
    with open(filename, 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(worksheet.get_all_values())
Lavery answered 18/8, 2013 at 6:22 Comment(7)
Just checked it out. Looks good and I'll mark it as the default. Awesome!Warehouseman
Just used this also, great stuff! ps - you can change the method to use the doc name instead of the key which is nice.Burgee
And this just loads all worksheets. Is it possible to load worksheet by gid, not index or name?Negro
I use this method and I have this problem UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)Hovel
The email auth doesn't work anymore. So this solution don't work. You need to have OAuth 2.0 google.com/…Kure
thanks @Richard I updated this example to use an oauth2Lavery
for Python2 and non English symbol, as @MosesLiaoGZ noted before, you can use https://mcmap.net/q/37206/-unicodeencodeerror-39-ascii-39-codec-can-39-t-encode-character-u-39-xa0-39-in-position-20-ordinal-not-in-range-128Miter
E
20

In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:

#!/usr/bin/python

import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="csv"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    import getpass
    import csv

    email = "" # (your email here)
    password = getpass.getpass()
    spreadsheet_id = "" # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    csv_file = gs.download(ss)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
Eastnortheast answered 25/1, 2012 at 16:22 Comment(7)
awesome—a nice reusable example!Schilit
This should be the chosen solution. Removing the gdata dependency is a huge plus and you even packaged it in a way that it could be used as both a console app and a library. I added exception handling around the get_auth_token request but only so I could output a meaningful message when an incorrect password is encountered.Blackington
Is there a reason to loop over the csv_file object? Why not just print csv_file.read() - csv in, csv out!Haehaecceity
@Haehaecceity the csv.reader bit is included mainly as an example of how to parse the spreadsheet once it's downloaded.Eastnortheast
The ClientLogin method has been withdrawn now hasn't it? So any solution would have to be OAth2 - is there any way of avoiding that complexity?Imparadise
developers.google.com/identity/protocols/… as Francis said, this doesn't work anymore.Rustproof
This example will NO LONGER WORK, as the "ClientLogin" interface has been disabled. developers.google.com/identity/protocols/AuthForInstalledAppsLewanna
D
18

You might try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.

Get a separate login token for the spreadsheets service and substitue that for the export. Adding this to the get_spreadsheet code worked for me:

import gdata.spreadsheet.service

def get_spreadsheet(key, gid=0):
    # ...
    spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
    spreadsheets_client.email = gd_client.email
    spreadsheets_client.password = gd_client.password
    spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
    spreadsheets_client.ProgrammaticLogin()

    # ...
    entry = gd_client.GetDocumentListEntry(uri)
    docs_auth_token = gd_client.GetClientLoginToken()
    gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
    gd_client.Export(entry, file_path)
    gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token

Notice I also used Export, as Download seems to give only PDF files.

Dispirited answered 30/7, 2010 at 23:28 Comment(1)
A better solution is now available since this answer was originally accepted.Lavery
C
8

(Jul 2016) All other answers are pretty much outdated or will be, either because they use GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.

Modern Google API access occurs using API keys (for accessing public data), OAuth2 client IDs (for accessing data owned by users), or service accounts (for accessing data owned by applications/in the cloud) primarily with the Google Cloud client libraries for GCP APIs and Google APIs Client Libraries for non-GCP APIs. For this task, it would be the latter for Python.

To make it happen your code needs authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.

Note that there is also a Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based request like exporting where the Drive API is the correct one to use.

I wrote a blog post that demos exporting a Google Sheet as CSV from Drive. The core part of the script:

# setup
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

# query for file to export
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE), orderBy='modifiedTime desc,name').execute().get('files', [])

# export 1st match (if found)
if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')

To learn more about using Google Sheets with Python, see my answer for a similar question. You can also download a Sheet in XLSX and other formats supported by Drive.

If you're completely new to Google APIs, then you need to take a further step back and review these videos first:

If you already have experience with Google Workspace (formerly G Suite, Google Apps, Google "Docs") APIs and want to see more videos on using both APIs:

Cacoepy answered 11/7, 2016 at 23:10 Comment(0)
L
3

This no longer works as of gdata 2.0.1.4:

gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())

Instead, you have to do:

gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))
Langdon answered 24/3, 2011 at 17:24 Comment(1)
can you go the other way? log in with the gd_client and then pass the spreadsheet client a token?Slip
T
3

I wrote pygsheets as an alternative to gspread, but using google api v4. It has an export method to export spreadsheet.

import pygsheets

gc = pygsheets.authorize()

# Open spreadsheet and then workseet
sh = gc.open('my new ssheet')
wks = sh.sheet1

#export as csv
wks.export(pygsheets.ExportType.CSV)
Termination answered 11/12, 2016 at 11:30 Comment(0)
B
2

The following code works in my case (Ubuntu 10.4, python 2.6.5 gdata 2.0.14)

import gdata.docs.service
import gdata.spreadsheet.service
gd_client = gdata.docs.service.DocsService()
gd_client.ClientLogin(email,password)
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin(email,password)
#...
file_path = file_path.strip()+".xls"
docs_token = gd_client.auth_token
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)  
gd_client.auth_token = docs_token
Barbrabarbuda answered 7/5, 2011 at 23:0 Comment(0)
L
2

I've simplified @Cameron's answer even further, by removing the unnecessary object orientation. This makes the code smaller and easier to understand. I also edited the url, which might work better.

#!/usr/bin/python
import re, urllib, urllib2

def get_auth_token(email, password):
    url = "https://www.google.com/accounts/ClientLogin"
    params = {
        "Email": email, "Passwd": password,
        "service": 'wise',
        "accountType": "HOSTED_OR_GOOGLE",
        "source": 'Client'
    }
    req = urllib2.Request(url, urllib.urlencode(params))
    return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

def download(spreadsheet, worksheet, email, password, format="csv"):
    url_format = 'https://docs.google.com/spreadsheets/d/%s/export?exportFormat=%s#gid=%s'

    headers = {
        "Authorization": "GoogleLogin auth=" + get_auth_token(email, password),
        "GData-Version": "3.0"
    }
    req = urllib2.Request(url_format % (spreadsheet, format, worksheet), headers=headers)
    return urllib2.urlopen(req)


if __name__ == "__main__":
    import getpass
    import csv

    spreadsheet_id = ""             # (spreadsheet id here)
    worksheet_id = ''               # (gid here)
    email = ""                      # (your email here)
    password = getpass.getpass()

    # Request a file-like object containing the spreadsheet's contents
    csv_file = download(spreadsheet_id, worksheet_id, email, password)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
Lananna answered 12/10, 2014 at 0:45 Comment(1)
This solution is outdated and won't work, like allmost all answers here you need to use oauth2.0Kure
E
1

I'm using this: curl 'https://docs.google.com/spreadsheets/d/1-lqLuYJyHAKix-T8NR8wV8ZUUbVOJrZTysccid2-ycs/gviz/tq?tqx=out:csv' on a sheet that is set to publicly readable.

So you would need a python version of curl, if you can work with public sheets.

If you have a sheet with some tabs you don't want to reveal, create a new sheet, and import the ranges you want to publish into tabs on it.

Euglena answered 13/1, 2020 at 16:40 Comment(2)
How did you get that link with "out:csv" attached to it?Mardellmarden
Did you add it manually like answered in- #24255972Mardellmarden
M
1

Downloading a spreadsheet from google doc is pretty simple using sheets.

You can follow the detailed documentation on

https://pypi.org/project/gsheets/

or follow the below-given steps. I recommend reading through the documentation for better coverage.

  1. pip install gsheets

  2. Log in to the Google Developers Console with the Google account whose spreadsheets you want to access. Create (or select) a project and enable the Drive API and Sheets API (under Google Apps APIs).

  3. Go to the Credentials for your project and create New credentials > OAuth client ID > of type Other. In the list of your OAuth 2.0 client IDs click Download JSON for the Client ID you just created. Save the file as client_secrets.json in your home directory (user directory).

  4. Use the following code snippet.

    from gsheets import Sheets
    sheets = Sheets.from_files('client_secret.json')
    print(sheets) # will ensure authenticate connection
    
    s = sheets.get("{SPREADSHEET_URL}")
    print(s) # will ensure your file is accessible 
    
    s.sheets[1].to_csv('Spam.csv', encoding='utf-8', dialect='excel') # will download the file as csv
Mardellmarden answered 9/7, 2020 at 20:45 Comment(0)
B
0

This isn't a complete answer, but Andreas Kahler wrote up an interesting CMS solution using Google Docs + Google App Engline + Python. Not having any experience in the area, I cannot see exactly what portion of the code may be of use to you, but check it out. I know it interfaces with a Google Docs account and plays with files, so I have a feeling you'll recognize what's going on. It should at least point you in the right direction.

Google AppEngine + Google Docs + Some Python = Simple CMS

Blowup answered 30/7, 2010 at 16:58 Comment(0)
R
0

Gspread is indeed a big improvement over GoogleCL and Gdata (both of which I've used and thankfully phased out in favor of Gspread). I think that this code is even quicker than the earlier answer to get the contents of the sheet:

username = '[email protected]'
password = 'sdfsdfsadfsdw'
sheetname = "Sheety Sheet"

client = gspread.login(username, password)
spreadsheet = client.open(sheetname)

worksheet = spreadsheet.sheet1
contents = []
for rows in worksheet.get_all_values():
    contents.append(rows)
Reckoning answered 14/9, 2014 at 3:15 Comment(0)
M
0

(Mar 2019, Python 3) My data is usually not sensitive and I use usually table format similar to CSV.

In such case, one can simply publish to the web the sheet and than use it as a CSV file on a server.

(One publishes it using File -> Publish to the web ... -> Sheet 1 -> Comma separated values (.csv) -> Publish).

import csv
import io
import requests

url = "https://docs.google.com/spreadsheets/d/e/<GOOGLE_ID>/pub?gid=0&single=true&output=csv"  # you can get the whole link in the 'Publish to the web' dialog
r = requests.get(url)
r.encoding = 'utf-8'
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)
Made answered 5/3, 2019 at 0:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.