How to run Google gsutil using Python

After installing and configuring the Google Cloud SDK, the gsutil command can be run from the Windows cmd prompt simply by typing its name and argument(s).

Here is the example:

"C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin\gcloud" version


But the same command fails when run through Python's subprocess module. With subprocess's shell argument set to True, an ImportError occurs:

import subprocess

cmd = '"C:/Program Files (x86)/Google/Cloud SDK/google-cloud-sdk/bin/gsutil" version'

p = subprocess.Popen(cmd, shell=True)

.....

ImportError: No module named site

With subprocess's shell argument set to False, a WindowsError: [Error 2] The system cannot find the file specified occurs instead:

p = subprocess.Popen(cmd, shell=False)

Is there a way to run gsutil on Windows using Python?

Subrogate answered 16/4, 2018 at 21:10 Comment(1)
If you still want to use the command line, you have to specify the full name of the file, i.e., gsutil.cmd - Josiah
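The comment's suggestion can be sketched as follows. This is a minimal illustration, assuming the SDK's bin directory contains a `gsutil.cmd` wrapper as the comment states; the helper names `build_gsutil_command` and `run_gsutil_cmd` are hypothetical, and the install path is copied from the question. Whether this also avoids the ImportError seen with shell=True may depend on the local environment.

```python
import os
import subprocess

# Install location copied from the question; adjust for your machine.
SDK_BIN = "C:/Program Files (x86)/Google/Cloud SDK/google-cloud-sdk/bin"


def build_gsutil_command(bin_dir, *args):
    """Return an argument list that names the gsutil.cmd wrapper explicitly."""
    return [os.path.join(bin_dir, "gsutil.cmd"), *args]


def run_gsutil_cmd(bin_dir, *args):
    """Run gsutil.cmd without a shell and return the CompletedProcess."""
    # Passing a list avoids hand-quoting the path, which contains spaces.
    return subprocess.run(build_gsutil_command(bin_dir, *args), check=True)
```

On a Windows machine with the SDK installed at that location, `run_gsutil_cmd(SDK_BIN, "version")` would correspond to the command in the question.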

Note that the proper, official way to interact with Google Cloud Storage is to use the Google Cloud Client Library for Python, not to run the gsutil command through subprocess.Popen. Unless you are merely setting up some tests, I would suggest following this approach from the beginning, provided no technological constraint makes it impracticable.

See the library's Overview and Documentation pages for details. A small example taken from the Documentation:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('<your-bucket-name>')
blob = bucket.blob('my-test-file.txt')
blob.upload_from_string('this is test content!')

A further example is available that uses google-cloud-python with Datastore and Cloud Storage to manage expenses.
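The single-object example above can be extended to a whole directory. The following is only a sketch under stated assumptions: `blob_name_for` and `upload_directory` are hypothetical helpers, the thread pool size is arbitrary, and it assumes one client can be shared across worker threads. It relies on `Client.bucket`, `Bucket.blob`, and `Blob.upload_from_filename` from the google-cloud-storage package.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def blob_name_for(local_dir, file_path, prefix=""):
    """Map a local file path to an object name that uses forward slashes."""
    rel = os.path.relpath(file_path, local_dir).replace(os.sep, "/")
    return f"{prefix.rstrip('/')}/{rel}" if prefix else rel


def upload_directory(bucket_name, local_dir, prefix="", workers=8):
    """Upload every file under local_dir and return the object names."""
    # Imported here so the path helper above works without the library installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)  # builds a handle, no API request

    def upload_one(file_path):
        blob = bucket.blob(blob_name_for(local_dir, file_path, prefix))
        blob.upload_from_filename(file_path)
        return blob.name

    # Collect every regular file below local_dir, including subdirectories.
    paths = [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(local_dir)
        for name in files
    ]
    # Assumption: sharing the client across threads is acceptable here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_one, paths))
```

A call such as `upload_directory('<your-bucket-name>', 'local_folder', prefix='backup')` would then place `local_folder/sub/a.txt` at `backup/sub/a.txt`.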

Willi answered 17/4, 2018 at 11:21 Comment(6)
The Python API does not allow use of the -m option for parallelism, as far as I know. So there are reasons for using subprocess and the gsutil command. - Calc
@UricSou: You can share client instances across threads because the storage client uses the requests library. Just create client instances after multiprocessing.Pool. - Mixed
Also, the Python API is dead slow compared to the command line. - Hylo
@Willi How can a folder be uploaded using the above approach? - Belenbelesprit
I highly recommend using gsutil and not the native Python library for anything more than downloading one or two small files. gsutil's performance is 100x better, so the original direction of the question (invoking it from Python) was good. - Ivonneivor
I also see a benefit in using the gsutil command-line tool for that. It has a really easy-to-use interface, allowing things like rm -r and cp -r, which you can't easily replicate using the other methods (you would need to iterate over all the objects). - Tunny
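The `-m` and `cp -r` options mentioned in these comments can be combined when invoking gsutil from Python. A minimal sketch, assuming gsutil is discoverable on PATH; the helper names `build_parallel_copy` and `parallel_copy` are hypothetical, and the source and destination values are placeholders supplied by the caller.

```python
import shutil
import subprocess


def build_parallel_copy(gsutil_path, source, destination):
    """Return the argument list for a recursive, parallel gsutil copy."""
    # -m is a top-level gsutil option, so it goes before the cp subcommand.
    return [gsutil_path, "-m", "cp", "-r", source, destination]


def parallel_copy(source, destination):
    """Locate gsutil on PATH and run the copy, raising if it fails."""
    gsutil_path = shutil.which("gsutil")
    if gsutil_path is None:
        raise FileNotFoundError("gsutil was not found on PATH")
    cmd = build_parallel_copy(gsutil_path, source, destination)
    return subprocess.run(cmd, check=True)
```

For instance, `parallel_copy("local_folder", "gs://<your-bucket-name>/")` would issue `gsutil -m cp -r local_folder gs://<your-bucket-name>/`.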

Use shutil.which to get the full path to gsutil in a cross-platform manner:

import shutil

# If gsutil is in PATH:
path = shutil.which('gsutil')

# Or if gsutil isn't in PATH but you know where it is:
path = shutil.which('gsutil', path="C:/Program Files (x86)/Google/Cloud SDK/google-cloud-sdk/bin")

# Then you can use that path to run it.
import subprocess
cmd = [path, "version"]
p = subprocess.Popen(cmd)
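
Since `shutil.which` returns `None` when nothing is found, a slightly more defensive variant may be useful. This is a sketch only: the `run_gsutil` helper and its `search_path` parameter are hypothetical names, and capturing the output as text is an illustrative choice.

```python
import shutil
import subprocess


def run_gsutil(*args, search_path=None):
    """Locate gsutil and run it, returning its standard output as text."""
    # search_path=None makes shutil.which fall back to the PATH variable.
    path = shutil.which("gsutil", path=search_path)
    if path is None:
        raise FileNotFoundError("gsutil was not found; check PATH or search_path")
    result = subprocess.run(
        [path, *args],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

With gsutil installed, `print(run_gsutil("version"))` would print whatever the tool reports.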
Whang answered 5/10, 2022 at 20:42 Comment(0)
