Listing all public links for all objects in a bucket using gsutil
Is there a way to list all public links for all the objects stored in a Google Cloud Storage bucket (or in a directory in a bucket) using the Cloud SDK's gsutil or gcloud?

Something like:

$ gsutil ls --public-link gs://my-bucket/a-directory

Cumbrance answered 21/9, 2016 at 6:59

Public links for publicly visible objects are predictable. They just match this pattern: https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME.
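The pattern can be written as a small helper. This is only a sketch, not something gsutil provides: the function name `public_url` is hypothetical, and it assumes the object name should be percent-encoded (keeping `/` so the "path" stays readable), because object names may contain spaces or other characters that are not valid in a URL.

```python
from urllib.parse import quote


def public_url(bucket: str, object_name: str) -> str:
    """Build the public URL for an object (hypothetical helper).

    The object name is percent-encoded as UTF-8, leaving "/" untouched,
    so names with spaces or non-ASCII characters still give valid URLs.
    """
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


print(public_url("my-bucket", "a-directory/report 2016.pdf"))
# https://storage.googleapis.com/my-bucket/a-directory/report%202016.pdf
```

Note that a plain text substitution such as the sed approach skips this encoding step, which only matters for names containing such characters.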

gsutil doesn't have a command to print URLs for objects in a bucket, but it can list objects. You could pipe that listing to a program like sed to rewrite each gs:// URI into a public URL. For example:

gsutil ls 'gs://pub/**' | sed 's|^gs://|https://storage.googleapis.com/|'

The downside is that this produces links to all objects, not just the publicly visible ones. So you'd need to either know which objects are publicly visible, or write a more elaborate filter based on the ACL information printed by gsutil ls -L.
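As an alternative to parsing `gsutil ls -L` output, such a filter could check each object's ACL with the google-cloud-storage client library. This is a swapped-in approach and a sketch under assumptions: it treats an object as public when its ACL grants `READER` (or `OWNER`) to the `allUsers` entity, the function names are hypothetical, and it does not detect public access granted through bucket IAM policies (for example on buckets with uniform bucket-level access, where object ACLs are disabled).

```python
from typing import Iterable, Mapping

# Roles that let anonymous users read an object (assumption: OWNER implies read).
READ_ROLES = {"READER", "OWNER"}


def is_public(acl_entries: Iterable[Mapping[str, str]]) -> bool:
    """Return True if any ACL entry grants read access to allUsers (hypothetical helper)."""
    return any(
        entry.get("entity") == "allUsers" and entry.get("role") in READ_ROLES
        for entry in acl_entries
    )


def list_public_urls(bucket_name: str, prefix: str = "") -> list[str]:
    """Return public URLs of objects whose ACL makes them publicly readable.

    The client library is imported lazily so is_public() works without it.
    Requires credentials allowed to read object ACLs; will likely fail on
    buckets with uniform bucket-level access, where ACLs are disabled.
    """
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    urls = []
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        blob.acl.reload()  # fetch this object's ACL (one API request per object)
        # Iterating an ACL yields dicts like {"entity": "allUsers", "role": "READER"}.
        if is_public(blob.acl):
            urls.append(blob.public_url)
    return urls
```

For example, `list_public_urls("my-bucket", "a-directory/")` would return the public links under that prefix. Since it makes one ACL request per object, it can be slow on large buckets.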

Lacking answered 21/9, 2016 at 7:23
A couple of years later, the solution is still the same, I believe. I just wanted to add that we can avoid escaping characters in the expression passed to sed by using a different delimiter: 's|gs://|https://storage.googleapis.com/|' - Scurvy
Also, there's a double sed in the example. - Discalced

The question asks about a gsutil flag, and since there isn't one at the moment, I'd like to post another programmatic approach using the Cloud Storage client library, which could be extended or adapted into a Python module.

It goes as follows (the only third-party dependency is google-cloud-storage):

python3 -c '
import sys

from google.cloud import storage

# Split "gs://BUCKET/PREFIX" into the bucket name and the object prefix.
bucket_name, _, prefix = sys.argv[1][len("gs://"):].partition("/")
if prefix and not prefix.endswith("/"):
    prefix += "/"  # match objects inside the "directory" only, not siblings such as a-directory-old/

bucket = storage.Client().bucket(bucket_name)

for blob in bucket.list_blobs(prefix=prefix):
    # Skip zero-byte "folder" placeholder objects whose names end in "/".
    if not blob.name.endswith("/"):
        print(blob.public_url)
' gs://my-bucket/a-directory
Scurvy answered 29/9, 2019 at 12:40