Reading Input Data from GCS

What is the suggested way of loading data from GCS? The sample code shows copying the data from GCS to the /tmp/ directory. If this is the suggested approach, how much data may be copied to /tmp/?

Weathersby answered 30/9, 2016 at 3:27 Comment(0)

While you have that option, you shouldn't need to copy the data to local disk. You should be able to reference training and evaluation data directly from GCS using each object's GCS URI, e.g. gs://bucket/path/to/file. You can use these paths wherever you'd normally use local file system paths in TensorFlow APIs that accept file paths. TensorFlow supports both reading from and writing to GCS.
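As a rough illustration of passing a gs:// path where a local path would normally go, here is a minimal sketch. It assumes a recent TensorFlow release where `tf.io.gfile.GFile` is available (the API names in 2016-era releases differed), and that the job's credentials can read the bucket. The helpers `gcs_uri` and `read_text` are hypothetical names introduced for this example.

```python
def gcs_uri(bucket, object_name):
    """Build a gs:// URI from a bucket name and an object name (hypothetical helper)."""
    return "gs://{}/{}".format(bucket, object_name.lstrip("/"))


def read_text(path):
    """Return the contents of a text file given a local path or a gs:// URI."""
    # Imported here so the URI helper above can be used without TensorFlow installed.
    import tensorflow as tf

    # GFile goes through TensorFlow's file system layer, so the same call
    # is used for local paths and gs:// paths.
    with tf.io.gfile.GFile(path, "r") as f:
        return f.read()
```

For example, `read_text(gcs_uri("bucket", "path/to/file"))` would read the object directly, with no copy to /tmp/, provided the bucket is readable by the job.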

You should also be able to use a prefix to reference a set of matching files, rather than referencing each file individually.
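One way to read "prefix" here is as a glob pattern. The sketch below assumes a recent TensorFlow release with `tf.io.gfile.glob`. The helper names `prefix_to_pattern` and `list_matching_files` are hypothetical and not part of TensorFlow.

```python
def prefix_to_pattern(prefix):
    """Turn a path prefix into a glob pattern for paths that start with it."""
    return prefix if prefix.endswith("*") else prefix + "*"


def list_matching_files(prefix):
    """List files under a local or gs:// prefix, sorted by name."""
    # Imported here so the pattern helper above can be used without TensorFlow installed.
    import tensorflow as tf

    # tf.io.gfile.glob returns the paths matching the pattern as a list of strings.
    return sorted(tf.io.gfile.glob(prefix_to_pattern(prefix)))
```

The resulting list can then be passed to an input API that accepts several file names, so each file does not have to be named individually.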

Follow-up note: see https://cloud.google.com/ml/docs/how-tos/using-external-buckets in case you need to set ACLs on your data so that it is accessible to training jobs.

Hope that helps.

Seismic answered 30/9, 2016 at 5:50 Comment(4)
That doesn't seem to be the case: github.com/tensorflow/tensorflow/blob/… - Weathersby
Here's where some of the switching to the GCS file system implementation happens: github.com/tensorflow/tensorflow/blob/… - Seismic
Yes, but is that code path taken in the case of a tf.read_file node? - Weathersby
I believe it should be. A lot of work went into abstracting file I/O and file systems so that all the I/O functionality works consistently. The best way to be sure is to try it, and if it doesn't work, that sounds like a bug we should fix! - Seismic
