Using Training TFRecords that are stored on Google Cloud

My goal is to use training data (in TFRecords format) stored on Google Cloud Storage when I run my TensorFlow training app locally. (Why locally? I am testing before I turn it into a training package for Cloud ML.)

Based on this thread, I shouldn't have to do anything, since the underlying TensorFlow APIs should be able to read a gs://(url) path directly.

However, that's not the case, and the errors I see look like this:

2017-06-06 15:38:55.589068: I tensorflow/core/platform/cloud/retrying_utils.cc:77] The operation failed and will be automatically retried in 1.38118 seconds (attempt 1 out of 10), caused by: Unavailable: Error executing an HTTP request (HTTP response code 0, error code 6, error message 'Couldn't resolve host 'metadata'')

2017-06-06 15:38:56.976396: I tensorflow/core/platform/cloud/retrying_utils.cc:77] The operation failed and will be automatically retried in 1.94469 seconds (attempt 2 out of 10), caused by: Unavailable: Error executing an HTTP request (HTTP response code 0, error code 6, error message 'Couldn't resolve host 'metadata'')

2017-06-06 15:38:58.925964: I tensorflow/core/platform/cloud/retrying_utils.cc:77] The operation failed and will be automatically retried in 2.76491 seconds (attempt 3 out of 10), caused by: Unavailable: Error executing an HTTP request (HTTP response code 0, error code 6, error message 'Couldn't resolve host 'metadata'')

I can't work out where to begin debugging this error.

Here is a snippet that reproduces the problem and also shows the TensorFlow APIs that I am using.

import tensorflow as tf

def _preprocess_features(features):
    """Function that returns preprocessed images"""

def _parse_single_example_from_tfrecord(value):
    features = (
        tf.parse_single_example(value,
                                features={'image_raw': tf.FixedLenFeature([], tf.string),
                                          'label': tf.FixedLenFeature([model_config.LABEL_SIZE], tf.int64)
                                          })
        )
    return features

def _read_and_decode_tfrecords(filename_queue):
    reader = tf.TFRecordReader()
    # Point it at the filename_queue
    _, value = reader.read(filename_queue)
    features = _parse_single_example_from_tfrecord(value)
    # decode the binary string image data
    image, label = _preprocess_features(features)
    return image, label

def test_tfread(filelist):
    train_filename_queue = tf.train.string_input_producer(
        filelist, num_epochs=None, shuffle=True)
    image, label = _read_and_decode_tfrecords(train_filename_queue)
    return image

images = test_tfread(["gs://test-bucket/t.tfrecords"])
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=True))
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
    for step in range(model_config.MAX_STEPS):
        _ = sess.run([images])
finally:
    # When done, ask the threads to stop.
    coord.request_stop()
# Finally, wait for them to join (i.e. cleanly shut down)
coord.join(threads)
Suffocate asked 6/6, 2017 at 23:11

Try executing the following command:

gcloud auth application-default login
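
For context on why this helps: as far as I understand it, when TensorFlow's GCS filesystem finds no local credentials it falls back to asking the Compute Engine metadata server, which is why a local machine reports that it couldn't resolve the host 'metadata'. The command above writes an Application Default Credentials file that is found first. Below is a minimal sketch for checking whether such credentials are discoverable before starting a run. The function name is hypothetical, and the well-known path assumes Linux or macOS (Windows uses a different location); it only checks that a file exists, not that the credentials are valid.

```python
import os


def find_application_default_credentials(environ=None, home=None):
    """Return the credentials file that would likely be picked up, or None.

    Hypothetical helper for illustration; it mirrors the usual lookup order
    but does not validate the contents of the file.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    # 1. An explicit key file named by the GOOGLE_APPLICATION_CREDENTIALS variable.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and os.path.isfile(explicit):
        return explicit

    # 2. The file written by `gcloud auth application-default login`
    #    (assumed Linux/macOS location).
    well_known = os.path.join(
        home, ".config", "gcloud", "application_default_credentials.json")
    if os.path.isfile(well_known):
        return well_known

    # Nothing found locally; off Google Cloud there is no metadata server to fall back to.
    return None


if __name__ == "__main__":
    path = find_application_default_credentials()
    if path is None:
        print("No local credentials found; run: gcloud auth application-default login")
    else:
        print("Credentials file found at", path)
```

If this reports no credentials on a local machine, reading gs:// paths will probably fail with the retry messages shown in the question.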

Minuet answered 7/6, 2017 at 1:9

Comments (2)

Awesome! This works, thanks! Could you also tell me how my training package can do this programmatically when it is deployed on Google Cloud ML? - Suffocate

That shouldn't be necessary when running on Cloud ML. If you're having problems, please report them. (But, FWIW, you can always use subprocess.check_call(["gcloud", "auth", "application-default", "login"]).) - Minuet
