Reading files in Google Cloud Machine Learning

I tried to run tensorflow-wavenet on Google Cloud ML Engine with gcloud ml-engine jobs submit training, but the cloud job crashed when it tried to read the JSON configuration file:

with open(args.wavenet_params, 'r') as f:
    wavenet_params = json.load(f)

args.wavenet_params is simply the path to a JSON file that I uploaded to a Google Cloud Storage bucket. The path looks like this: gs://BUCKET_NAME/FILE_PATH.json.

I double-checked that the file path is correct, and I'm sure this part is responsible for the crash because I commented out everything else.

The crash log doesn't give much information about what happened:

Module raised an exception for failing to call a subprocess Command '['python', '-m', u'gcwavenet.train', u'--data_dir', u'gs://wavenet-test-data/VCTK-Corpus-Small/', u'--logdir_root', u'gs://wavenet-test-data//gcwavenet10/logs']' returned non-zero exit status 1.

I replaced wavenet_params = json.load(f) with f.close() and still got the same result.

Everything works when I run it locally with gcloud ml-engine local train.

I think the problem is either with reading files under gcloud ml-engine in general, or that I can't access the Google Cloud Storage bucket from within a Python file using gs://BUCKET_NAME/FILE_PATH.

Mothball asked 13/3, 2017 at 10:12

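Python's built-in open function cannot read files from GCS (Google Cloud Storage), because it only understands local filesystem paths and not gs:// URLs. You will need to use a library capable of doing so. TensorFlow includes one such library: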

import json

from tensorflow.python.lib.io import file_io

# file_io.FileIO accepts gs:// URLs as well as local paths
with file_io.FileIO(args.wavenet_params, 'r') as f:
    wavenet_params = json.load(f)
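
If the same training script also has to run on a machine without TensorFlow's GCS support, one possible arrangement is a small helper that only goes through file_io for gs:// paths. This is a minimal sketch rather than part of tensorflow-wavenet: the function name load_json is hypothetical, and the split between the two branches is an assumption made for illustration (file_io.FileIO can read local paths as well).

```python
import json


def load_json(path):
    """Load a JSON file from a local path or a gs:// URL.

    Hypothetical helper for illustration only.
    """
    if path.startswith("gs://"):
        # Imported lazily so that local paths work without TensorFlow.
        from tensorflow.python.lib.io import file_io

        with file_io.FileIO(path, "r") as f:
            return json.load(f)

    # Plain local file: the built-in open is sufficient here.
    with open(path, "r") as f:
        return json.load(f)
```

With this in place, the call in the training script would become wavenet_params = load_json(args.wavenet_params).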
Emlynn answered 13/3, 2017 at 14:06

Comments (4):
How can I attach my JSON file with permission credentials to the function in order to access a private bucket? - Panoptic
You can set the GOOGLE_APPLICATION_CREDENTIALS environment variable. However, doing that in CMLE (Cloud ML Engine) may be tricky. You can try setting os.environ['GOOGLE_APPLICATION_CREDENTIALS'] before importing tensorflow, but I'm not sure that will work. You can also write a wrapper script that sets the environment variable and then runs your real script as a subprocess. - Emlynn
I also found tf.contrib.cloud.configure_gcs: tensorflow.org/api_docs/python/tf/contrib/cloud/configure_gcs - Emlynn
As well as tensorflow.org/api_docs/python/tf/contrib/cloud/… - Emlynn
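
The wrapper-script idea from the comments could look roughly like the sketch below. It only uses the Python standard library. The function name run_with_credentials and the example paths are hypothetical, and whether a key file is reachable at that path inside a Cloud ML Engine job is an assumption that the comment above does not confirm.

```python
import os
import subprocess


def run_with_credentials(key_path, command):
    """Run `command` with GOOGLE_APPLICATION_CREDENTIALS set to `key_path`.

    Hypothetical helper: returns the exit status of the child process.
    """
    # Copy the current environment so the parent process stays unchanged.
    env = dict(os.environ)
    env["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    # The child process (the real training script) sees the variable
    # before it imports tensorflow.
    return subprocess.call(command, env=env)
```

A wrapper module would then call something like run_with_credentials("/path/to/key.json", ["python", "-m", "gcwavenet.train"] + extra_args), where the key path and extra_args are placeholders.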
