Submitting Google Cloud ML Engine Jobs from Python Directly

I have a Keras .h5 model which I've been training locally, however now wish to automate the full process via the Google Cloud ML-Engine.

I have all the GCloud Storage buckets set up to be accessed from the application, and I have read about configuring jobs to train a Keras model on GCloud ML-Engine. However, all of those tutorials (including the Google Cloud ML-Engine docs) say that the best way to run a job is gcloud ml-engine jobs submit training from the command line.

I am aware of the Python Client Library for Google Cloud, but its docs seem a bit opaque.

Does anybody know if I could submit the training of the model fully from a python file itself (via either a direct API call or through the Google Client Library)? I am asking as I am hoping to make this into a fully automated, hosted Flask web-application for model training so it needs to be as hands off as possible.

Episiotomy answered 9/1, 2018 at 13:20 Comment(0)

There is indeed a way of submitting jobs to Cloud ML Engine from a Python script. You can use the Google Python API Client Library for that purpose; the link I shared gives a command-by-command explanation of the API calls and, at the end, an example of how to put everything together. In order to work with the library, you will have to install it first, as explained in this other page.
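
For reference, installing the client library is typically a single pip command (package names as used by the library's own docs; oauth2client provides the credentials call used further down):

```shell
pip install --upgrade google-api-python-client oauth2client
```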

Then, the method you are interested in (for submitting jobs) is cloudml.projects.jobs.create(), and you can find detailed information on how to call it in the developers page. I think you might be interested in playing around with the REST API first, in order to get familiar with how it works; you can do so through the APIs Explorer. Below there's an example of a body used to make the API call:

training_inputs = {'scaleTier': 'CUSTOM',
    'masterType': 'complex_model_m',
    'workerType': 'complex_model_m',
    'parameterServerType': 'large_model',
    'workerCount': 9,
    'parameterServerCount': 3,
    'packageUris': ['gs://<YOUR_TRAINER_PATH>/package-0.0.0.tar.gz'],
    'pythonModule': 'trainer.task',
    'args': ['--arg1', 'value1', '--arg2', 'value2'],
    'region': '<REGION>',
    'jobDir': 'gs://<YOUR_TRAINING_PATH>',
    'runtimeVersion': '1.4'}

job_spec = {'jobId': my_job_name, 'trainingInput': training_inputs}
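
Note that the jobId must be unique within your project, so re-submitting with the same name will fail. A common trick when building my_job_name is to append a timestamp; here is a minimal sketch (the helper and prefix are my own):

```python
import time

def make_job_name(prefix='keras_training'):
    # Job IDs must be unique per project and may only contain
    # letters, digits, and underscores, so append a timestamp.
    return '{}_{}'.format(prefix, time.strftime('%Y%m%d_%H%M%S'))

my_job_name = make_job_name()
```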

You should adapt it to the specifications of your model. Once it is ready, have a look at this page explaining how to submit a training job using Python; in short, it should be something like this:

from oauth2client.client import GoogleCredentials
from googleapiclient import discovery
from googleapiclient import errors
import logging

project_name = 'my_project_name'
project_id = 'projects/{}'.format(project_name)

credentials = GoogleCredentials.get_application_default()

cloudml = discovery.build('ml', 'v1', credentials=credentials)

request = cloudml.projects().jobs().create(body=job_spec, parent=project_id)

try:
    response = request.execute()
    # Handle a successful request

except errors.HttpError as err:
    logging.error('There was an error creating the training job.'
                  ' Check the details:')
    logging.error(err._get_reason())

You should be able to run this code in order to submit a Cloud ML Engine job through a Python script.
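
Since you want a fully automated Flask workflow, you will probably also want to check on the job after submitting it. The same client exposes projects.jobs.get() for that; below is a minimal polling sketch reusing the cloudml and project_id objects from above (the helper names and polling interval are my own choices):

```python
import time

# States in which a Cloud ML Engine job is no longer running.
TERMINAL_STATES = {'SUCCEEDED', 'FAILED', 'CANCELLED'}

def is_finished(state):
    # A job is done once it reaches one of the terminal states.
    return state in TERMINAL_STATES

def wait_for_job(cloudml, project_id, job_name, poll_seconds=60):
    # Poll projects.jobs.get until the job leaves QUEUED/RUNNING.
    job_path = '{}/jobs/{}'.format(project_id, job_name)
    while True:
        job = cloudml.projects().jobs().get(name=job_path).execute()
        if is_finished(job['state']):
            return job
        time.sleep(poll_seconds)
```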

I hope this helps and makes the documentation you mentioned a bit less opaque.

Prying answered 10/1, 2018 at 14:39 Comment(1)
Small comment that you're doing request.execute() twice. The second one will always fail because after executing the first, a job with my_job_name will already exist. – Unison
