To help clarify some of the questions you have, allow me to start with the basic anatomy of a prediction request:
{"instances": [<instance>, <instance>, ...]}
where instance is a JSON object (dict/map; I'll use the Python term "dict" hereafter) whose attributes/keys are the names of the inputs, with values containing the data for that input.
What the cloud service does (and gcloud ml-engine local predict uses the same underlying libraries as the service) is take the list of dicts (which can be thought of as rows of data) and convert it to a dict of lists (which can be thought of as columnar data containing batches of instances) with the same keys as in the original data. For example,
{"instances": [{"x": 1, "y": "a"}, {"x": 3, "y": "b"}, {"x": 5, "y": "c"}]}
becomes (internally)
{"x": [1, 3, 5], "y": ["a", "b", "c"]}
The keys in this dict (and hence, in the instances in the original request) must correspond to the keys of the dict passed to the ServingInputReceiver. It should be apparent from this example that the service "batches" all of the data, meaning all of the instances are fed into the graph as a single batch. That's why the outer dimension of the shape of the inputs must be None -- it is the batch dimension, and it is not known before a request is made (since each request may have a different number of instances). When exporting a graph to accept the above requests, you might define a function like this:
def serving_input_fn():
    inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None]),
              'y': tf.placeholder(dtype=tf.string, shape=[None])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
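For completeness, here's roughly how such a function gets wired into an export. The estimator variable is hypothetical and stands in for whatever trained tf.estimator.Estimator you have; the export directory name is arbitrary:

# Assuming `estimator` is a trained tf.estimator.Estimator. This writes a
# SavedModel that the prediction service (and gcloud ml-engine local predict)
# can load.
export_dir = estimator.export_savedmodel(
    export_dir_base='exported_model',
    serving_input_receiver_fn=serving_input_fn)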
Since JSON does not (directly) support binary data and since TensorFlow has no way of distinguishing "strings" from "bytes", we need to treat binary data specially. First of all, we need the name of any such input to end in "_bytes" to help differentiate a text string from a byte string. Using the example above, suppose y contained binary data instead of text. We would declare the following:
def serving_input_fn():
    inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None]),
              'y_bytes': tf.placeholder(dtype=tf.string, shape=[None])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
Notice that the only thing that changed was using y_bytes instead of y as the name of the input.
Next, we need to actually base64 encode the data; anywhere a string would be acceptable, we can instead use an object of the form {"b64": "<base64-encoded data>"}. Adapting the running example, a request might look like:
{
"instances": [
{"x": 1, "y_bytes": {"b64": "YQ=="}},
{"x": 3, "y_bytes": {"b64": "Yg=="}},
{"x": 5, "y_bytes": {"b64": "Yw=="}}
]
}
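If you're assembling such a request in Python, the encoding step might look like this (a sketch only; the helper name is made up, and the field names follow the running example):

import base64
import json

def make_instance(x, raw_bytes):
    # base64-encode the binary payload and wrap it in the {"b64": ...}
    # object expected for *_bytes inputs.
    return {"x": x,
            "y_bytes": {"b64": base64.b64encode(raw_bytes).decode("utf-8")}}

request = {"instances": [make_instance(1, b"a"),
                         make_instance(3, b"b"),
                         make_instance(5, b"c")]}
print(json.dumps(request))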
In this case the service does exactly what it did before, but with one added step: it automatically base64-decodes the string (and "replaces" the {"b64": ...} object with the bytes) before sending the data to TensorFlow. So TensorFlow ends up with a dict exactly like before:
{"x": [1, 3, 5], "y_bytes": ["a", "b", "c"]}
(Note that the name of the input has not changed.)
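Extending the earlier rows-to-columns sketch, the handling of {"b64": ...} objects amounts to something like this (again an illustration, not the service's actual code):

import base64

def maybe_decode_b64(value):
    # Replace a {"b64": ...} wrapper with the decoded bytes;
    # leave every other value untouched.
    if isinstance(value, dict) and list(value.keys()) == ["b64"]:
        return base64.b64decode(value["b64"])
    return value

instances = [{"x": 1, "y_bytes": {"b64": "YQ=="}},
             {"x": 3, "y_bytes": {"b64": "Yg=="}},
             {"x": 5, "y_bytes": {"b64": "Yw=="}}]
columns = {key: [maybe_decode_b64(instance[key]) for instance in instances]
           for key in instances[0]}
print(columns)
# {'x': [1, 3, 5], 'y_bytes': [b'a', b'b', b'c']}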
Of course, base64-encoding textual data is kind of pointless; you'd usually do this for, e.g., image data, which can't be sent over JSON any other way, but I hope the above example is sufficient to illustrate the point anyway.
There's another important point to be made: the service supports a type of shorthand. When there is exactly one input to your TensorFlow model, there's no need to incessantly repeat the name of that input in every single object in your list of instances. To illustrate, imagine exporting a model with only x:
def serving_input_fn():
    inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
The "long form" request would look like this:
{"instances": [{"x": 1}, {"x": 3}, {"x": 5}]}
Instead, you can send a request in shorthand, like so:
{"instances": [1, 3, 5]}
Note that this applies even to base64-encoded data. So, for instance, if instead of only exporting x we had only exported y_bytes, we could simplify the requests from:
{
"instances": [
{"y_bytes": {"b64": "YQ=="}},
{"y_bytes": {"b64": "Yg=="}},
{"y_bytes": {"b64": "Yw=="}}
]
}
To:
{
"instances": [
{"b64": "YQ=="},
{"b64": "Yg=="},
{"b64": "Yw=="}
]
}
In many cases this is only a small win, but it definitely aids readability, e.g., when the inputs contain CSV data.
So, putting it all together to adapt to your specific scenario, here's what your serving function should look like:
def serving_input_fn():
    feature_placeholders = {
        'image_bytes': tf.placeholder(dtype=tf.string, shape=[None], name='source')}
    single_image = tf.decode_raw(feature_placeholders['image_bytes'], tf.float32)
    return tf.estimator.export.ServingInputReceiver(feature_placeholders, feature_placeholders)
Notable differences from your current code:
- The name of the input is not b64, but image_bytes (it could be anything that ends in _bytes).
- feature_placeholders is used as both arguments to ServingInputReceiver.
And a sample request might look like this:
{
"instances": [
{"image_bytes": {"b64": "YQ=="}},
{"image_bytes": {"b64": "Yg=="}},
{"image_bytes": {"b64": "Yw=="}}
]
}
Or, optionally, in shorthand:
{
"instances": [
{"b64": "YQ=="},
{"b64": "Yg=="},
{"b64": "Yw=="}
]
}
One final note: gcloud ml-engine local predict and gcloud ml-engine predict construct the request based on the contents of the file passed in. It is very important to note that the content of the file is currently not a full, valid request; rather, each line of the --json-instances file becomes one entry in the list of instances. Specifically, in your case, the file will look like this (newlines are meaningful here):
{"image_bytes": {"b64": "YQ=="}}
{"image_bytes": {"b64": "Yg=="}}
{"image_bytes": {"b64": "Yw=="}}
or the equivalent shorthand. gcloud will take each line and construct the actual request shown above.
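For reference, here's a rough sketch of generating such a file from local image files; the filenames and the output path instances.json are purely illustrative:

import base64
import json

# Hypothetical local image files; replace with your own paths.
image_files = ['img1.png', 'img2.png', 'img3.png']

with open('instances.json', 'w') as out:
    for path in image_files:
        with open(path, 'rb') as f:
            encoded = base64.b64encode(f.read()).decode('utf-8')
        # One JSON object per line; gcloud turns each line into one instance.
        out.write(json.dumps({'image_bytes': {'b64': encoded}}) + '\n')

The resulting instances.json can then be passed via --json-instances to gcloud ml-engine local predict or gcloud ml-engine predict.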