To help clarify some of the questions you have, allow me to start with the basic anatomy of a prediction request:
{"instances": [<instance>, <instance>, ...]}
where instance is a JSON object (dict/map; I'll use the Python term "dict" hereafter) whose attributes/keys are the names of the inputs, with values containing the data for that input.
What the cloud service does (and gcloud ml-engine local predict uses the same underlying libraries as the service) is take the list of dicts (which can be thought of as rows of data) and convert it to a dict of lists (which can be thought of as columnar data containing batches of instances) with the same keys as in the original data. For example,
{"instances": [{"x": 1, "y": "a"}, {"x": 3, "y": "b"}, {"x": 5, "y": "c"}]}
becomes (internally)
{"x": [1, 3, 5], "y": ["a", "b", "c"]}
The keys in this dict (and hence, in the instances in the original request) must correspond to the keys of the dict passed to the ServingInputReceiver. It should be apparent from this example that the service "batches" all of the data, meaning all of the instances are fed into the graph as a single batch. That's why the outer dimension of the shape of the inputs must be None -- it is the batch dimension, and it is not known before a request is made (since each request may have a different number of instances). When exporting a graph to accept the above requests, you might define a function like this:
def serving_input_fn():
    inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None]),
              'y': tf.placeholder(dtype=tf.string, shape=[None])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
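For completeness, here's roughly how such a function gets wired into an export. The estimator variable is hypothetical and stands in for whatever trained tf.estimator.Estimator you have; the export directory name is arbitrary:

# Assuming `estimator` is a trained tf.estimator.Estimator. This writes a
# SavedModel that the prediction service (and gcloud ml-engine local predict)
# can load.
export_dir = estimator.export_savedmodel(
    export_dir_base='exported_model',
    serving_input_receiver_fn=serving_input_fn)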
Since JSON does not (directly) support binary data and since TensorFlow has no way of distinguishing "strings" from "bytes", we need to treat binary data specially. First of all, we need the name of any such input to end in "_bytes" to help differentiate a text string from a byte string. Using the example above, suppose y contained binary data instead of text. We would declare the following:
def serving_input_fn():
    inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None]),
              'y_bytes': tf.placeholder(dtype=tf.string, shape=[None])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
Notice that the only thing that changed was using y_bytes instead of y as the name of the input.
Next, we need to actually base64 encode the data; anywhere a string would be acceptable, we can instead use an object of the form {"b64": "<base64-encoded data>"}. Adapting the running example, a request might look like:
{
"instances": [
{"x": 1, "y_bytes": {"b64": "YQ=="}},
{"x": 3, "y_bytes": {"b64": "Yg=="}},
{"x": 5, "y_bytes": {"b64": "Yw=="}}
]
}
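If you're assembling such a request in Python, the encoding step might look like this (a sketch only; the helper name is made up, and the field names follow the running example):

import base64
import json

def make_instance(x, raw_bytes):
    # base64-encode the binary payload and wrap it in the {"b64": ...}
    # object expected for *_bytes inputs.
    return {"x": x,
            "y_bytes": {"b64": base64.b64encode(raw_bytes).decode("utf-8")}}

request = {"instances": [make_instance(1, b"a"),
                         make_instance(3, b"b"),
                         make_instance(5, b"c")]}
print(json.dumps(request))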
In this case the service does exactly what it did before, but with one added step: it automatically base64-decodes the string (and "replaces" the {"b64": ...} object with the bytes) before sending the data to TensorFlow. So TensorFlow ends up with a dict exactly like before:
{"x": [1, 3, 5], "y_bytes": ["a", "b", "c"]}
(Note that the name of the input has not changed.)
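Extending the earlier rows-to-columns sketch, the handling of {"b64": ...} objects amounts to something like this (again an illustration, not the service's actual code):

import base64

def maybe_decode_b64(value):
    # Replace a {"b64": ...} wrapper with the decoded bytes;
    # leave every other value untouched.
    if isinstance(value, dict) and list(value.keys()) == ["b64"]:
        return base64.b64decode(value["b64"])
    return value

instances = [{"x": 1, "y_bytes": {"b64": "YQ=="}},
             {"x": 3, "y_bytes": {"b64": "Yg=="}},
             {"x": 5, "y_bytes": {"b64": "Yw=="}}]
columns = {key: [maybe_decode_b64(instance[key]) for instance in instances]
           for key in instances[0]}
print(columns)
# {'x': [1, 3, 5], 'y_bytes': [b'a', b'b', b'c']}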
Of course, base64-encoding textual data is kind of pointless; you'd usually do this for, e.g., image data, which can't be sent over JSON any other way, but I hope the above example is sufficient to illustrate the point anyway.
There's another important point to be made: the service supports a type of shorthand. When there is exactly one input to your TensorFlow model, there's no need to incessantly repeat the name of that input in every single object in your list of instances. To illustrate, imagine exporting a model with only x:
def serving_input_fn():
    inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
The "long form" request would look like this:
{"instances": [{"x": 1}, {"x": 3}, {"x": 5}]}
Instead, you can send a request in shorthand, like so:
{"instances": [1, 3, 5]}
Note that this applies even to base64-encoded data. So, for instance, if instead of only exporting x we had only exported y_bytes, we could simplify the requests from:
{
"instances": [
{"y_bytes": {"b64": "YQ=="}},
{"y_bytes": {"b64": "Yg=="}},
{"y_bytes": {"b64": "Yw=="}}
]
}
To:
{
"instances": [
{"b64": "YQ=="},
{"b64": "Yg=="},
{"b64": "Yw=="}
]
}
In many cases this is only a small win, but it definitely aids readability, e.g., when the inputs contain CSV data.
So, putting it all together to adapt to your specific scenario, here's what your serving function should look like:
def serving_input_fn():
    feature_placeholders = {
        'image_bytes': tf.placeholder(dtype=tf.string, shape=[None], name='source')}
    single_image = tf.decode_raw(feature_placeholders['image_bytes'], tf.float32)
    return tf.estimator.export.ServingInputReceiver(feature_placeholders, feature_placeholders)
Notable differences from your current code:
- The name of the input is not b64, but image_bytes (it could be anything that ends in _bytes).
- feature_placeholders is used as both arguments to ServingInputReceiver.
And a sample request might look like this:
{
"instances": [
{"image_bytes": {"b64": "YQ=="}},
{"image_bytes": {"b64": "Yg=="}},
{"image_bytes": {"b64": "Yw=="}}
]
}
Or, optionally, in shorthand:
{
"instances": [
{"b64": "YQ=="},
{"b64": "Yg=="},
{"b64": "Yw=="}
]
}
One final note: gcloud ml-engine local predict and gcloud ml-engine predict construct the request based on the contents of the file passed in. It is very important to note that the content of the file is currently not a full, valid request; rather, each line of the --json-instances file becomes one entry in the list of instances. Specifically, in your case, the file will look like this (newlines are meaningful here):
{"image_bytes": {"b64": "YQ=="}}
{"image_bytes": {"b64": "Yg=="}}
{"image_bytes": {"b64": "Yw=="}}
or the equivalent shorthand. gcloud will take each line and construct the actual request shown above.
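For reference, here's a rough sketch of generating such a file from local image files; the filenames and the output path instances.json are purely illustrative:

import base64
import json

# Hypothetical local image files; replace with your own paths.
image_files = ['img1.png', 'img2.png', 'img3.png']

with open('instances.json', 'w') as out:
    for path in image_files:
        with open(path, 'rb') as f:
            encoded = base64.b64encode(f.read()).decode('utf-8')
        # One JSON object per line; gcloud turns each line into one instance.
        out.write(json.dumps({'image_bytes': {'b64': encoded}}) + '\n')

The resulting instances.json can then be passed via --json-instances to gcloud ml-engine local predict or gcloud ml-engine predict.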