Converting google-cloud-ml github Reddit example from regression to classification and adding keys?

I've been trying to adapt the reddit_tft example from the cloud-ml GitHub samples repo to my needs.

I've been able to get it running as per the tutorial readme.

However, what I want to use it for is a binary classification problem, and I also want to output keys in batch prediction.

So I have made a copy of the tutorial code here and changed it in a few places to add a model type of deep_classifier that uses a DNNClassifier instead of a DNNRegressor.
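
For illustration, the kind of swap I mean is sketched below; the model_type flag, hidden_units and feature_columns are my own placeholders, not the sample's actual names.

import tensorflow as tf

def build_estimator(model_type, feature_columns, config):
    if model_type == 'deep_classifier':
        # Binary classification over the binarized score label
        return tf.estimator.DNNClassifier(
            hidden_units=[128, 64],
            feature_columns=feature_columns,
            n_classes=2,
            config=config)
    # Original regression behaviour from the tutorial
    return tf.estimator.DNNRegressor(
        hidden_units=[128, 64],
        feature_columns=feature_columns,
        config=config)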

I've changed the score variable to be

if(score>0,1,0) as score

It's training fine and deploys to Cloud ML, but I'm not sure how to now get keys back with my predictions.

I've updated the SQL pulling from BigQuery to include id as example_id here.

It seems the code from the tutorial had some sort of placeholder for example_id, so I'm trying to leverage that.

It all seems to work, but when I run batch prediction all I get back is JSON like this:

{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]} {"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457]} ...

So example_id does not seem to be making it into the serving functions like I need.

I've tried to follow the approach here, which is based on adapting the census example for keys.

I just can't figure out how to finish adapting this Reddit example to also output keys in the predictions, as the two examples look a bit different to me in terms of design and the functions being used.
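
For reference, the kind of serving input function I think I need is sketched below; the placeholder names (example_id, subreddit, comment) are my guesses at the feature set, not necessarily what the sample actually uses.

import tensorflow as tf

def serving_input_fn():
    # Accept the key alongside the model features so it can be forwarded
    # through to the predictions.
    feature_placeholders = {
        'example_id': tf.placeholder(tf.string, [None]),
        'subreddit': tf.placeholder(tf.string, [None]),
        'comment': tf.placeholder(tf.string, [None]),
    }
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in feature_placeholders.items()
    }
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)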

Update 1

My latest attempt is here, trying to use the approach outlined here.

However this is giving errors:

NotFoundError (see above for traceback): /tmp/tmp2jllvb/model.ckpt-1_temp_9530d2c5823d4462be53fa5415e429fd; No such file or directory
     [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, dnn/hiddenlayer_0/kernel/part_2/read, dnn/dnn/hiddenlayer_0/kernel/part_2/Adagrad/read, dnn/hiddenlayer_1/kernel/part_2/read, dnn/dnn/hiddenlayer_1/kernel/part_2/Adagrad/read, dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/read, dnn/dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/Adagrad/read, dnn/logits/bias/part_0/read, dnn/dnn/logits/bias/part_0/Adagrad/read, global_step)]]

Update 2

My latest attempt and details are here.

I'm now getting an error from tensorflow-transform (run_preprocess.sh works fine in tft 0.1):

File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 282, in __setstate__ self._dtype = tf.as_dtype(state['dtype']) TypeError: string indices must be integers, not str

Update 3

I have changed things to just use Beam + CSV and avoid tft. I'm also now using the approach outlined here for extending the canned estimator to get the key back with the predictions.

However, when following this post to try to get the comments in as features, I'm now running into a new error.

The replica worker 3 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 87, in new_model_fn
    spec = estimator.model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 203, in public_model_fn
    return self._call_model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 520, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 158, in _dnn_linear_combined_model_fn
    dnn_logits = dnn_logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn.py", line 89, in dnn_logit_fn
    features=features, feature_columns=feature_columns)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/feature_column/feature_column.py", line 226, in input_layer
    with variable_scope.variable_scope(None, default_name=column.name):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1826, in __enter__
    current_name_scope_name = self._current_name_scope.__enter__()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4932, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3514, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding' is not a valid scope name

My repo for this attempt/approach is here. This all runs fine if I just use subreddit as a feature; it's adding in the comment feature that seems to be causing the problems. Lines 103 to 111 are where I have followed this approach.

I'm not sure what's triggering the error in my code from reading the trace. Does anyone have any ideas?

Or can anyone point me towards another approach to go from text to a bag-of-words to an embedding feature in TF?
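
To clarify what I mean by that, the rough pattern I'm after is something like the sketch below; the 'comment' key, hash_bucket_size, and dimension are placeholders of mine.

import tensorflow as tf

# Hash the comment strings into buckets (a crude bag-of-words), then learn a
# dense embedding over those buckets, keyed by feature name rather than by a
# raw Tensor.
comment_hashed = tf.feature_column.categorical_column_with_hash_bucket(
    key='comment', hash_bucket_size=20000, dtype=tf.string)
comment_embedding = tf.feature_column.embedding_column(
    comment_hashed, dimension=16)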

Voile asked 31/1, 2018 at 13:25

See:

https://medium.com/@lakshmanok/how-to-extend-a-canned-tensorflow-estimator-to-add-more-evaluation-metrics-and-to-pass-through-ddf66cd3047d

Here's what the code looks like to pass through keys:

import tensorflow as tf

KEY_COLUMN = 'example_id'  # assumed name of the pass-through key feature; adjust to your schema

def forward_key_to_export(estimator):
    estimator = tf.contrib.estimator.forward_features(estimator, KEY_COLUMN)

    ## This shouldn't be necessary (I've filed CL/187793590 to update extenders.py with this code)
    config = estimator.config
    def model_fn2(features, labels, mode):
        estimatorSpec = estimator._call_model_fn(features, labels, mode, config=config)
        if estimatorSpec.export_outputs:
            for ekey in ['predict', 'serving_default']:
                estimatorSpec.export_outputs[ekey] = \
                    tf.estimator.export.PredictOutput(estimatorSpec.predictions)
        return estimatorSpec
    return tf.estimator.Estimator(model_fn=model_fn2, config=config)
    ##

# Create estimator to train and evaluate
def train_and_evaluate(output_dir):
    estimator = tf.estimator.DNNLinearCombinedRegressor(...)
    estimator = forward_key_to_export(estimator)
    ...
    tf.estimator.train_and_evaluate(estimator, ...)
Alexaalexander answered 4/3, 2018 at 20:42
Great - am following your medium post and giving it a go now - will let you know how it goes. Cheers. - Voile
I'm getting: ValueError: Batch length of predictions should be same. KEY_COLUMN has different batch length then others. when batch_size > 1. Any idea why? - Wimberly

We have plans, but haven't moved the changes for the output keys into the Census example yet. In the meantime, can you please see if this gist helps: https://gist.github.com/andrewm4894/ebd3ac3c87e2ab4af8a10740e85073bb#file-with_keys_model-py

Please feel free to send a PR if you get to it sooner and we will merge your contribution.

Merocrine answered 31/1, 2018 at 22:01
That gist is actually mine, but I couldn't figure it out for the Reddit example and tf 1.4. I was also trying the forward_features approach but was getting "DNNClassifier has no module named model_fn". - Voile
I tried adding the forward_features approach here: github.com/andrewm4894/my-google-cloudml-tensorflow-examples/… However, I get AttributeError: 'DNNClassifier' object has no attribute 'model_fn'. - Voile
