I've been trying to adapt the reddit_tft example from the cloud-ml github samples repo to my needs.
I've been able to get it running as per the tutorial readme.
However what i want to use it for is a binary classification problem and also output keys in batch prediction.
So i have made copy of the tutorial code here and have changed it in a few places to be able to have a model type of deep_classifier
that would use a DNNClasifier
instead of a DNNRegressor
.
I've changed the score variable to be
if(score>0,1,0) as score
It's training fine, deploys to cloud ml but i'm not sure how to now get keys back from my predictions. `
I've updated the sql pulling from BigQuery to include id as example_id
here
It seems the code from the tutorial had some sort of placeholder for example_id
so i'm trying to leverage that.
It all seems to work but when i get batch predictions all i get is json like this:
{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]}
{"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457]}
...
So example_id
does not seem to be making it into the serving functions like i need.
I've tried to follow the approach here which is based on adapting the census example for keys.
I just cant figure out how to finish adapting this reddit example to also output keys in the predictions as they look a bit different to me in terms of design and functions being used.
Update 1
My latest attempt is here Trying to use the approach outlined here.
However this is giving errors:
NotFoundError (see above for traceback): /tmp/tmp2jllvb/model.ckpt-1_temp_9530d2c5823d4462be53fa5415e429fd; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, dnn/hiddenlayer_0/kernel/part_2/read, dnn/dnn/hiddenlayer_0/kernel/part_2/Adagrad/read, dnn/hiddenlayer_1/kernel/part_2/read, dnn/dnn/hiddenlayer_1/kernel/part_2/Adagrad/read, dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/read, dnn/dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/Adagrad/read, dnn/logits/bias/part_0/read, dnn/dnn/logits/bias/part_0/Adagrad/read, global_step)]]
Update 2
My latest attempt and details are here.
I'm now getting a error from tensorflow-fransform (run_preprocess.sh works fine in tft 0.1)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 282, in __setstate__
self._dtype = tf.as_dtype(state['dtype'])
TypeError: string indices must be integers, not str
Update 3
I have changed things to just use beam + csv and avoid tft. Also i'm now using the approach as outlined here for extending the canned estimator to get the key back with the predictions.
However when following this post to try get the comments in as features i'm now running into a new error.
The replica worker 3 exited with a non-zero status of 1. Termination reason: Error. Traceback (most recent call last): [...] File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 87, in new_model_fn spec = estimator.model_fn(features, labels, mode, config) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 203, in public_model_fn return self._call_model_fn(features, labels, mode, config) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn model_fn_results = self._model_fn(features=features, **kwargs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 520, in _model_fn config=config) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 158, in _dnn_linear_combined_model_fn dnn_logits = dnn_logit_fn(features=features, mode=mode) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn.py", line 89, in dnn_logit_fn features=features, feature_columns=feature_columns) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/feature_column/feature_column.py", line 226, in input_layer with variable_scope.variable_scope(None, default_name=column.name): File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1826, in __enter__ current_name_scope_name = self._current_name_scope.__enter__() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4932, in __enter__ return self._name_scope.__enter__() File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3514, in name_scope raise ValueError("'%s' is not a valid scope name" % name) ValueError: 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding' is not a valid scope name
My repo for this attempt/approach is here. This all runs fine if i just use subreddit
as a feature, it's adding in the comment
feature that seems to be causing the problems. Lines 103 to 111 is where i have followed this approach.
Not sure what's triggering the error in my code from reading the trace. Anyone any ideas?
Or can anyone point me towards another approach to go from text to bow to embedding feature in TF?