tf object detection api - extract feature vector for each detection bbox

I'm using the Tensorflow object detection API and working with a pretrained ssd-mobilenet model. Is there a way to extract the last global pooling of the MobileNet for each bbox as a feature vector? I can't find the name of the operation holding this info.

I've been able to extract the detection labels and bboxes based on the example on GitHub:

image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents the level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# TODO: add also the feature vector output

# Actual detection.
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})
Tantamount answered 8/3, 2018 at 10:2

As Steve said, the feature vectors in Faster RCNN in the object-detection API seem to get dropped after the SecondStageBoxPredictor. I was able to thread them through the network by modifying core/box_predictor.py and meta_architectures/faster_rcnn_meta_arch.py.

The crux of it is that the non-max suppression code actually has a parameter for additional_fields (see core/post_processing.py:176 on master). You can pass a dict of tensors which have the same shape in the first two dimensions as the boxes and scores, and the function will return them filtered the same way the boxes and scores have been. Here's a diff against master of the changes I made:

https://gist.github.com/donniet/c95d19e00ff9abeb786415b3a9348e62
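
To illustrate the idea behind additional_fields, here is a minimal sketch using plain TF 1.x ops (not the API's actual post_processing code); the names boxes, scores and box_features are placeholders for whatever per-proposal tensors you want to thread through:

import tensorflow as tf

boxes = tf.placeholder(tf.float32, [None, 4])              # [num_proposals, 4]
scores = tf.placeholder(tf.float32, [None])                # [num_proposals]
box_features = tf.placeholder(tf.float32, [None, 2048])    # hypothetical per-proposal features

# NMS returns the indices of the boxes it keeps; gathering any additional
# per-proposal tensor with those same indices keeps it paired with the boxes.
keep = tf.image.non_max_suppression(boxes, scores, max_output_size=100, iou_threshold=0.6)
kept_boxes = tf.gather(boxes, keep)
kept_scores = tf.gather(scores, keep)
kept_features = tf.gather(box_features, keep)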

Then, instead of loading a frozen graph, I had to rebuild the network and load the variables from a checkpoint like this (note: I downloaded the Faster R-CNN checkpoint from http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz):

import sys
import os
import numpy as np

from object_detection.builders import model_builder
from object_detection.protos import pipeline_pb2

from google.protobuf import text_format
import tensorflow as tf

# load the pipeline structure from the config file
with open('object_detection/samples/configs/faster_rcnn_resnet101_coco.config', 'r') as content_file:
    content = content_file.read()

# build the model with model_builder
pipeline_proto = pipeline_pb2.TrainEvalPipelineConfig()
text_format.Merge(content, pipeline_proto)
model = model_builder.build(pipeline_proto.model, is_training=False)

# construct a network using the model
image_placeholder = tf.placeholder(shape=(None,None,3), dtype=tf.uint8, name='input')
original_image = tf.expand_dims(image_placeholder, 0)
preprocessed_image, true_image_shapes = model.preprocess(tf.to_float(original_image))
prediction_dict = model.predict(preprocessed_image, true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)

# create an input network to read a file
filename_placeholder = tf.placeholder(name='file_name', dtype=tf.string)
image_file = tf.read_file(filename_placeholder)
image_data = tf.image.decode_image(image_file)

# load the variables from a checkpoint
init_saver = tf.train.Saver()
sess = tf.Session()
init_saver.restore(sess, 'object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt')

# get the image data
blob = sess.run(image_data, feed_dict={filename_placeholder:'image.jpeg'})
# process the inference
output = sess.run(detections, feed_dict={image_placeholder:blob})

# get the shape of the image_features
print(output['image_features'].shape)
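
Because the additional fields are filtered together with the boxes and scores, the returned features should stay index-aligned with the detections. Assuming the gist's changes expose them under the 'image_features' key with a leading batch dimension, pairing a detection with its feature vector looks like this:

num = int(output['num_detections'][0])
best = output['detection_scores'][0][:num].argmax()   # index of the highest-scoring detection
best_box = output['detection_boxes'][0][best]
best_feature = output['image_features'][0][best]
print(best_box, best_feature.shape)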

Caveat: I didn't run the TensorFlow unit tests against the changes I made, so consider them for demo purposes only; more testing should be done to make sure they didn't break something else in the object detection API.

Plafond answered 28/6, 2018 at 12:21

Support for feature extraction was added in a recent PR (https://github.com/tensorflow/models/pull/7208). To use this functionality, you can re-export the pretrained models using the exporter tool.

For reference, this was the script I used:

#!/bin/bash
# NOTE: run this from tf/models/research directory

# Ensure that the necessary modules are on the PYTHONPATH
export PYTHONPATH=".:./slim:$PYTHONPATH"

# Modify this to ensure that Tensorflow is accessible to your environment
conda activate tf37

# pick a model from the model zoo
ORIG_MODEL="faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12"

# point at wherever you have downloaded the pretrained model
ORIG_MODEL_DIR="object_detection/pretrained/${ORIG_MODEL}"

# choose a destination where the updated model will be stored
DEST_DIR="${ORIG_MODEL_DIR}_with_feats"
echo "Re-exporting model from $ORIG_MODEL_DIR"

python3 object_detection/export_inference_graph.py \
     --input_type image_tensor \
     --pipeline_config_path "${ORIG_MODEL_DIR}/pipeline.config" \
     --trained_checkpoint_prefix "${ORIG_MODEL_DIR}/model.ckpt" \
     --output_directory "${DEST_DIR}"
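
After the re-export, the destination directory contains a frozen_inference_graph.pb that can be loaded exactly like the original model (this is the standard TF 1.x pattern from the tutorial notebook; the path below is just the DEST_DIR chosen in the script above):

import os
import tensorflow as tf

dest_dir = 'object_detection/pretrained/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12_with_feats'

detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(os.path.join(dest_dir, 'frozen_inference_graph.pb'), 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')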

To use the re-exported model, you can update the run_inference_for_single_image function in the example notebook to include detection_features as an output:

def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in ['num_detections', 'detection_boxes', 'detection_scores',
                        'detection_classes', 'detection_masks', 'detection_features']:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for a single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframing is required to translate the masks from box coordinates
                # to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[1], image.shape[2])
                detection_masks_reframed = tf.cast(tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # Run inference
            output_dict = sess.run(tensor_dict, feed_dict={image_tensor: image})

            # All outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.int64)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            # detection_features is only present if the graph was re-exported as above
            if 'detection_features' in output_dict:
                output_dict['detection_features'] = output_dict['detection_features'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict
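
As a usage sketch (assuming detection_graph holds the re-exported frozen graph and image_np is an HxWx3 uint8 numpy array, as in the tutorial notebook), the features come back index-aligned with the boxes:

image_np_expanded = np.expand_dims(image_np, axis=0)
output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)

num = output_dict['num_detections']
boxes = output_dict['detection_boxes'][:num]          # [num, 4]
features = output_dict['detection_features'][:num]    # one feature tensor per detected box
print(features.shape)
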
Countershading answered 17/8, 2019 at 13:37 Comment(2)
Although you are correct to some extent, if I try to re-run the exporter, I don't get the detection_features. Checking export_inference_graph.py shows that detection_features is actually there, but in the comments it is specified as an optional parameter. Do you know how to enable it? – Veriee
You need to set output_final_box_features to true in faster_rcnn.proto. github.com/tensorflow/models/blob/master/research/… – Pastelist
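
If that comment is right, the flag belongs in the faster_rcnn message of the pipeline config (the exact placement is my reading of faster_rcnn.proto, so treat it as an assumption), something like:

model {
  faster_rcnn {
    # ... existing faster_rcnn settings ...
    output_final_box_features: true
  }
}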

This is admittedly not a perfect answer, but I've done a lot of digging into Faster-RCNN with the TF-OD API and made some progress on this problem. I'll explain what I've come to understand from digging into the Faster-RCNN version and hopefully you can translate it to SSD. Your best bet is to dig through the graph on TensorBoard and sift through the tensor names in the detection graph.

First, there isn't always a simple one-to-one correspondence between the features and the boxes/scores. That is, there isn't a simple tensor that you can pull from the network that will provide this, at least not by default.

Here is the code to get the features from a Faster-RCNN network:

https://gist.github.com/markdtw/02ece6b90e75832bd44787c03a664e8d

Though this provides something that looks like the feature vectors, you can see that a few other people have run into trouble with this solution. The fundamental issue is that the feature vector is pulled before the SecondStagePostprocessor, which does several operations before the detection_boxes tensor, and similar tensors, are created.

Before the SecondStagePostprocessor, the class scores and boxes are created and the feature vector is left behind, never to be seen again. In the post-processor, there's a multiclass NMS stage and a sorting stage. The end result is MaxProposalsFromSecondStage boxes, whereas the feature vector is populated for [MaxProposalsFromFirstStage, NumberOfFeatureVectors]. So there is a decimation and a sorting operation that makes it difficult to pair the final output with the feature vector indices.

My current solution is to pull the feature vector and the boxes from before the second stage and do the rest by hand, as sketched below. There's undoubtedly a better solution than this, but it's hard to follow the graph and find the proper tensors for the sort operation.
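
For what it's worth, here is a rough numpy sketch of "doing the rest by hand"; the array names are assumptions about what you pull from before the second-stage post-processor (per-proposal class scores with background in column 0, decoded boxes, and the per-proposal feature vectors):

import numpy as np

# class_scores:      [num_proposals, num_classes]
# proposal_boxes:    [num_proposals, 4]
# proposal_features: [num_proposals, feature_dim]
best_class = class_scores[:, 1:].argmax(axis=1) + 1      # skip the background column
best_score = class_scores[np.arange(len(class_scores)), best_class]

keep = best_score > 0.5                                   # score threshold (per-class NMS omitted)
order = best_score[keep].argsort()[::-1]                  # sort by score, like the post-processor

final_boxes = proposal_boxes[keep][order]
final_classes = best_class[keep][order]
final_features = proposal_features[keep][order]           # stays paired with the boxes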

I hope this helps you out! Sorry that I couldn't offer you an end-to-end solution, but I hope this gets you over your current roadblock.

Canonical answered 25/5, 2018 at 18:23
