Speeding up inference of Keras models
Asked Answered
I have a Keras model doing inference on a Raspberry Pi (with a camera). The Raspberry Pi has a slow CPU (1.2 GHz) and no CUDA GPU, so the model.predict() stage takes a long time (~20 seconds). I'm looking for ways to reduce that as much as possible. I've tried:

  • Overclocking the CPU (+200 MHz), which gained a few seconds.
  • Using float16s instead of float32s.
  • Reducing the image input size as much as possible.
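A quick numpy sketch (independent of Keras) of what the float16 change buys: half the memory per weight, at the cost of precision. Whether that turns into faster inference depends on the backend having fast float16 kernels, which many ARM CPUs lack.

```python
import numpy as np

# Casting a weight matrix from float32 to float16 halves its memory footprint.
rng = np.random.default_rng(0)
w32 = rng.standard_normal((256, 256)).astype(np.float32)
w16 = w32.astype(np.float16)

print(w16.nbytes == w32.nbytes // 2)  # True: 2 bytes vs 4 bytes per weight
# Values survive the round trip only approximately (reduced precision):
print(np.allclose(w32, w16.astype(np.float32), atol=1e-2))  # True
```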

Is there anything else I can do to speed up inference? Is there a way to simplify a model.h5, trading some accuracy for speed? I've had success with simpler models, but for this project I need to rely on an existing model, so I can't train from scratch.

Engage answered 16/10, 2017 at 1:51 Comment(3)
How's your model architecture? – Chastain
@FábioPerez Very complex: VGG16, then a dual structure where both paths are 30+ layers, concatenated at the end. It's pretrained, so as far as I know I can't adjust the model structure. – Engage
VGG inference is slow due to the large fully-connected layer at the end. Use a faster net such as MobileNet. – Chastain

The VGG16 / VGG19 architecture is very slow since it has a huge number of parameters. Check this answer.
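To see why, a back-of-envelope count of just the first fully-connected layer of VGG16 (the 7×7×512 feature map flattened and connected to 4096 units):

```python
# VGG16's fc1 layer: 7*7*512 = 25088 inputs fully connected to 4096 units.
fc1_weights = 7 * 7 * 512 * 4096
print(fc1_weights)  # 102760448 weights in this one layer alone
```

Over 100 million multiply-adds per image for a single layer, which dominates CPU inference time.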

Before any other optimization, try to use a simpler network architecture.

Google's MobileNet seems like a good candidate, since it's implemented in Keras and was designed for resource-limited devices.

If you can't use a different network, you may compress the network with pruning. This blog post demonstrates pruning with Keras specifically.
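As an illustration only (not the Keras API from the blog post), the core idea of magnitude pruning can be sketched in numpy; `prune_by_magnitude` is a hypothetical helper:

```python
import numpy as np

# Magnitude pruning: zero out the smallest-magnitude weights of a layer.
# Zeroed weights can then be stored sparsely, or whole units/filters can be
# removed structurally to actually save compute.
def prune_by_magnitude(weights, fraction):
    """Return a copy of `weights` with the smallest `fraction` set to zero."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * fraction)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(1)
w = rng.standard_normal((100, 100))
pruned = prune_by_magnitude(w, 0.5)
print(np.mean(pruned == 0))  # roughly 0.5 of the weights are now zero
```

After pruning, the network is typically fine-tuned briefly to recover accuracy, which is much cheaper than training from scratch.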

Ackerley answered 16/10, 2017 at 15:54 Comment(6)
It's a pretrained network, so I don't have that option, right? – Engage
Can't you reproduce the training on MobileNet? The weights are initialized with ImageNet weights. Otherwise, you will have to compress your network by finding the most impactful units. Check arxiv.org/abs/1510.00149 and jacobgil.github.io/deeplearning/pruning-deep-learning – Chastain
Possible, but it would take weeks of training and be pretty expensive. I'll look into compression - thanks. – Engage
Also check github.com/Irtza/Keras_model_compression. I never tested it, but it might be what you want. – Chastain
Great, thanks. Accepted your answer - it would be great if you could also mention compression in the answer. – Engage
Also, this might be of interest. – Chastain

Maybe OpenVINO will help. OpenVINO is an open-source toolkit for network inference, and it optimizes inference performance through, for example, graph pruning and fusing operations. ARM support is provided by the contrib repository.

Here are the instructions on how to build an ARM plugin to run OpenVINO on Raspberry Pi.

Disclaimer: I work on OpenVINO.

Holoenzyme answered 28/7, 2022 at 12:13 Comment(0)
