Espresso ANERuntimeEngine Program Inference overflow

I have two CoreML models. One works fine, and the other generates this error message:

[espresso] [Espresso::ANERuntimeEngine::__forward_segment 0] evaluate[RealTime]WithModel returned 0; code=5 err=Error Domain=com.apple.appleneuralengine Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" UserInfo={NSLocalizedDescription=processRequest:qos:qIndex:error:: 0x3: Program Inference overflow}
[espresso] [Espresso::overflow_error] /var/containers/Bundle/Application/E0DE5E08-D2C6-48AF-91B2-B42BA7877E7E/xxx demoapp.app/mpii-hg128.mlmodelc/model.espresso.net:0

Both models are very similar (Conv2D models). They were generated with the same scripts and the same versions of PyTorch, ONNX, and onnx-coreml. The model that works has 1036 layers, and the model that generates the error has 599 layers. Both use standard layers only (Conv2D, BatchNorm, ReLU, MaxPool, and Upsample; no custom layers and no Functional or NumPy stuff), and both use roughly the same number of features per layer. They follow essentially the same structure, except that the erroring model skips a maxpool layer at the start (hence its higher output resolution).

Both take a 256x256 color image as input and output 16 channels, at 64x64 pixels (working model) and 128x128 pixels (erroring model).

The app does not crash, but gives garbage results for the erroring model.

Both models train, evaluate, etc. fine in their native formats (PyTorch).

I have no idea what a Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" error is, and Google searches aren't yielding anything productive; as far as I can tell, "Espresso" and "ANERuntimeEngine" are both private Apple libraries.

What is this error message telling me? How can I fix it?

Can I avoid this error by running the model on the CPU/GPU instead of the Neural Engine?

Any help is appreciated, thanks.

Pesek answered 19/2, 2019 at 18:56 Comment(0)

That's a LOT of layers!

Espresso is the C++ library that runs Core ML models. ANERuntimeEngine is used with the Apple Neural Engine chip.

By passing in an MLModelConfiguration with computeUnits set to .cpuAndGPU when you load the Core ML model, you can tell Core ML to not use the Neural Engine.
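For reference, here's a minimal Swift sketch of that approach; the generated class name `mpii_hg128` is a placeholder for whatever class Xcode produces from your .mlmodel:

    import CoreML

    // Keep Core ML off the Neural Engine; the CPU and GPU code paths
    // run in float32, so the float16 overflow cannot occur there.
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndGPU

    // `mpii_hg128` is a hypothetical generated model class name.
    let model = try mpii_hg128(configuration: config)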

Olnay answered 19/2, 2019 at 19:9 Comment(5)
Ok, that got rid of the error. (As an aside, the 1036 and 599 are the output names that the ONNX converter gives the layers; I think the actual layer count is much lower.)Pesek
Does anybody know what this error actually means, and/or what triggers it? I hit the same error, and Matthijs' solution fixed it, but I'm curious whether there's something else I should be investigating.Kerakerala
@Kerakerala Did you ever get your model to run on the ANE without the error? The workaround of moving to CPU/GPU is not an option for me.Maura
I honestly don't recall, but I'm about 99% sure I didn't. I just went with Matthijs' workaround.Kerakerala
I also faced this issue, and in my case I wasn't doing the correct preprocessing. Changing the input pixel value range from 0-255 to 0-1 made the error go away. Note that the correct target pixel range depends on your model.Mcabee

The Apple Neural Engine natively computes in float16 (the IEEE 754 half-precision format).

This means that if your model contains weights beyond the bounds of what float16 can represent (-65504 <= x <= 65504), or if any intermediate compute result exceeds this range, you will see this error.
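As a quick illustration of that limit, here is a small Swift sketch (Float16 requires Swift 5.3+ and arm64 hardware, e.g. an iOS device or an Apple Silicon Mac):

    // The largest finite float16 value is 65504; converting anything
    // bigger overflows to infinity, mirroring what happens on the ANE.
    let largestFinite = Float16.greatestFiniteMagnitude   // 65504.0
    let overflowed = Float16(Float(70_000))               // out of range

    print(largestFinite.isFinite)   // true
    print(overflowed.isInfinite)    // true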

In my personal experience, I've seen models that threw this error but still produced acceptable results, but also models that output complete garbage when throwing it.

I'm aware of two (unfortunately not very convenient) methods to avoid this error:

  • If your inference-time performance requirements allow it, restrict the model to CPU and GPU only, as @Matthijs has already stated in his answer (both the CPU and GPU can compute in float32).
  • Retrain your model in float16 (and make sure that your training framework uses the IEEE 754 implementation of float16, not something like bfloat16).
Vaenfila answered 29/3, 2023 at 10:19 Comment(0)
