The TensorFlow Serving documentation mentions that:
"Any C++ class can be a servable, e.g. int, std::map<string, int> or any class defined in your binary -- let us call it YourServable."
The TensorFlow Serving paper also mentions that:
"It is extremely flexible in terms of the types of ML platforms it supports"
After some reading, I have found that in practice serving a custom (non-tensorflow) model is quite involved. The flexibility of the tensorflow serving libraries comes with real adoption complexity. This is not a diss on Google's tensorflow serving at all, nor a negative comment on their documentation. I was briefly investigating what it would take to host another model platform, and I would like to share my findings and get some feedback from the community. I am by no means an expert in different model platforms nor in tensorflow serving, and I have not tried to implement any of this in code. I am certain mistakes would surface in my explanation once one actually dives deep into the implementation.
There are many model platforms one may want to use: XGBoost, LightGBM, SkLearn, PyTorch, and so on. In this document I will only look at XGBoost; a similar set of questions would need to be worked through for the other model platforms as well.
Loading
A model needs to live in a file at some path, and it needs to be loaded into the tensorflow/serving runtime.
The docs mention how to create your own servable, and there is an example hash table loader in the code.
I guess you need to write something like that for XGBoost. XGBoost has a C++ API, and there are some examples in "xgboost load model in c++ (python -> c++ prediction scores mismatch)".
So at least that is possible in theory.
However, you need to write new code for that, and that code needs the XGBoost library: you either bring XGBoost in at compile time, or dlopen its library at runtime.
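For concreteness, here is a rough sketch of what loading a model through the XGBoost C API might look like. I have not compiled or tested this, the helper name is mine, and error handling is reduced to a minimum:

```cpp
#include <xgboost/c_api.h>

#include <stdexcept>
#include <string>

// Hypothetical helper: create a booster and load a serialized XGBoost model
// from `path`. The caller owns the handle and must release it with
// XGBoosterFree() when done with it.
BoosterHandle LoadXGBoostModel(const std::string& path) {
  BoosterHandle booster = nullptr;
  // No cached DMatrix handles are passed to the booster at creation time.
  if (XGBoosterCreate(nullptr, 0, &booster) != 0) {
    throw std::runtime_error(XGBGetLastError());
  }
  // Read the serialized model from disk into the booster.
  if (XGBoosterLoadModel(booster, path.c_str()) != 0) {
    XGBoosterFree(booster);
    throw std::runtime_error(XGBGetLastError());
  }
  return booster;
}
```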
At the very least you need to fork the tensorflow/serving code and maintain it yourself, which by itself may mean maintaining your fork essentially indefinitely.
I think the SimpleLoaderSourceAdapter may be enough as a starting point; however, servables/tensorflow had to create its own here.
So you may need to write your own loaders and source adapters for your new model.
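Assuming SimpleLoaderSourceAdapter is usable here, a source adapter modeled on the hashmap example in servables/hashmap might look roughly like the sketch below. XGBoostBooster is a hypothetical RAII wrapper around a BoosterHandle, and LoadXGBoostModel is the helper sketched above; neither exists in any library.

```cpp
#include "tensorflow_serving/core/simple_loader.h"
#include "tensorflow_serving/core/storage_path.h"

namespace tensorflow {
namespace serving {

// Sketch of an XGBoost source adapter, following the structure of the
// hashmap example. It turns a storage path into a loaded XGBoostBooster.
class XGBoostSourceAdapter final
    : public SimpleLoaderSourceAdapter<StoragePath, XGBoostBooster> {
 public:
  XGBoostSourceAdapter()
      : SimpleLoaderSourceAdapter<StoragePath, XGBoostBooster>(
            [](const StoragePath& path,
               std::unique_ptr<XGBoostBooster>* booster) {
              // Read the serialized booster at `path` into a servable object.
              booster->reset(new XGBoostBooster(LoadXGBoostModel(path)));
              return Status();  // Default-constructed Status is OK.
            },
            // Decline to estimate a resource footprint, as the hashmap
            // example does.
            SimpleLoaderSourceAdapter<StoragePath,
                                      XGBoostBooster>::EstimateNoResources()) {
  }

  // The hashmap example calls Detach() in its destructor; do the same here.
  ~XGBoostSourceAdapter() override { Detach(); }
};

}  // namespace serving
}  // namespace tensorflow
```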
Making the ServerCore load your model
Having a model loadable is not enough. Your model should also be dynamically or statically loaded by the tensorflow/serving runtime. There are various ways to get your model bytes into tensorflow/serving. A simple approach is to have the model already in a regular file on the file system and get it loaded statically through a ModelConfig. At initialization time, ServerCore iterates over the ModelConfigList entries and reads and loads those models.
The ModelConfig object has a model_platform field, and as of this writing only tensorflow is supported in the open source version. So you would need to add a new model_platform, say xgboost, and change the proto files for ModelConfig accordingly.
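For reference, this is what a static model config looks like today as a text proto; the "xgboost" platform value is hypothetical and would only work after extending ModelConfig and the platform handling code:

```
model_config_list {
  config {
    name: "my_xgboost_model"
    base_path: "/models/my_xgboost_model"
    # Today only "tensorflow" is accepted by the open source ServerCore;
    # "xgboost" here is a hypothetical new platform value.
    model_platform: "xgboost"
  }
}
```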
Tensorflow serving's "Create a new Servable" documentation has example code that calls the ConnectSourceToTarget function directly. However, I am not sure where the best place for this code in your application would be, or how preferable it would be to instead use the existing static config loading functionality in tensorflow serving described above.
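If you did wire things up by hand, I imagine it would look roughly like the snippet below, following the documentation's pattern. The setup of the path source and the manager is elided, and XGBoostSourceAdapter is the hypothetical adapter from the previous section:

```cpp
#include "tensorflow_serving/core/aspired_versions_manager.h"
#include "tensorflow_serving/core/source.h"
#include "tensorflow_serving/core/storage_path.h"
#include "tensorflow_serving/core/target.h"

using tensorflow::serving::AspiredVersionsManager;
using tensorflow::serving::ConnectSourceToTarget;
using tensorflow::serving::Source;
using tensorflow::serving::StoragePath;

// Wire a path source -> XGBoost adapter -> manager by hand. File paths flow
// from the source into the adapter, which emits loaders; the manager then
// decides when each servable version is loaded or unloaded.
void WireUpXGBoost(Source<StoragePath>* model_path_source,
                   XGBoostSourceAdapter* adapter,
                   AspiredVersionsManager* manager) {
  ConnectSourceToTarget(model_path_source, adapter);
  ConnectSourceToTarget(adapter, manager);
}
```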
Predicting
We have talked about some of the setup to get your model loaded into the tensorflow/serving runtime. I am sure there are a bunch of other things I have missed, but the story does not end there.
How would you use your model for prediction?
I am completely glossing over the gRPC server; I am sure there is a lot more setup necessary for you to do there.
Hoping the HTTP path would be simpler: tensorflow serving has an HttpRestApiHandler that uses a TensorflowPredictor object to call predict on.
It is reasonable to expect to write an XGBoostPredictor class when adding the XGBoost model platform for the first time. It would contain the XGBoost-specific predict function, which is not too different from needing a custom loader to read an XGBoost model from a file.
I guess you also need to somehow extend the HttpRestApiHandler to call your XGBoostPredictor when the model is an XGBoost model, and somehow add the ability to differentiate between a TensorflowPredictor and an XGBoostPredictor. An obvious way to do that is not clear to me, and I would be very interested to learn better approaches. Tensorflow serving also has the Life of a TensorFlow Serving inference request documentation, which could be helpful.
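To make the idea concrete, a hypothetical XGBoostPredictor might wrap the XGBoost C API roughly as follows. The dense float input and the class shape are my simplifications, and the XGBoosterPredict signature shown is the older one; recent XGBoost releases add an extra training argument:

```cpp
#include <xgboost/c_api.h>

#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical predictor that scores a row-major dense feature matrix with a
// previously loaded booster. Error handling is minimal.
class XGBoostPredictor {
 public:
  explicit XGBoostPredictor(BoosterHandle booster) : booster_(booster) {}

  std::vector<float> Predict(const std::vector<float>& features,
                             int num_rows, int num_cols) const {
    DMatrixHandle dmatrix = nullptr;
    // Build a DMatrix from the dense buffer; NAN marks missing values.
    if (XGDMatrixCreateFromMat(features.data(), num_rows, num_cols,
                               /*missing=*/NAN, &dmatrix) != 0) {
      throw std::runtime_error(XGBGetLastError());
    }
    bst_ulong out_len = 0;
    const float* out_result = nullptr;
    // Older XGBoost signature; newer releases also take a `training` flag.
    int rc = XGBoosterPredict(booster_, dmatrix, /*option_mask=*/0,
                              /*ntree_limit=*/0, &out_len, &out_result);
    std::vector<float> scores;
    if (rc == 0) scores.assign(out_result, out_result + out_len);
    XGDMatrixFree(dmatrix);
    if (rc != 0) throw std::runtime_error(XGBGetLastError());
    return scores;
  }

 private:
  BoosterHandle booster_;  // Owned elsewhere, e.g. by the servable/loader.
};
```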
In this discussion we have not talked about integrating with the debuggability or batch processing features of tensorflow serving. Surely those also require in-depth understanding and additional work to integrate with a non-tensorflow model.
Conclusion
I think it would be immensely valuable if anyone could share an open source example of serving a non-tensorflow model via tensorflow/serving. I agree with the claim in their paper that tensorflow serving is extremely flexible: the non-tensorflow-specific base classes for loading, version management, and batching are pretty general and flexible. However, with that extreme flexibility also comes a cost of complexity when adopting a new ML platform.
As a starting point, one needs to carefully understand the servables/tensorflow example and expect a similar amount of complexity to host another model platform.
Implementation complexity aside, I would be extremely cautious about maintaining the new software you are going to write. It is prudent to expect to own your fork, with all its libraries, indefinitely in your organization, or to engage with the upstream community to extend tensorflow serving. There are some prior issues upstream already: 1694, 768, 637.
Google's ML platform has the ability to serve SKLearn and XGBoost models in addition to TensorFlow models. Their paper also says:
"In seriousness, Google uses TensorFlow- Serving for some proprietary
non-TensorFlow machine learning frameworks as well as TensorFlow. "
So similar extensions may have already been implemented on top of tensorflow serving. On the other hand, the paper was written in 2017 and who knows what else has changed since.