The TensorFlow Serving documentation mentions that:
"Any C++ class can be a servable, e.g. int, std::map<string, int> or any class defined in your binary -- let us call it YourServable."
The TensorFlow Serving paper also mentions that:
"It is extremely flexible in terms of the types of ML platforms it supports"
After some reading, I have found that in practice serving a custom (non-tensorflow) model is quite involved. The flexibility of the tensorflow serving libraries comes with real adoption complexity. This is not a diss on Google's tensorflow serving at all, nor a negative comment on their documentation. I was briefly investigating what it would take to host another model platform, and I would like to share my findings and get some feedback from the community. I am by no means an expert in different model platforms nor in tensorflow serving, and I have not tried to implement any of this in code. I am certain mistakes would surface in my explanation once one actually dives deep into the implementation.
There are many model platforms one may want to use: XGBoost, LightGBM, SkLearn, PyTorch, and so on. In this document I will only look at XGBoost; a similar set of questions would need to be worked through for the other model platforms as well.
Loading
A model needs to live in a file at some path, and it needs to be loaded into the tensorflow/serving runtime.
The docs mention how to create your own servable, and there is an example hash table loader in the code.
I guess you need to write something like that for XGBoost. XGBoost has a C++ API, and there are some examples in "xgboost load model in c++ (python -> c++ prediction scores mismatch)".
So at least that is possible in theory.
However, you need to write new code for that, and that code needs the XGBoost library: you either bring XGBoost in at compile time, or dlopen its library at runtime.
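For concreteness, here is a rough sketch of what loading a model through the XGBoost C API might look like. I have not compiled or tested this, the helper name is mine, and error handling is reduced to a minimum:

```cpp
#include <xgboost/c_api.h>

#include <stdexcept>
#include <string>

// Hypothetical helper: create a booster and load a serialized XGBoost model
// from `path`. The caller owns the handle and must release it with
// XGBoosterFree() when done with it.
BoosterHandle LoadXGBoostModel(const std::string& path) {
  BoosterHandle booster = nullptr;
  // No cached DMatrix handles are passed to the booster at creation time.
  if (XGBoosterCreate(nullptr, 0, &booster) != 0) {
    throw std::runtime_error(XGBGetLastError());
  }
  // Read the serialized model from disk into the booster.
  if (XGBoosterLoadModel(booster, path.c_str()) != 0) {
    XGBoosterFree(booster);
    throw std::runtime_error(XGBGetLastError());
  }
  return booster;
}
```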
At the very least you need to fork the tensorflow/serving code and maintain it yourself, which by itself may mean maintaining your fork essentially indefinitely.
I think the SimpleLoaderSourceAdapter may be enough as a starting point; however, servables/tensorflow had to create its own here.
So you may need to write your own loaders and source adapters for your new model.
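Assuming SimpleLoaderSourceAdapter is usable here, a source adapter modeled on the hashmap example in servables/hashmap might look roughly like the sketch below. XGBoostBooster is a hypothetical RAII wrapper around a BoosterHandle, and LoadXGBoostModel is the helper sketched above; neither exists in any library.

```cpp
#include "tensorflow_serving/core/simple_loader.h"
#include "tensorflow_serving/core/storage_path.h"

namespace tensorflow {
namespace serving {

// Sketch of an XGBoost source adapter, following the structure of the
// hashmap example. It turns a storage path into a loaded XGBoostBooster.
class XGBoostSourceAdapter final
    : public SimpleLoaderSourceAdapter<StoragePath, XGBoostBooster> {
 public:
  XGBoostSourceAdapter()
      : SimpleLoaderSourceAdapter<StoragePath, XGBoostBooster>(
            [](const StoragePath& path,
               std::unique_ptr<XGBoostBooster>* booster) {
              // Read the serialized booster at `path` into a servable object.
              booster->reset(new XGBoostBooster(LoadXGBoostModel(path)));
              return Status();  // Default-constructed Status is OK.
            },
            // Decline to estimate a resource footprint, as the hashmap
            // example does.
            SimpleLoaderSourceAdapter<StoragePath,
                                      XGBoostBooster>::EstimateNoResources()) {
  }

  // The hashmap example calls Detach() in its destructor; do the same here.
  ~XGBoostSourceAdapter() override { Detach(); }
};

}  // namespace serving
}  // namespace tensorflow
```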
Making the ServerCore load your model
Having a model loadable is not enough. Your model should also be dynamically or statically loaded by the tensorflow/serving runtime. There are various ways to get your model bytes into tensorflow/serving. A simple approach is to have the model already in a regular file on the file system and get it loaded statically through a ModelConfig. At initialization time, ServerCore iterates over the ModelConfigList entries and reads and loads those models.
The ModelConfig object has a model_platform field, and as of this writing only tensorflow is supported in the open source version. So you would need to add a new model_platform, say xgboost, and change the proto files for ModelConfig accordingly.
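For reference, this is what a static model config looks like today as a text proto; the "xgboost" platform value is hypothetical and would only work after extending ModelConfig and the platform handling code:

```
model_config_list {
  config {
    name: "my_xgboost_model"
    base_path: "/models/my_xgboost_model"
    # Today only "tensorflow" is accepted by the open source ServerCore;
    # "xgboost" here is a hypothetical new platform value.
    model_platform: "xgboost"
  }
}
```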
Tensorflow serving's "Create a new Servable" documentation has example code that calls the ConnectSourceToTarget function directly. However, I am not sure where the best place for this code in your application would be, or how preferable it would be to instead use the existing static config loading functionality in tensorflow serving described above.
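If you did wire things up by hand, I imagine it would look roughly like the snippet below, following the documentation's pattern. The setup of the path source and the manager is elided, and XGBoostSourceAdapter is the hypothetical adapter from the previous section:

```cpp
#include "tensorflow_serving/core/aspired_versions_manager.h"
#include "tensorflow_serving/core/source.h"
#include "tensorflow_serving/core/storage_path.h"
#include "tensorflow_serving/core/target.h"

using tensorflow::serving::AspiredVersionsManager;
using tensorflow::serving::ConnectSourceToTarget;
using tensorflow::serving::Source;
using tensorflow::serving::StoragePath;

// Wire a path source -> XGBoost adapter -> manager by hand. File paths flow
// from the source into the adapter, which emits loaders; the manager then
// decides when each servable version is loaded or unloaded.
void WireUpXGBoost(Source<StoragePath>* model_path_source,
                   XGBoostSourceAdapter* adapter,
                   AspiredVersionsManager* manager) {
  ConnectSourceToTarget(model_path_source, adapter);
  ConnectSourceToTarget(adapter, manager);
}
```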
Predicting
We have talked about some of the setup to get your model loaded into the tensorflow/serving runtime. I am sure there are a bunch of other things I have missed, but the story does not end there.
How would you use your model for prediction?
I am completely glossing over the gRPC server; I am sure there is a lot more setup necessary for you to do there.
Hoping the HTTP path would be simpler: tensorflow serving has an HttpRestApiHandler that uses a TensorflowPredictor object to call predict on.
It is reasonable to expect to write an XGBoostPredictor class when adding the XGBoost model platform for the first time. It would contain the XGBoost-specific predict function, which is not too different from needing a custom loader to read an XGBoost model from a file.
I guess you also need to somehow extend the HttpRestApiHandler to call your XGBoostPredictor when the model is an XGBoost model, and somehow add the ability to differentiate between a TensorflowPredictor and an XGBoostPredictor. An obvious way to do that is not clear to me, and I would be very interested to learn better approaches. Tensorflow serving also has the Life of a TensorFlow Serving inference request documentation, which could be helpful.
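To make the idea concrete, a hypothetical XGBoostPredictor might wrap the XGBoost C API roughly as follows. The dense float input and the class shape are my simplifications, and the XGBoosterPredict signature shown is the older one; recent XGBoost releases add an extra training argument:

```cpp
#include <xgboost/c_api.h>

#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical predictor that scores a row-major dense feature matrix with a
// previously loaded booster. Error handling is minimal.
class XGBoostPredictor {
 public:
  explicit XGBoostPredictor(BoosterHandle booster) : booster_(booster) {}

  std::vector<float> Predict(const std::vector<float>& features,
                             int num_rows, int num_cols) const {
    DMatrixHandle dmatrix = nullptr;
    // Build a DMatrix from the dense buffer; NAN marks missing values.
    if (XGDMatrixCreateFromMat(features.data(), num_rows, num_cols,
                               /*missing=*/NAN, &dmatrix) != 0) {
      throw std::runtime_error(XGBGetLastError());
    }
    bst_ulong out_len = 0;
    const float* out_result = nullptr;
    // Older XGBoost signature; newer releases also take a `training` flag.
    int rc = XGBoosterPredict(booster_, dmatrix, /*option_mask=*/0,
                              /*ntree_limit=*/0, &out_len, &out_result);
    std::vector<float> scores;
    if (rc == 0) scores.assign(out_result, out_result + out_len);
    XGDMatrixFree(dmatrix);
    if (rc != 0) throw std::runtime_error(XGBGetLastError());
    return scores;
  }

 private:
  BoosterHandle booster_;  // Owned elsewhere, e.g. by the servable/loader.
};
```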
In this discussion we have not talked about integrating with the debuggability or batch processing features of tensorflow serving. Surely those also require in-depth understanding and additional work to integrate with a non-tensorflow model.
Conclusion
I think it would be immensely valuable if anyone could share an open source example of serving a non-tensorflow model via tensorflow/serving. I agree with the claim in their paper that tensorflow serving is extremely flexible: the non-tensorflow-specific base classes for loading, version management, and batching are pretty general and flexible. However, with that extreme flexibility also comes a cost of complexity when adopting a new ML platform.
As a starting point, one needs to carefully understand the servables/tensorflow example and expect a similar amount of complexity to host another model platform.
Implementation complexity aside, I would be extremely cautious about maintaining the new software you are going to write. It is prudent to expect to own your fork, with all its libraries, indefinitely in your organization, or to engage with the upstream community to extend tensorflow serving. There are some prior issues upstream already: 1694, 768, 637.
Google's ML platform has the ability to serve SKLearn and XGBoost models in addition to TensorFlow models. Their paper also says:
"In seriousness, Google uses TensorFlow- Serving for some proprietary
non-TensorFlow machine learning frameworks as well as TensorFlow. "
So similar extensions may have already been implemented on top of tensorflow serving. On the other hand, the paper was written in 2017 and who knows what else has changed since.