ML serving using either KServe, Seldon or BentoML

I'm in a similar position: lately I've been looking around the model serving landscape to choose which stack/tech to go for. Currently we're using FastAPI to wrap models into microservices, but we want to split IO/network-bound work (usually business logic) from compute/memory-bound work (the models), and we also want better orchestration (scaling, traffic distribution for A/B tests, etc.).
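
To make that split concrete, here is a minimal sketch using only the Python standard library. All names (`fake_model_predict`, `fetch_features`, `handle_request`, `handle_many`) are hypothetical, and the thread pool is only a stand-in for a separate model-serving tier; in a real setup the model would sit behind its own service rather than in a local pool.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool standing in for the model-serving tier (assumption: in
# production this would be a separate service, not a local thread pool).
MODEL_POOL = ThreadPoolExecutor(max_workers=2)


def fake_model_predict(features: list[float]) -> float:
    """Hypothetical compute-bound step: a placeholder for real inference."""
    time.sleep(0.01)  # pretend to crunch numbers
    return sum(features) / len(features)


async def fetch_features(user_id: int) -> list[float]:
    """Hypothetical IO-bound step, e.g. a database or HTTP lookup."""
    await asyncio.sleep(0.01)  # pretend to wait on the network
    return [float(user_id), 2.0, 3.0]


async def handle_request(user_id: int) -> float:
    """Business logic stays on the event loop; inference is offloaded."""
    features = await fetch_features(user_id)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(MODEL_POOL, fake_model_predict, features)


async def handle_many(user_ids: list[int]) -> list[float]:
    """Serve several requests concurrently and keep the input order."""
    return await asyncio.gather(*(handle_request(u) for u in user_ids))
```

Calling `asyncio.run(handle_many([1, 4]))` runs both requests concurrently. The point is only that the two kinds of work can be scaled independently once they are separated; the tools discussed below do this across processes and machines.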

Generally you have two kinds of tools:

Inference servers, which deal with wrapping the model into a microservice
Server orchestrators, which add orchestration features for scaling, deploying and generally managing the server fleet
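
The two roles above can be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions, not how any of these tools is implemented: `InferenceServer` and `Orchestrator` are hypothetical names, and the hash-based bucketing is just one simple way to get a stable weighted split for an A/B test.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class InferenceServer:
    """Wraps one model behind a uniform predict() call (the 'server' role)."""
    name: str
    model: Callable[[float], float]

    def predict(self, x: float) -> dict:
        return {"served_by": self.name, "prediction": self.model(x)}


class Orchestrator:
    """Splits traffic across servers by weight (the 'orchestrator' role)."""

    def __init__(self, routes: list[tuple[InferenceServer, int]]):
        # routes: (server, percentage) pairs; percentages must sum to 100.
        if sum(weight for _, weight in routes) != 100:
            raise ValueError("weights must sum to 100")
        self.routes = routes

    def pick(self, request_id: str) -> InferenceServer:
        # Hash the request id into a stable bucket in [0, 100) so the same
        # caller always lands on the same variant during an A/B test.
        digest = hashlib.sha256(request_id.encode()).hexdigest()
        bucket = int(digest, 16) % 100
        upper = 0
        for server, weight in self.routes:
            upper += weight
            if bucket < upper:
                return server
        raise RuntimeError("unreachable when weights sum to 100")

    def predict(self, request_id: str, x: float) -> dict:
        return self.pick(request_id).predict(x)


# Hypothetical usage: send roughly 90% of callers to A and 10% to B.
model_a = InferenceServer("model-a", lambda x: x * 2)
model_b = InferenceServer("model-b", lambda x: x * 3)
router = Orchestrator([(model_a, 90), (model_b, 10)])
```

A real orchestrator does this at the network level and adds scaling, rollout and monitoring on top, but the separation of concerns is the same: the server only knows how to run one model, and the orchestrator only knows how to route between servers.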

BentoML is a model server, so the direct comparison wouldn't be to Seldon Core or KServe, but rather to Seldon Core's MLServer/Python client and KServe's KFModel (which in turn uses Ray). I feel like their feature sets are very similar, so which one is best depends on experience and trial and error. Personally I went for BentoML for now because it seemed the easiest to iterate on, but I wouldn't rule out switching to one of the others if Bento doesn't work out as well.

Seldon Core and KServe are more orchestration tools, meaning that their feature set, while including inference servers, also extends beyond that. BentoML also has an orchestration tool, Yatai, but I feel like it's still lacking in features compared to the above two. The good news is that I believe Seldon Core and KServe should work with most inference server technologies (BentoML included), although some features might be degraded compared to using their own solutions.

I don't have a clear-cut answer as to which one is best, and from my research people seem to use all of them in some form or another, for example:

BentoML + Helm for deployment
BentoML + Seldon Core
Seldon's prepackaged inference servers/custom + Seldon Core
BentoML + KServe

My personal suggestion is to try out the quickstart tutorials of each and see what best fits your needs, generally going for the path of least resistance. The MLOps landscape changes a lot and quickly, and some tools are more mature than others, so not investing too much in a hard-to-adopt tool makes the most sense to me.
