I think one of your main concerns might be batching of requests. For example, say your model is a trained CNN such as VGG, Inception or similar. If you implement a regular web service with Flask, then for each prediction request you receive (assuming you're running on a GPU) you run inference on a single image, which can be suboptimal because similar requests could be batched together.
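To make that concrete, here is a minimal sketch of what such a Flask service often looks like; the endpoint name, the `"image"` form field and the use of Keras's VGG16 are my assumptions for illustration, not something from your setup:

```python
# Hypothetical minimal Flask service: a Keras VGG16 model, JPEG images
# uploaded as the "image" form field. Each request predicts one image.
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image
from tensorflow.keras.applications.vgg16 import (VGG16, decode_predictions,
                                                 preprocess_input)

app = Flask(__name__)
model = VGG16(weights="imagenet")  # loaded once when the process starts


@app.route("/predict", methods=["POST"])
def predict():
    # Decode and resize the uploaded image to the network's 224x224 input.
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    x = np.asarray(img.resize((224, 224)), dtype="float32")
    x = preprocess_input(np.expand_dims(x, axis=0))
    # The GPU sees a batch of exactly one image per HTTP request here;
    # concurrent requests are never combined into a larger batch.
    preds = model.predict(x)
    _, label, score = decode_predictions(preds, top=1)[0][0]
    return jsonify({"class": label, "score": float(score)})


if __name__ == "__main__":
    app.run()
```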
That's one of the things TensorFlow Serving aims to offer: the ability to combine requests for the same model/signature into a single batch before sending it to the GPU, making more efficient use of resources and (potentially) improving throughput. You can find more information here: https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching
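For reference, enabling that batching with the stock `tensorflow_model_server` binary looks roughly like the sketch below; the parameter values, model name and paths are only illustrative, and the full set of options is described in the batching guide linked above:

```
# batching_config.txt (text-format BatchingParameters; values are examples)
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
```

```
tensorflow_model_server \
  --port=8500 \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --enable_batching=true \
  --batching_parameters_file=/path/to/batching_config.txt
```

With that in place, concurrent requests for the same model/signature that arrive within the timeout window are merged into a single batch before hitting the GPU.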
That said, it depends very much on your scenario, but batching of predictions is something important to keep in mind.