Presto with Kubernetes
S

2

7

We are trying to run Presto on Kubernetes. We have a Kubernetes cluster running in the cloud as a managed service. I searched for guidance but could not find anything conclusive on best practices for deploying Presto on Kubernetes. Presto has an official GitHub repository, but it does not help with this. These are the two questions I am looking to answer:

  1. What is the best approach to configuring Presto on Kubernetes, including settings such as the ideal number of worker replicas?
  2. How can we performance test this deployment?
Seedcase answered 11/9, 2018 at 8:21 Comment(0)
J
11

You could install with the official Helm chart (https://github.com/helm/charts/tree/master/stable/presto), which provides an option to set the number of workers. With the official chart, you should be able to ask questions in the Kubernetes charts Slack channel (through http://slack.k8s.io) and raise issues on GitHub if you hit any. There are also non-Helm examples such as https://github.com/dharmeshkakadia/presto-kubernetes

The question of how many workers to run isn't specific to Kubernetes. It depends on how much load, and what kind of load, the deployment needs to handle, and on the hardware your Kubernetes cluster is using. If you're not sure, you can deploy with the defaults and adjust as needed, as suggested by https://prestodb.io/presto-admin/docs/current/installation/presto-configuration.html (the presto-admin configuration docs). Some of the settings, such as memory per node, are set in the Deployment parts of the Kubernetes YAML descriptors, or in values.yaml in the case of the Helm chart.
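As a rough illustration of "deploy with the defaults and adjust as needed", the sketch below builds a small set of overrides for the worker count and memory settings. The key names (`server.workers`, `server.jvm.maxHeapSize`, `server.config.query.maxMemoryPerNode`) are assumptions modeled on the chart's values.yaml, and the function name is hypothetical, so check both against the chart version you actually install.

```python
import json


def presto_overrides(workers, heap_gb, per_node_query_gb):
    """Build a values override for a Presto Helm chart.

    The key names are ASSUMPTIONS modeled on the chart's values.yaml;
    verify them against the chart version you install.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    if per_node_query_gb >= heap_gb:
        # Per-node query memory has to fit inside the JVM heap.
        raise ValueError("per-node query memory must be smaller than the heap")
    return {
        "server": {
            "workers": workers,
            "jvm": {"maxHeapSize": f"{heap_gb}G"},
            "config": {"query": {"maxMemoryPerNode": f"{per_node_query_gb}GB"}},
        }
    }


# Example numbers only; pick values that match your nodes and workload.
overrides = presto_overrides(workers=4, heap_gb=8, per_node_query_gb=2)
print(json.dumps(overrides, indent=2))
```

Since JSON is also valid YAML, the printed output can be saved to a file and passed to Helm's `-f` option instead of editing values.yaml by hand.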

To performance test your deployment, you will need test data, and you can then run queries against the cluster. This is the same process you would follow outside of Kubernetes. Tools that can help include Benchto (https://www.lewuathe.com/use-benchto-for-evaluation-of-presto.html) and Tempto (https://github.com/prestodb/tempto). You may also want to look at https://kognitio.com/blog/presto-performance-powerful-or-problematic/
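If you just want a first impression before setting up one of those tools, a minimal timing harness could look like the sketch below. It is not how Benchto or Tempto work; `run_query` is a hypothetical callable that you would back with whatever Presto client you use, and the query and table name are placeholders.

```python
import statistics
import time


def benchmark(run_query, queries, repeats=3):
    """Time each query `repeats` times and summarize wall-clock latency.

    `run_query` is a HYPOTHETICAL callable: it should send one SQL string
    to the cluster and block until all rows have been fetched.
    """
    results = {}
    for name, sql in queries.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sql)
            timings.append(time.perf_counter() - start)
        results[name] = {
            "min_s": min(timings),
            "median_s": statistics.median(timings),
            "max_s": max(timings),
        }
    return results


def fake_run_query(sql):
    # Stand-in so the sketch runs without a cluster; replace with a real client call.
    time.sleep(0.01)


if __name__ == "__main__":
    # Placeholder query against a hypothetical table.
    queries = {"count_rows": "SELECT count(*) FROM my_table"}
    for name, stats in benchmark(fake_run_query, queries).items():
        print(name, stats)
```

Running the same set of queries after each change to the worker count or memory settings gives you numbers to compare, which is more useful than any single measurement.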

January answered 11/9, 2018 at 9:10 Comment(3)
Great point re the Helm chart, I've overlooked that ;) I'm still wondering if a Deployment is a good option here … - Intercrop
Yeah, it does seem odd that the examples all use a Deployment instead of a StatefulSet. I guess it depends on whether the coordinator cares specifically which worker pod is doing what, and whether it then needs to interact with that specific pod. I don't know Presto well enough to say. - January
@MichaelHausenblas I believe a Deployment will be the better option here, since Presto is meant to query data, not store it. For example, we already use Drill heavily, and it runs only as a Deployment. - Seedcase
I
2

There are a couple of examples available showing how this could be done, for example dharmeshkakadia/presto-kubernetes, but I guess you might want to use a StatefulSet here instead. I'm not sure about perf tests, because much will depend on the kind of persistent volume you choose, or rather on what backs it, for example NFS or Ceph. Or maybe you are in a cloud environment with native storage?

Intercrop answered 11/9, 2018 at 8:40 Comment(0)
