Is Apache Zeppelin stable enough to be used in production?

I am using an AWS EMR cluster. I have been experimenting with Spark drivers and the Apache Zeppelin REST API to run jobs. I have run several hundred ad hoc jobs with Zeppelin without any problems. Given that, I am considering using the Zeppelin REST API in production and submitting jobs through it.
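For context, submitting a job this way could look roughly like the sketch below. It assumes the `/api/notebook/job/{noteId}` endpoint from the Zeppelin REST API docs (POST to run all paragraphs, GET to read their status), no authentication, and port 8890 as on a default EMR setup. The helper names are hypothetical, and the status response shape differs between Zeppelin versions, so the parser accepts both a list and an object with a `paragraphs` field.

```python
import json
import time
import urllib.request

# Assumption: Zeppelin listens here without authentication (EMR default port).
ZEPPELIN_URL = "http://localhost:8890"


def job_url(base_url, note_id):
    """Build the run/status URL for a note (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"


def paragraph_statuses(payload):
    """Extract paragraph statuses; the body shape is an assumption and
    may be a list or an object with a 'paragraphs' list."""
    body = payload.get("body", [])
    if isinstance(body, dict):
        body = body.get("paragraphs", [])
    return [p.get("status") for p in body]


def overall_state(statuses):
    """Collapse per-paragraph statuses into one state for the whole note."""
    if any(s in ("ERROR", "ABORT") for s in statuses):
        return "FAILED"
    if statuses and all(s == "FINISHED" for s in statuses):
        return "FINISHED"
    return "RUNNING"


def run_note(base_url, note_id, poll_seconds=5.0, timeout_seconds=3600.0):
    """Start all paragraphs of a note, then poll until done or timed out."""
    url = job_url(base_url, note_id)
    start = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(start, timeout=30):
        pass  # only the HTTP success matters here
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        with urllib.request.urlopen(url, timeout=30) as response:
            payload = json.load(response)
        state = overall_state(paragraph_statuses(payload))
        if state != "RUNNING":
            return state
        time.sleep(poll_seconds)
    return "TIMEOUT"
```

A call such as `run_note(ZEPPELIN_URL, "<your note id>")` would then return `"FINISHED"`, `"FAILED"`, or `"TIMEOUT"`. The timeout matters for the question, because a hung Zeppelin would otherwise block the caller forever.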

Has anyone experienced stability issues with Zeppelin in production?

Wychelm answered 16/3, 2017 at 2:39

I have Zeppelin running in production in a multi-user environment (about 15 users), and it hasn't been very stable. To make it more stable, I now run Zeppelin on its own node rather than on the master node.

Anyway, I found the following problems:

  • In releases before 0.7.2, Zeppelin created a lot of zombie processes, which caused memory problems after heavy usage.
  • User libraries can break Zeppelin. This was the case in versions prior to 0.7.0; for example, Jackson libraries made Zeppelin unable to communicate with the Spark interpreter. In 0.7.0 and up this problem has been mitigated.
  • There are random freezes when there are a lot of users. The only way to fix this is to restart the service. (All versions)
  • Sometimes, when a user starts their interpreter and the local repo is empty, Zeppelin doesn't download all the libraries specified in the interpreter config, and it won't try to download them again. The only way to mitigate this is to delete the contents of the interpreter's local repo. (All versions)
  • Sometimes changes to notebooks don't get saved, which causes users to lose code.
  • In version 0.6.0, Spark interpreters shared a context, which caused users to overwrite each other's variables.
  • Problems are difficult to debug, because the logging is not that great yet. Some bugs seem to break the logging, and sometimes running an interpreter in debug mode makes the problem go away.
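The local repo workaround mentioned above can be scripted. This is only a sketch: the function name is hypothetical, and the directory to pass in is an assumption that depends on your install (often a `local-repo` directory under the Zeppelin home, with one subdirectory per interpreter setting). It empties the directory but keeps the directory itself, so Zeppelin can download the libraries again on the next interpreter start.

```python
import shutil
from pathlib import Path


def clear_local_repo(repo_dir):
    """Delete everything inside repo_dir, keep repo_dir itself.

    Returns the sorted names of the removed top-level entries.
    A missing directory is treated as already clean.
    """
    repo = Path(repo_dir)
    removed = []
    if not repo.is_dir():
        return removed
    for entry in repo.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()  # plain file or symlink
        removed.append(entry.name)
    return sorted(removed)
```

Restart the affected interpreter afterwards so it resolves its dependencies from scratch.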

So, I wouldn't put it in a production setting yet, where people depend on it. But for testing and data discovery it would be fine. Zeppelin is clearly still in a beta stage.

Also, don't run it on the master node; set up a separate instance and let it connect to the cluster remotely. This makes it much more stable. Put it on a beefy node and restart it overnight.
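If you want to automate the restarts, a small watchdog run from cron is one option. The sketch below is an assumption-heavy example, not something Zeppelin ships: it probes the `/api/version` REST endpoint and calls a restart callback that you supply when the server stops answering. The function names, the port, and the restart command are placeholders for your own setup.

```python
import time
import urllib.error
import urllib.request


def is_responsive(base_url, timeout_seconds=10.0):
    """Return True if GET /api/version answers with HTTP 200 in time."""
    url = f"{base_url.rstrip('/')}/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error: treat all as unresponsive.
        return False


def restart_if_unresponsive(probe, restart, attempts=3, wait_seconds=0.0):
    """Call probe() up to `attempts` times; restart only if every try fails.

    Returns True if restart() was called, False otherwise.
    """
    for attempt in range(attempts):
        if probe():
            return False
        if attempt < attempts - 1:
            time.sleep(wait_seconds)
    restart()
    return True
```

For example, `probe` could be `lambda: is_responsive("http://localhost:8890")` and `restart` a function that runs your service manager's restart command for Zeppelin. Retrying a few times before restarting avoids killing running notebooks over a single slow response.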

Most of the bugs I encountered are already in Jira, and the developers are working hard to make things better. Stability improves with every release and I see the maintenance load going down with every version, so it certainly has potential.

Mccaskill answered 20/6, 2017 at 9:5 Comment(2)
I have been using Zeppelin in "baby production" as well, and I can say that these comments still seem to hold true. I regret having chosen it as our medium for sharing analyses. - Michaels
Do this comment and the main answer still hold true today? I feel like I haven't seen the expected uptick in Zeppelin use, and I'm wondering if this is part of the reason. - Momism

I have used Zeppelin for more than a year now. It gets you going quickly when you are just starting, but it is not a good candidate for production use, especially with more than 10 users, though that also depends on your cluster resources. These were my overall concerns with Zeppelin:

  1. By default you can't have more than one job running at a time; you need to change the configuration to allow that.
  2. If you load additional libraries from S3 or other external locations, you can only do that at startup; otherwise you have to restart Zeppelin.
  3. The Spark context is pre-created, and there are only a few settings you can change.
  4. The editor itself doesn't resize well when your output is large.

I am moving on to Jupyter for my use cases, which looks much stronger in my initial assessment.

Wychelm answered 25/8, 2018 at 2:5

As of this writing, end of February 2019, my answer would be: no, plain and simple.
Zeppelin keeps crashing, hanging, and becoming unresponsive; notebooks tend to become unloadable due to size errors; execution is very slow compared to Jupyter; and there are many limitations around integrating third-party display engines (although much effort has been made on this).

I experienced these issues on a decently sized, well-provisioned cluster with a single user. I would never advise using it as a production tool, at least not as it is today, unless you have an admin on hand who can restart the whole thing regularly, track down and fix errors, and take charge of integration.

We moved back to Jupyter after struggling for weeks to stabilize Zeppelin, and everything worked smoothly out of the box from day one.

Enamour answered 24/2, 2019 at 18:50
