I have Zeppelin running in production in a multi-user environment (roughly 15 users), and it hasn't been very stable. To make it more stable, I now run Zeppelin on its own node instead of on the master node.
Anyway, I found the following problems:
- In releases before 0.7.2, Zeppelin created a lot of zombie processes, which caused memory problems after heavy usage.
- User libraries can break Zeppelin; this was the case in versions prior to 0.7.0. For example, Jackson libraries make Zeppelin unable to communicate with the Spark interpreter. In 0.7.0 and later this problem has been mitigated.
- There are random freezes when there are a lot of users. The only way to fix this is to restart the service. (All versions)
- Sometimes, when a user starts their interpreter and the local repo is empty, Zeppelin doesn't download all the libraries specified in the interpreter config, and it won't try to download them again afterwards. The only workaround is to delete the contents of the interpreter's local repo. (All versions)
- Sometimes changes to notebooks don't get saved, which causes users to lose code.
- In version 0.6.0, Spark interpreters shared a context, which caused users to overwrite each other's variables.
- Problems are difficult to debug because the logging is not that great yet. Some bugs seem to break the logging, and sometimes running an interpreter in debug mode makes the problem go away.
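The local-repo workaround for the missing-libraries problem can be scripted so it is less error-prone than deleting files by hand. This is a minimal Python sketch, assuming the interpreter's local repo is a plain directory on the Zeppelin host. The example path (`/opt/zeppelin/local-repo/<interpreter-id>`) is hypothetical; check the `zeppelin.interpreter.localRepo` setting of your installation for the real location. The function name `clear_local_repo` is also just an illustration.

```python
import shutil
from pathlib import Path


def clear_local_repo(repo_dir):
    """Delete everything inside repo_dir but keep the directory itself.

    repo_dir is assumed to be the local repo of a single interpreter,
    e.g. the hypothetical path /opt/zeppelin/local-repo/<interpreter-id>.
    Returns the sorted names of the top-level entries that were removed.
    """
    repo = Path(repo_dir)
    if not repo.is_dir():
        raise NotADirectoryError(f"{repo} is not a directory")
    removed = []
    # Materialize the listing first so we never delete while iterating.
    for entry in list(repo.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove downloaded artifact trees
        else:
            entry.unlink()  # remove stray files and symlinks
        removed.append(entry.name)
    return sorted(removed)
```

Usage: `clear_local_repo("/opt/zeppelin/local-repo/<interpreter-id>")`, then restart that interpreter so it gets a chance to fetch its dependencies again (that restart step is my assumption about what retriggers the download). The directory itself is kept rather than deleted, in case something expects it to exist.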
So I wouldn't put it in a production setting where people depend on it yet. For testing and data discovery, though, it would be fine. Zeppelin is clearly still in a beta stage.
Also, don't run it on the master node; set up your own instance and let it connect remotely to the cluster. This makes it much more stable. Put it on a beefy node and restart it overnight.
Most of the bugs I encountered are already in Jira, and the developers are working hard to make things better. Stability improves with every release, and I see the maintenance load going down with each version, so it certainly has potential.
Zeppelin in "baby production" as well, and I can say that these comments still seem to hold true. I regret having chosen it as our medium for sharing analyses. – Michaels