Running impala cluster from portable binaries

I'm evaluating multiple big data tools, and one of them is, of course, Impala.
I would like to start an Impala cluster by manually starting processes on the cluster nodes. As I currently do for Spark, H2O, Presto and Dask, I would like to grab the binaries, copy them to the nodes, edit the configs, and start the services on each node from the shell. This works very well: it is straightforward to upgrade, and I can easily move to bigger or different clusters when needed. Unfortunately, I cannot find resources on the proper way to start the required services (Catalog Server, StateStore and daemons) from the shell.
I assume this is an obvious task, but I just cannot find a proper example to follow. So my question is: how can I start an Impala cluster from the shell by calling the Impala binaries?

Huihuie answered 22/8, 2016 at 20:3 Comment(3)
What about the "Starting Impala" section on the official Apache site? cloudera.com/documentation/enterprise/latest/topics/… - Orchestral
Disclaimer: I never bothered to try a standalone install or a manual start-up. Cloudera Manager does a decent job of, well, managing the whole thing (and automagically restarting the daemons whenever they crash -- which is a funny experience in itself, I hadn't seen the dreadful Unix SEGV fault in about 20 years!) - Orchestral
@SamsonScharfrichter This doesn't seem to be related to my question. If it isn't clear, I can add examples of how I run Spark, H2O, Presto or Dask: just a shell command against downloaded and unpacked binaries, with no installation or OS-level services. I agree that Cloudera Manager does a decent job, but I need to run various versions in various environments; reinstalling Impala/CDH, upgrading and downgrading does not seem to be the way to go. - Huihuie

....I would like to start Impala cluster by manually starting processes on the cluster nodes.....how can I start Impala cluster from shell calling Impala binaries?

I guess this is what you are looking for: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_processes.html#starting_via_cmdline
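
As a rough illustration of what that page describes, here is a minimal Python sketch that only assembles the three command lines and prints them; it does not start anything. Everything specific in it is an assumption rather than something taken from the question: the `/opt/impala` path and `bin/` layout are hypothetical, the host names are placeholders, and the flags (`-log_dir`, `-state_store_host`, `-catalog_service_host`) are recalled from Cloudera's documented startup options, so check them against `--help` for your Impala version. The usual order is StateStore first, then the Catalog Server, then an `impalad` on every worker node.

```python
import shlex


def build_commands(impala_home, statestore_host, catalog_host, log_dir):
    """Return {service: argv} in start order. Nothing is executed here."""
    bin_dir = f"{impala_home}/bin"  # hypothetical layout of the unpacked binaries
    common = [f"-log_dir={log_dir}"]  # flag name assumed; verify with --help
    return {
        # 1. StateStore: one instance for the whole cluster.
        "statestored": [f"{bin_dir}/statestored", *common],
        # 2. Catalog Server: one instance, told where the StateStore runs.
        "catalogd": [
            f"{bin_dir}/catalogd",
            *common,
            f"-state_store_host={statestore_host}",
        ],
        # 3. Impala daemon: one per worker node.
        "impalad": [
            f"{bin_dir}/impalad",
            *common,
            f"-state_store_host={statestore_host}",
            f"-catalog_service_host={catalog_host}",
        ],
    }


if __name__ == "__main__":
    # Placeholder values; replace with your own hosts and paths.
    commands = build_commands("/opt/impala", "node1", "node1", "/var/log/impala")
    for service, argv in commands.items():
        print(f"{service}: {shlex.join(argv)}")
```

The printed lines can then be run over ssh on the relevant nodes. Note that the daemons would still need their Hadoop and Hive metastore configuration in place, which this sketch does not cover.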

Update 1:

You may want to pick only the required info from this link: http://doc.mapr.com/plugins/servlet/mobile#content/view/28869628

It describes the steps to build Impala from GitHub so that it can run on MapR.

Update 2:

To build Impala, check these links:
https://github.com/cloudera/Impala/wiki/Build-prerequisites
https://github.com/cloudera/Impala/wiki/How-to-build-Impala
https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala

Update 3:

For expert suggestions on portability please get in touch with:

Impala developers: [email protected]

Taken from http://impala.apache.org/community.html

For the time being, you may consider building the binaries for your readily available Linux production environment.

Some more useful links for your situation:

https://cwiki.apache.org/confluence/display/IMPALA/Tips+for+Faster+Impala+Builds
https://cwiki.apache.org/confluence/display/IMPALA/Building+native-toolchain+from+scratch+and+using+with+Impala

Loaning answered 30/8, 2016 at 15:2 Comment(3)
Thanks Marco, it is now an insightful answer, but not really practical. Is it possible that building Impala from source will affect its performance? The whole point is to add Impala to an "easily" reproducible benchmark, db-benchmark, so neither installing CDH nor building from source provides the portability that I'm looking for. - Huihuie
@Huihuie: Although I have not personally tried building from source, I don't think it will affect performance (which depends largely on the cluster configuration, resources, and load). I suspect the universal portability you are trying to achieve is impractical, because the C++ components require machine-native compilers to produce the respective binaries. - Loaning
@Huihuie: Did you by any chance get any other input on achieving universal portability? Please share if you did. - Loaning
