Are there any genuine performance benchmarks that compare Stinger, Impala, and Drill? Also, which one is preferred? My use case is mainly ad-hoc interactive queries on top of Hive. Thanks.
Some performance numbers are available at http://allegro.tech/fast-data-hackathon.html
In general, we see that Drill and Impala are comparable in performance for interactive queries. Drill's differentiators are its ability to query data without metadata definitions and its ease of use with JSON data.
Note that these tests were run on much older versions of Drill, such as 0.8/0.9, which were also not configured appropriately for data locality. Drill is now at 1.1, with many improvements to SQL support (window functions, etc.) and performance.
You cannot rely on a benchmark like this. It makes no sense, and you should never trust such a benchmark.
Everything depends on your own data. Do you have JSON files? Prefer Drill. Do you want to query more than 1 TB? Prefer Hive, and so on.
Also consider the file or storage format: JSON, Kudu, Parquet, or ORC.
Then comes optimization. Hive+Tez seems better for parallel queries but is very slow for a single query, whereas Impala is the opposite (MapReduce versus massively parallel processing).
You should also consider the hardware resources, such as whether the disks are SSDs.
I recommend starting with Apache Drill on JSON files, then trying Apache Drill with Parquet or ORC.
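Since the numbers that matter are the ones measured on your own data, a small timing harness may be more useful than any published benchmark. The Python sketch below is only an illustration: `time_query` and `compare` are hypothetical helper names, and the callables passed in are placeholders for whatever client code you use to submit the same query to each engine or file format. It runs each query a few times, discards warm-up runs, and reports the median wall-clock time.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object],
               warmups: int = 1,
               runs: int = 5) -> List[float]:
    """Return the wall-clock seconds of each timed call to run_query."""
    for _ in range(warmups):
        run_query()  # warm-up runs are executed but not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def compare(candidates: Dict[str, Callable[[], object]],
            warmups: int = 1,
            runs: int = 5) -> Dict[str, float]:
    """Return the median seconds per candidate for the same logical query."""
    return {
        name: statistics.median(time_query(fn, warmups, runs))
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    # Placeholders (assumption): replace each lambda with a real call that
    # submits your query, e.g. to Drill on JSON and to Drill on Parquet.
    results = compare({
        "drill_json": lambda: time.sleep(0.001),
        "drill_parquet": lambda: time.sleep(0.001),
    })
    for name, seconds in results.items():
        print(f"{name}: median {seconds:.4f} s")
```

Run it with the same query and the same data for every candidate, on the hardware you actually intend to use, and repeat under concurrent load if parallel queries matter to you.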
If you want help, describe exactly what you have (data + hardware) and what you want.