What are the fundamental architectural, SQL compliance, and data use scenario differences between Presto and Impala?

Can an expert give a succinct summary of the differences between Presto and Impala from the following perspectives?

  1. Fundamental architecture design
  2. SQL compliance
  3. Real-world latency
  4. Any SPOF or fault-tolerance functionality
  5. Performance on structured versus unstructured data
Burning answered 7/11, 2013 at 16:16 Comment(1)
OK, since no one has answered this question, I would like to add some comments from my own findings. The largest difference I can see so far (possibly not very accurate, given how little has been published about Presto): Impala uses a push-down approach, while Presto uses a connector approach. That is, Impala runs the optimized query fragments on the nodes where the data resides in HDFS, whereas Presto's connector approach works more or less like HAWQ or SQL-H, importing the data from HDFS into the query engine.
Burning

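Apache Impala is a query engine built for the Hadoop ecosystem: it works against tables registered in the Hive metastore, stored mainly in HDFS (plus a few closely related stores such as HBase and Kudu).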

PrestoDB, as well as the community fork Trino, is on the other hand a general-purpose query engine that supports HDFS as just one of many data sources. A long list of connectors is available, and Hive/HDFS support is just one of them. This also means that you can query different data sources from the same system, even within a single query.
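
To make the cross-source point concrete, here is a minimal sketch of what such a query can look like. Presto/Trino addresses tables as `catalog.schema.table`, where each catalog is backed by one connector. The catalog, schema, table, and column names below (`hive.sales.orders`, `mysql.crm.customers`, and so on) are made up for illustration, and the snippet only builds the SQL text; it does not connect to a cluster.

```python
# Sketch only: every catalog/schema/table/column name here is hypothetical.
# The one thing taken from Presto/Trino is the catalog.schema.table naming.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto/Trino table name."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql() -> str:
    """Build one query that joins tables from two different connectors."""
    orders = qualified("hive", "sales", "orders")        # files on HDFS, via a Hive catalog
    customers = qualified("mysql", "crm", "customers")   # rows in MySQL, via a MySQL catalog
    return (
        "SELECT c.name, sum(o.total) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

Submitting the printed statement through the Presto/Trino CLI or a client library would let the engine fetch each side through its own connector and do the join itself, assuming both catalogs are configured on the cluster.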

Hare answered 31/1, 2020 at 22:36 Comment(0)
