Immediate evaluation of CTE
Asked Answered
B

2

6

I am trying to optimize a very long and complex impala query which contains multiple CTE. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct impala that results of this CTE should be re-used in main query as-is instead of SCAN HDFS operation on tables involved in CTE again with the main query. Is this possible? if yes how ?

I am using impalad version 2.1.1-cdh5 RELEASE (build 7901877736e29716147c4804b0841afc4ebc9037) version

Bedpost answered 6/11, 2017 at 9:26 Comment(2)
might be a stupid question but what is CTEDonoghue
@shainnif I am assuming its Impala Common Table ExpressionsSerrulation
S
3

I do not think so. I believe WITH clause does not create any permanent object, it's just for you avoid cluttering the namespace with new tables or views and to make it easier to refactor large, complex queries by reor‐ dering and replacing their individual parts. The queries used in the WITH clause are good candidates to someday become views or to be materialized as summary tables during the ETL process.

Serrulation answered 17/11, 2017 at 5:33 Comment(0)
P
2

Is this possible?

The very purpose of the CTE is to re-use of the results obtained from a preceding query (that uses the with clause) by a following query, say, SELECT. So i don't see a reason why it is not possible.

Use Explain on your query to find out the actual SCAN HDFS details.

For more I/O related insights use profile as in the official documentation https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_explain_plan.html#perf_profile

Philologian answered 17/11, 2017 at 9:1 Comment(1)
The primary purpose of a CTE is to reuse a query, not to capture a specific set of results gathered from running said query.Firmament

© 2022 - 2024 — McMap. All rights reserved.