impala Questions

5

I need to deploy a Big Data cluster on our servers, but I only know Apache Spark. Now I need to know whether Spark SQL can completely replace Apache Impala or Apache Hive. I...
Kaiulani asked 25/10, 2016 at 9:37

7

I have been trying to write an R script to query an Impala database. Here is the query to the database: select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) f...
Spindly asked 11/5, 2015 at 12:46

3

Solved

Usually in Impala, we use the COMPRESSION_CODEC before inserting data into a table for which the underlying files are in Parquet format. Commands used to set COMPRESSION_CODEC: set compression_c...
Bordure asked 20/8, 2019 at 12:16

4

Let's suppose we have a table: Owner | Pets ------------------------------ Jack | "dog, cat, crocodile" Mary | "bear, pig" I want to get as a result: Owner | Pets -----------------------------...
Lashawnda asked 23/5, 2016 at 19:38

2

Here are the steps to the current process: Flafka writes logs to a 'landing zone' on HDFS. A job, scheduled by Oozie, copies complete files from the landing zone to a staging area. The staging da...
Experienced asked 25/1, 2016 at 23:54

4

Could you highlight the major differences between the two in architecture & functionality in 2019? And how do those differences affect performance? For some reason this excellent question was tagged as...
Lathe asked 10/12, 2019 at 21:38

3

Solved

How can I save my query results in a CSV file via the Impala Shell? My code: impala-shell -q "use test; select * from teams; -- From this point I need to save the query results to /Desktop (for e...
Underling asked 14/4, 2018 at 16:4

3

I saw at this link which affects Impala version 1.1: Since Impala 1.1, REFRESH statement only works for existing tables. For new tables you need to issue "INVALIDATE METADATA" statement. Does ...
Shan asked 15/2, 2017 at 1:24

2

Solved

I need to add parameters in several locations in a long query. I want to use parameters because I need to run the query multiple times with different values substituted in. This is very cumbersome ...
Disquisition asked 8/6, 2020 at 20:27

3

I'm using SQL in Impala to write this query. I'm trying to convert a date string, stored in YYYYMMDD format, into a date format for the purposes of running a query like this: SELECT datadate, se...
Solingen asked 8/10, 2015 at 19:24

2

Solved

This query returns in 10 seconds most of the time, but occasionally it needs 40 seconds or more. There are two executor nodes in the swarm, and there is no remarkable difference between profiles of...
Erlandson asked 14/8, 2020 at 2:34

1

Solved

I find that my Impala swarm does not perform stably: normally it takes only a few seconds (less than 10s) to finish a query, but occasionally it will take more than 40s (and this situation will last fo...
Libbey asked 14/8, 2020 at 2:54

2

How can I extract the date from a timestamp value in Impala? E.g. time = 2018-04-11 16:05:19 should become 2018-04-11
Collide asked 24/6, 2018 at 20:19

1

We have a Hadoop-based solution (CDH 5.15) where we are getting new files in HDFS in some directories. On top of those directories we have 4-5 Impala (2.1) tables. The process writing those files i...
Orangery asked 6/2, 2020 at 8:24

1

Can some experts give some succinct answers to the differences between Presto and Impala from these perspectives? Fundamental architecture design SQL compliance Real-world latency Any SPOF or f...
Burning asked 7/11, 2013 at 16:16

3

I have CSV data with each field surrounded by double quotes. When I created the Hive table I used the serde 'com.bizo.hive.serde.csv.CSVSerde'. When the above table is queried in Impala I get the error SerD...
Flambeau asked 3/9, 2014 at 10:56

5

Solved

I have recently started looking into querying large sets of CSV data lying on HDFS using Hive and Impala. As I was expecting, I get better response time with Impala compared to Hive for the queries...
Yapon asked 26/5, 2013 at 2:7

3

Solved

I'm working on an NRT solution that requires me to frequently update the metadata on an Impala table. Currently this invalidation is done after my Spark code has run. I would like to speed things u...
Tomcat asked 6/7, 2016 at 9:29

2

I'm using R Shiny and dplyr to connect to a database and query the data in Impala. I do the following. con <- dbPool(odbc(), Driver = [DRIVER], Host = [HOST], Schema = [SCHEMA], Port = [PORT], U...
Muscle asked 12/8, 2019 at 19:3

4

Solved

I have a use case where I need to use ROW_NUMBER() over PARTITION: Something like: SELECT Column1 , Column 2 ROW_NUMBER() OVER ( PARTITION BY ACCOUNT_NUM ORDER BY FREQ, MAN, MODEL) as LEVEL FR...
Stephie asked 6/10, 2014 at 19:20

0

I have some data that is processed and modeled based on case classes, and the classes can also have other case classes in them, so the final table has complex data: struct, array. Using the case clas...
Insufferable asked 26/3, 2019 at 17:33

3

Solved

As discussed in the Impala tutorials, Impala uses a Metastore shared with Hive. But it is also mentioned that if you create or edit tables using Hive, you should execute INVALIDATE ...
Kindergartner asked 24/11, 2015 at 7:54

3

Solved

I am trying to query HBase data through Hive (I'm using Cloudera). I created a few Hive external tables pointing to HBase, but the thing is Cloudera's Impala doesn't have access to all those tables. All hi...
Minuet asked 10/12, 2013 at 16:44

2

Solved

Are numeric columns recommended for partition keys? Will there be any performance difference when we do a select query on numeric column partitions vs string column partitions?
Reconvert asked 29/8, 2018 at 16:24

2

I'm trying to create a table in Impala from a CSV that I've uploaded into an HDFS directory. The CSV contains values with commas enclosed inside quotes. Example: 1.66.96.0/19,"NTT Docomo,INC.","...
Marj asked 7/6, 2016 at 19:57
