I have been trying to write an R script to query Impala database. Here is the query to the database:
select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA
When I run this query manually (read: outside the Rscript via impala-shell), I am able to get the table contents. However, when the same is tried via the R script, I get the following error:
[1] "HY000 140 [Cloudera][ImpalaODBC] (140) Unsupported query."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA'
closing unused RODBC handle 1
Why does the query fail when tried via R? and how do I fix this? Thanks in advance :)
Edit 1:
The connection script looks as below:
library("RODBC");
connection <- odbcConnect("Impala");
query <- "select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA";
data <- sqlQuery(connection,query);
EXEC
. It turned out that the problem was nested INSERT-EXEC, almost as though RODBC had been inserting an extra layer of INSERTs, so my stored procedure worked fine outside R but fell apart in R. Did you ever end up solving your problem? – Underbody