HiveQL: Using query results as variables
Asked Answered
D

2

5

in Hive I'd like to dynamically extract information from a table, save it in a variable and further use it. Consider the following example, where I retrieve the maximum of column var and want to use it as a condition in the subsequent query.

set maximo=select max(var) from table;

select
  *
from
  table
where
  var=${hiveconf:maximo}

It does not work, although

set maximo=select max(var) from table;

${hiveconf:maximo}

shows me the intended result.

Doing:

select '${hiveconf:maximo}'

gives

"select max(var) from table"

though.

Best

Deedeeann answered 13/5, 2016 at 9:36 Comment(0)
R
8

Hive substitutes variables as is and does not execute them. Use shell wrapper script to get result into variable and pass it to your Hive script.

maximo=$(hive -e "set hive.cli.print.header=false; select max(var) from table;")
hive -hiveconf "maximo"="$maximo" -f your_hive_script.hql

And after this inside your script you can use select '${hiveconf:maximo}'

Rencontre answered 14/6, 2016 at 20:10 Comment(2)
Is there any way to do this from hue? I want to use it in a view.Recall
@HeinduPlessis I'm not able to do the same only in Hue unfortunately. It's only possible to pass variable already calculated. using Enable parametrization checkbox, configuration, add key=varname, then it will ask to add value. Use in the query like ${varname}Rencontre
C
1

@Hein du Plessis

Whilst it's not possible to do exactly what you want from Hue -- a constant source of frustration for me -- if you are restricted to Hue, and can't use a shell wrapper as suggested above, there are workarounds depending on the scenario.

When I once wanted to set a variable by selecting the max of a column in a table to use in a query, I got round it like this:

I first put the result into a table comprising two columns, with the (arbitrary word) 'MAX_KEY' in one column and the result of the max calculation in the other, like this:

drop table if exists tam_seg.tbl_stg_temp_max_id;
create table tam_seg.tbl_stg_temp_max_id as
select
    'MAX_KEY' as max_key
    , max(pvw_id) as max_id
from
    tam_seg.tbl_dim_cc_phone_vs_web;

I then added the word 'MAX_KEY' to a sub-query then joined in the above table so I could use the result in the main query:

select
    -- *** here is the joined in value from the table being used ***
    cast(mxi.max_id + qry.temp_id as string) as pvw_id
    , qry.cc_phone_vs_web
from
    (
    select
        snp.cc_phone_vs_web
        , row_number() over(order by snp.cc_phone_vs_web) as temp_id
        -- *** here is the key being added to the sub-query ***
        , 'MAX_KEY' as max_key
    from
        (
        select distinct cc_phone_vs_web from tam_seg.tbl_stg_base_snapshots
        ) as snp
    left outer join
        tam_seg.tbl_dim_cc_phone_vs_web as pvw
        on snp.cc_phone_vs_web = pvw.cc_phone_vs_web
    where
        pvw.cc_phone_vs_web is null
    ) as qry
-- *** here is the table with the select result in being joined in ***
left outer join
    tam_seg.tbl_stg_temp_max_id as mxi
    on qry.max_key = mxi.max_key
;

Not sure if this is your scenario but maybe it can be adapted. I'm 99% sure you can't just put a select statement directly into a variable in Hue though.

If I am doing something in just Hue I would probably do the temporary table and join method. But if I were using a shall wrapper anyway I would definitely do it there.

I hope this helps.

Catechol answered 22/3, 2021 at 12:12 Comment(1)
Yeah, I'm using Hue too, it's frustrating!Crusted

© 2022 - 2024 — McMap. All rights reserved.