How to calculate median in Hive

I have a Hive table:

name    age     sal
A       45      1222
B       50      4555
c       44      8888
D       78      1222
E       12      7888
F       23      4555

I want to calculate the median of the age column.

Below is my approach:

select min(age) as HMIN,max(age) as HMAX,count(age) as HCount,
IF(count(age)%2=0,'even','Odd') as PCOUNT 
from v_act_subjects_bh;

I would appreciate any query suggestions.

Bronk answered 11/11, 2014 at 10:51 Comment(0)
I
91

You can use the percentile function to compute the median. Try this:

select percentile(cast(age as BIGINT), 0.5) from table_name
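
Applied to the table from the question, this might look like the sketch below. The alias median_age is an illustrative name, not something from the original post. For the six sample ages (sorted: 12, 23, 44, 45, 50, 78) the median is the mean of the two middle values, 44.5. Since percentile interpolates between neighbouring values, that is the result to expect on the sample data.

```sql
-- Sketch: exact median of the integer age column from the question.
-- median_age is an illustrative alias; age is assumed to hold integral values.
SELECT percentile(CAST(age AS BIGINT), 0.5) AS median_age
FROM v_act_subjects_bh;
```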
Inrush answered 11/11, 2014 at 11:14 Comment(2)
What if I had a bunch of double values instead of ints? @Inrush - Jeb
@Jeb Hive offers: percentile_approx(DOUBLE col, p [, B]) - Schaumberger
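
A minimal sketch of the percentile_approx suggestion from the comment above, assuming a DOUBLE column. The names score, median_score and table_name are placeholders, not names from the original post. The optional third argument B trades memory for accuracy and is left at its default here.

```sql
-- Sketch: approximate median of a floating-point column.
-- score and table_name are placeholders, not names from the original post.
SELECT percentile_approx(CAST(score AS DOUBLE), 0.5) AS median_score
FROM table_name;
```

According to the Hive documentation, the result is exact when the number of distinct values in the column is smaller than B.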
B
0

The accepted answer works if you have INT values. If your data contains values between 0 and 1, such as model scores, you may use the formula below:

select (percentile(cast(age as BIGINT), 0.5))/100 from table_name
Biyearly answered 22/11, 2021 at 19:38 Comment(1)
Aren't you casting it as a BIGINT, so wouldn't that negate what you're saying? - Frankfurter
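
The comment has a point: casting a value strictly between 0 and 1 to BIGINT truncates it to 0 before percentile ever sees it. One possible reading of the answer is that the scores should be scaled up before the cast and scaled back down afterwards. A sketch under that assumption, keeping two decimal places (score, median_score and table_name are placeholders):

```sql
-- Sketch: median of scores in [0, 1], kept to two decimal places.
-- Scale to integers first so the BIGINT cast does not truncate everything to 0.
SELECT percentile(CAST(round(score * 100) AS BIGINT), 0.5) / 100 AS median_score
FROM table_name;
```

If losing precision to rounding is not acceptable, percentile_approx on the DOUBLE column avoids the cast altogether.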
K
-3
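#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    // Sample ages from the question's table; the values must be sorted first.
    vector<double> ages = {45, 50, 44, 78, 12, 23};
    sort(ages.begin(), ages.end());
    size_t size = ages.size();
    double median = 0;
    if (size % 2 == 1)
    {
        // Odd count: the median is the middle element.
        median = ages[size / 2];
    }
    else
    {
        // Even count: the median is the mean of the two middle elements.
        median = (ages[size / 2 - 1] + ages[size / 2]) / 2;
    }
    cout << "Median of array: " << median << endl;
    return 0;
}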
Kalagher answered 26/10, 2020 at 7:22 Comment(2)
This is not Hive. - Dunning
Hi @Success, try to answer C++ questions; this one is only for Hive. On Stack Overflow the tag of the question is very important. You can delete your answer here to avoid more down-votes. It seems that you were looking not for Hive, but for C++ or something else. - Mirtamirth
