Implement Rank without using analytic function
Asked Answered
C

3

10

I am wondering if there is a method to implement SQL analytic functions without using the inbuilt functions.

SELECT *,
    ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rownum,
    DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS denserank,
    RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk
FROM emp;
Chronicles answered 20/10, 2017 at 19:44 Comment(2)
I dont want to do it in mysql. Will SELF JOIN work?Chronicles
great to see you did the legwork already on this question. You might consider accepting an answer (especially take look at Gordon Linoff's excellent one).Exceeding
C
15

Here are the three equivalent expressions:

select emp.*,
       (select count(*)
        from emp emp2
        where emp2.dept_id = emp.dept_id and
              (emp2.salary > emp.salary or
               emp2.salary = emp.salary and emp2.emp_id <= emp.emp_id
              )
       ) as "row_number",
       (select 1 + count(*)
        from emp emp2
        where emp2.dept_id = emp.dept_id and
              emp2.salary > emp.salary 
              )
       ) as "rank",
       (select count(distinct salary)
        from emp emp2
        where emp2.dept_id = emp.dept_id and
              emp2.salary >= emp.salary
       ) as "dense_rank",
from emp;

This assumes the existence of an emp_id to make the rows unique for "row_number".

Charter answered 20/10, 2017 at 20:2 Comment(5)
@teja . . . All three of these should work, regardless of whether or not there are duplicates. row_number() depends on the existence of a unique key in the table, as described in the answer.Charter
Is there an another solution to do this without use of co related sub queryAutoicous
@Autoicous . . . Questions should be asked as questions not in comments, along with sample data, desired results, and a database tag.Charter
I was looking for an authoritative answer to this precise question (since Spark Structured Streaming does not support windowing/analytic functions on a non-timestamp column) . Lo and behold from the horse's mouth Gordon LinoffExceeding
Ouch- just realized the above use correlated subqueries - which are not supported in spark structured streaming .. #55778182Exceeding
V
3

This would work for all cases

select DEPT_ID, SALARY,

    (select count(*)+1  from emp r where r.SALARY>o.SALARY and r.dept_id=o.dept_id) **rank**,

    (select count(distinct SALARY )+1  from emp r where r.SALARY>o.SALARY and r.dept_id=o.dept_id) *d_rank*,

    (select count(*)+1  from (select x.*,rownum rn from ( select emp.* from emp  order by DEPT_ID asc,salary desc ) x) r where r.rn<o.rn and r.dept_id=o.dept_id) **rownumm**

from (select x.*,rownum rn from ( select emp.* from emp  order by DEPT_ID asc,salary desc ) x) o 

order by DEPT_ID,salary desc;

for rank:- calculated using (count of (values less than current rows)+1

for dense rank:- same as rank (count distinct value less than current rows)+1

row_number:- create the nested query by generating rownum for each row which will be distinct for all rows. Now on top of that do the same logic as rank (count of values greater than previous rownum (rownum of select subquery))+1

Vaasa answered 6/2, 2019 at 5:8 Comment(2)
Please add some explanation why your code will work for all the cases. So, others can learn from it.Versed
sure Jayson I am doing itVaasa
A
2

You can do this with a correlated sub-query.

select dept_id,salary,
(select count(*) from emp e1 where e1.dept_id=e.dept_id and e1.salary>=e.salary) as rnum
from emp e

This works well when there are no ties.

Ambidexter answered 20/10, 2017 at 19:48 Comment(4)
How can I implement rownum when there are ties?Chronicles
Row number with ties is a dense rank.Resentment
@RossBush How can it be a dense rank? If you use ROW_NUMBER() OVER ( PARTITION BY ORDER BY ) it would assign unique numbers even if there are dups..Chronicles
I meant if you ordered by salary within a department and gave each member of a department a row number writhing that order, however, you kept a placeholder for ties (each tie gets the same row number), then that is row_number with ties by salary and dept_id.Resentment

© 2022 - 2024 — McMap. All rights reserved.