Using FIRST_VALUE without including inner columns in group by
Asked Answered
M

4

9

I am using a table that looks like this:

userID, eventDate, eventName
1  2019-01-01  buySoup
2  2019-01-01  buyEggs
2  2019-01-03  buyMilk
2  2019-01-04  buyMilk
3  2019-01-02  buyBread
3  2019-01-03  buyBread

My current query is:

SELECT
    userID,
    FIRST_VALUE(eventName) OVER (
        PARTITION BY userID ORDER BY eventDate ASC
    ) AS firstBought 
FROM table 
GROUP BY userID

I feel like this should return:

userID, firstBought
1  buySoup
2  buyEggs
3  buyBread

Instead, it gives the error:

'ERROR: Column "table.eventName" must appear in the GROUP BY clause or be used in an aggregate function'

Is there a way to grab this value without including it in the group by function, or creating a sub query? I'm using PostgreSQL.

If I do include it in the group by clause, it returns

userID, firstBought
1  buySoup
2  buyEggs
2  buyEggs
2  buyEggs
3  buyBread
3  buyBread

I understand that I could make it a subquery and then group by userID, firstBought, but I'd rather not create another subquery.

Moreen answered 25/2, 2019 at 21:18 Comment(0)
R
10

Instead of group by, use select distinct:

select distinct userID,
       FIRST_VALUE(eventName) over (partition by userID order by eventDate ASC) as firstBought 
from table ;

Or, you can use arrays:

select userId,
       (array_agg(eventName order by eventDate))[1] as firstBought
from table
group by userId;

Postgres doesn't have a "first" aggregation function, but this works pretty well.

Restriction answered 25/2, 2019 at 21:19 Comment(4)
Although that's useful, and works exactly as intended, my table actually also includes more columns, and my query includes max() and other aggregate functions on those columns. This means I do need the group by at the end. Is there any other solution?Moreen
@Moreen . . . This answer provides two solutions. The second one uses group by.Restriction
thank you, the second answer is great. Unfortunately, I'm actualy using Vertica Analytic Database v9.1.1-5, a branch of postgres which doesn't include array_agg. Thanks anyways!Moreen
@Moreen . . . You should correctly tag your questions.Restriction
H
4

I guess PostgreSQL's DISTINCT ON could do the trick:

SELECT DISTINCT ON (userid)
       userid, eventdate, eventname
FROM "table"
ORDER BY (eventdate);

This will give you the row per userid with the minimum eventdate.

Hasdrubal answered 26/2, 2019 at 3:31 Comment(0)
S
2

FIRST_VALUE is not an aggregate function. It is an analytic window function. So your base query does not need a GROUP BY clause. It should be re-written as:

SELECT 
        userID,
        FIRST_VALUE(eventName) over (PARTITION BY userID ORDER BY eventDate ASC) AS firstBought
FROM table;

From one of your above comments it sounds like there are other functions that you are using including aggregate functions like MAX. To accomplish what you are trying to do, you will need to use the above query as a subquery. This will allow you to use aggregate functions and get unique values from your base query. The query can look something like this (I added a price column as an example).

SELECT userID, firstBought, MAX(price)
FROM (
        SELECT userID, price, FIRST_VALUE(eventName) over (partition by userID order by eventDate ASC) as firstBought 
        from test
) x
GROUP BY userId, firstBought;

This should do the trick! You can use other aggregate functions on the outside query and an additional window functions in the subquery.

Sarcophagus answered 26/2, 2019 at 4:0 Comment(0)
G
1

I agree with A. Saunders.

You need an outside query.

With the exception of SELECT DISTINCT , which actually boils down to a GROUP BY all columns of the SELECT list, you can't mix OLAP and GROUP BY functions into the same SELECT.

So , if you do have MAX(), you have to:

WITH -- your input data ...
input(userID,eventDate,eventName) AS (
          SELECT 1,DATE '2019-01-01','buySoup'
UNION ALL SELECT 2,DATE '2019-01-01','buyEggs'
UNION ALL SELECT 2,DATE '2019-01-03','buyMilk'
UNION ALL SELECT 2,DATE '2019-01-04','buyMilk'
UNION ALL SELECT 3,DATE '2019-01-02','buyBread'
UNION ALL SELECT 3,DATE '2019-01-03','buyBread'
)
,
getfirstbought AS (
  SELECT 
    userid
  , eventdate
  , FIRST_VALUE(eventname) OVER (
      PARTITION BY userid ORDER BY eventdate
   ) AS firstbought
  FROM input
)
SELECT
  userid
, firstbought
, MAX(eventdate) AS maxdt
FROM getfirstbought
GROUP BY 1,2;
-- out  userid | firstbought |   maxdt    
-- out --------+-------------+------------
-- out       2 | buyEggs     | 2019-01-04
-- out       3 | buyBread    | 2019-01-03
-- out       1 | buySoup     | 2019-01-01
-- out (3 rows)
-- out 
-- out Time: First fetch (3 rows): 22.157 ms. All rows formatted: 22.208 ms
Gaskin answered 27/2, 2019 at 0:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.