How to use group by with union in T-SQL
Asked Answered
C

5

37

How can I using group by with union in T-SQL? I want to group by the first column of a result of union, I wrote the following SQL but it doesn't work. I just don't know how to reference the specified column (in this case is 1) of the union result.

SELECT  *
FROM    ( SELECT    a.id ,
                    a.time
          FROM      dbo.a
          UNION
          SELECT    b.id ,
                    b.time
          FROM      dbo.b
        )
GROUP BY 1
Cunnilingus answered 22/10, 2009 at 3:8 Comment(0)
R
25

GROUP BY 1

I've never known GROUP BY to support using ordinals, only ORDER BY. Either way, only MySQL supports GROUP BY's not including all columns without aggregate functions performed on them. Ordinals aren't recommended practice either because if they're based on the order of the SELECT - if that changes, so does your ORDER BY (or GROUP BY if supported).

There's no need to run GROUP BY on the contents when you're using UNION - UNION ensures that duplicates are removed; UNION ALL is faster because it doesn't - and in that case you would need the GROUP BY...

Your query only needs to be:

SELECT a.id,
       a.time
  FROM dbo.TABLE_A a
UNION
SELECT b.id,
       b.time
  FROM dbo.TABLE_B b
Riki answered 22/10, 2009 at 3:29 Comment(1)
There are use cases for wanting to use GROUP BY on the results of a union. In the example given, you may way the most recent time for each ID, whether that time is in TABLE_A or TABLE_B. You'd need a GROUP BY with MAX(time) to achieve that.Triumph
H
121

You need to alias the subquery. Thus, your statement should be:

Select Z.id
From    (
        Select id, time
        From dbo.tablea
        Union All
        Select id, time
        From dbo.tableb
        ) As Z
Group By Z.id
Homunculus answered 29/10, 2009 at 5:10 Comment(2)
it gives 'time' is invalid in the select list because it is not contained in aggregate or GROUP BY. If I have a number of fields i would have to add all of them to my GROUP BYBehling
If you are attempting to return time in the outer query, then there is a question of what you are trying to accomplish. If all you want is a unique list of id,time, then you don't need the Group By and should change the Union All to Union.Homunculus
R
25

GROUP BY 1

I've never known GROUP BY to support using ordinals, only ORDER BY. Either way, only MySQL supports GROUP BY's not including all columns without aggregate functions performed on them. Ordinals aren't recommended practice either because if they're based on the order of the SELECT - if that changes, so does your ORDER BY (or GROUP BY if supported).

There's no need to run GROUP BY on the contents when you're using UNION - UNION ensures that duplicates are removed; UNION ALL is faster because it doesn't - and in that case you would need the GROUP BY...

Your query only needs to be:

SELECT a.id,
       a.time
  FROM dbo.TABLE_A a
UNION
SELECT b.id,
       b.time
  FROM dbo.TABLE_B b
Riki answered 22/10, 2009 at 3:29 Comment(1)
There are use cases for wanting to use GROUP BY on the results of a union. In the example given, you may way the most recent time for each ID, whether that time is in TABLE_A or TABLE_B. You'd need a GROUP BY with MAX(time) to achieve that.Triumph
H
9

Identifying the column is easy:

SELECT  *
FROM    ( SELECT    id,
                    time
          FROM      dbo.a
          UNION
          SELECT    id,
                    time
          FROM      dbo.b
        )
GROUP BY id

But it doesn't solve the main problem of this query: what's to be done with the second column values upon grouping by the first? Since (peculiarly!) you're using UNION rather than UNION ALL, you won't have entirely duplicated rows between the two subtables in the union, but you may still very well have several values of time for one value of the id, and you give no hint of what you want to do - min, max, avg, sum, or what?! The SQL engine should give an error because of that (though some such as mysql just pick a random-ish value out of the several, I believe sql-server is better than that).

So, for example, change the first line to SELECT id, MAX(time) or the like!

Hideaway answered 22/10, 2009 at 3:45 Comment(0)
L
4
with UnionTable as  
(
    SELECT a.id, a.time FROM dbo.a
    UNION
    SELECT b.id, b.time FROM dbo.b
) SELECT id FROM UnionTable GROUP BY id
Lover answered 12/3, 2018 at 19:38 Comment(2)
Hi, and welcome to Stack Overflow. It would be nice if you included explanation to your code. It's hard to figure out why you decided to solve the problem this way without any explanation.Carbonado
Well, this is still a pretty good answer/solution, even without extra text, which no one else came up so far. Definately +1Jokester
C
0

You do not need a subquery.
Just try out some example queries and you will see the result.

union example

select orgid from org where orgid = 123
union
select orgid from org where orgid = 123

union all example

select orgid from org where orgid = 123
union all 
select orgid from org where orgid = 123
Cherri answered 9/2, 2024 at 7:14 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.