SQL - using alias in Group By
Asked Answered
C

12

207

Just curious about SQL syntax. So if I have

SELECT 
 itemName as ItemName,
 substring(itemName, 1,1) as FirstLetter,
 Count(itemName)
FROM table1
GROUP BY itemName, FirstLetter

This would be incorrect because

GROUP BY itemName, FirstLetter 

really should be

GROUP BY itemName, substring(itemName, 1,1)

But why can't we simply use the former for convenience?

Corrody answered 1/10, 2010 at 16:46 Comment(4)
that's allowed in PostgresqlOriginality
MySQL allows it alsoHauge
which rdbms you are talking about?Davita
This question is not exclusive to group by, but also to join, for exampleWhish
T
399

SQL is implemented as if a query was executed in the following order:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause
  6. ORDER BY clause

For most relational database systems, this order explains which names (columns or aliases) are valid because they must have been introduced in a previous step.

So in Oracle and SQL Server, you cannot use a term in the GROUP BY clause that you define in the SELECT clause because the GROUP BY is executed before the SELECT clause.

There are exceptions though: MySQL and Postgres seem to have additional smartness that allows it.

Timid answered 1/10, 2010 at 17:53 Comment(11)
I like this explanation. Although I can't speculate how difficult it is to add it to an engine as syntactic sugar.Corrody
Any idea if the DB is smart enough to realize the same expression is in the SELECT and GROUP BY clauses without re-evaluating the expressions? i.e. if there is GROUP BY substring(itemName, 1,1), is the database smart enough not to take the performance hit of recomputing the substring in the SELECT clause?Hauge
In the SELECT clause of a query with grouping, you only have access to the GROUP BY expressions and aggregated values. So it's not about being smart; it has to be implemented that way for the grouping to work. (And it's required by the SQL standard). But even in more trivial cases (e.g. the same expression in the WHERE and the SELECT clause), state-of-the-art database systems will certainly only compute it once. This optimization is called common sub-expression elimination.Timid
What does execution order have to do with the question? It isn't like the asker was trying to GROUP BY on the COUNT(). In fact, the query as asked works just fine in MySQL and likely PostgreSQL as pointed out in the comments.Trinitrotoluene
For mysql, sql_mode not including ONLY_FULL_GROUP_BY in the bitmask, the Optimizer has a chance to deliver better results with a varied / different use of the alias in the HAVING clause.Darren
Given the parse order, if only you could alias calculations/aggregations in the WHERE or GROUP BY clause. Then you could do SELECT itemName as ItemName, FirstLetter, Count(itemName) as n FROM table1 GROUP BY itemName, substring(itemName, 1,1) as FirstLetter Do any DB engines support this modified syntax?Crusted
I do not agree with you. If I use numbers in GROUP BY 1,2 I refer to SELECT columnsCa
A good explanation for the logical processing order of SQL: A Beginner’s Guide to the True Order of SQL OperationsOmophagia
Yes, but part of any compiler is parsing/pre-parsing so having it perform string replacement of aliases is just syntactical sugar that would make life so much easier for people and not effect the execution order and any artifacts such as inner select column names/aliases can be addressed by generating ambiguous name errors.Clamor
Sybase ASA also supports using alias names in group by clauses. Very frustrating that any DBMS does not support this today. Lack of support means writing a column definition in two places - when that is a complex definition it renders the code less readable and makes it very easy for future updates to the code to break it - failing to update the column definition in all places. This is bang-your-head-against-a-brick-wall stuff.Ampere
The question itself could (and should) be rephrased, in light of the correct explanation as you gave here. Namely: It should be possible to give aliases to expressions in the GROUP BY clause, and then to reference those aliases in SELECT. There is no logical obstacle to doing it this way; probably not enough people are asking for it.Samal
S
36

You could always use a subquery so you can use the alias; Of course, check the performance (Possible the db server will run both the same, but never hurts to verify):

SELECT ItemName, FirstLetter, COUNT(ItemName)
FROM (
    SELECT ItemName, SUBSTRING(ItemName, 1, 1) AS FirstLetter
    FROM table1
    ) ItemNames
GROUP BY ItemName, FirstLetter
Schrick answered 1/10, 2010 at 16:55 Comment(3)
Sub-queries should be avoided where ever possible due to bad performance. Using a copy of the function is much better because it is of course detected by the database optimiser and done only once.Suspicion
@Suspicion but there is not different in execution plan in that case. Is any other performance consideration?Teakettle
@Roland, Correlated sub-queries or other syntax that leads to loops or row-by-row behavior should be avoided, and there is a limit to how deep you should go with nested subqueries, but it's generally not true that sub-queries lead to bad performance. In this case as Chris has said, you can verify the execution plan (AKA query plan, explain plan) comparing both with and without the subquery, and see if there's really any difference. Pretty much every database engine will re-write your query so you aren't totally in control of what gets executed. That's the point of declarative syntax.Rosen
M
26

At least in PostgreSQL you can use the column number in the resultset in your GROUP BY clause:

SELECT 
 itemName as ItemName,
 substring(itemName, 1,1) as FirstLetter,
 Count(itemName)
FROM table1
GROUP BY 1, 2

Of course this starts to be a pain if you are doing this interactively and you edit the query to change the number or order of columns in the result. But still.

Melanous answered 1/10, 2010 at 16:49 Comment(4)
GROUP BY FirstLetter is allowed in Postgresql. To wit, try running this in Postgresql: select substring(table_name,1,2) as tname from information_schema.tables group by tnameOriginality
@MichaelBuen Seems potentially problematic to me. From a quick test it looks as though if there is an alias and a base table column with the same name the latter gets priority? SQL Fiddle. So if relying on this group by alias a later schema change could silently break your query and change the semantics.Hessite
@MartinSmith only knew now that is a gotcha, will refrain from using that, thanks. Given that PostgreSQL allows that shortcut, they should give the alias a priority, otherwise they should not allow that shortcut at all.Originality
This was a terrible idea by the PostgreSQL designers. It's confusing as soon as you try to GROUP BY any expression that contains aggregate functions or window functions, which "obviously" doesn't work.Crowns
F
17

I'm not answering why it is so, but only wanted to show a way around that limitation in SQL Server by using CROSS APPLY to create the alias. You then use it in the GROUP BY clause, like so:

SELECT 
 itemName as ItemName,
 FirstLetter,
 Count(itemName)
FROM table1
CROSS APPLY (SELECT substring(itemName, 1,1) as FirstLetter) Alias
GROUP BY itemName, FirstLetter
Flan answered 13/12, 2016 at 18:59 Comment(0)
S
16

SQL Server doesn't allow you to reference the alias in the GROUP BY clause because of the logical order of processing. The GROUP BY clause is processed before the SELECT clause, so the alias is not known when the GROUP BY clause is evaluated. This also explains why you can use the alias in the ORDER BY clause.

Here is one source for information on the SQL Server logical processing phases.

Suboceanic answered 1/10, 2010 at 17:10 Comment(0)
F
6

Caution that using alias in the Group By (for services that support it, such as postgres) can have unintended results. For example, if you create an alias that already exists in the inner statement, the Group By will chose the inner field name.

-- Working example in postgres
select col1 as col1_1, avg(col3) as col2_1
from
    (select gender as col1, maritalstatus as col2, 
    yearlyincome as col3 from customer) as layer_1
group by col1_1;

-- Failing example in postgres
select col2 as col1, avg(col3)
from
    (select gender as col1, maritalstatus as col2,
    yearlyincome as col3 from customer) as layer_1
group by col1;
Flowerdeluce answered 23/4, 2016 at 13:13 Comment(0)
H
5

Some DBMSs will let you use an alias instead of having to repeat the entire expression.
Teradata is one such example.

I avoid ordinal position notation as recommended by Bill for reasons documented in this SO question.

The easy and robust alternative is to always repeat the expression in the GROUP BY clause.
DRY does NOT apply to SQL.

Horny answered 1/10, 2010 at 16:53 Comment(0)
M
2

Beware of using aliases when grouping the results from a view in SQLite. You will get unexpected results if the alias name is the same as the column name of any underlying tables (to the views.)

Modred answered 28/4, 2015 at 15:29 Comment(0)
E
0

Back in the day I found that Rdb, the former DEC product now supported by Oracle allowed the column alias to be used in the GROUP BY. Mainstream Oracle through version 11 does not allow the column alias to be used in the GROUP BY. Not sure what Postgresql, SQL Server, MySQL, etc will or won't allow. YMMV.

Egon answered 1/10, 2010 at 16:56 Comment(0)
P
0

In at least Postgres, you can use the alias name in the group by clause:

SELECT itemName as ItemName1, substring(itemName, 1,1) as FirstLetter, Count(itemName) FROM table1 GROUP BY ItemName1, FirstLetter;

I wouldn't recommend renaming an alias as a change in capitalization, that causes confusion.

Pell answered 23/9, 2020 at 16:43 Comment(0)
L
0

Use backtick character ( ` )

This works for me in MySQL (it can also be used in HAVING clause):

 SELECT 
  itemName as ItemName,
  substring(itemName, 1,1) as FirstLetter,
  Count(itemName)
 FROM table1
 GROUP BY `itemName`, `FirstLetter`
Lysippus answered 19/12, 2023 at 21:2 Comment(0)
L
0

Oracle 23c will bring this feature. For now it is not possible.

Lovieloving answered 17/3 at 15:33 Comment(1)
Oracle database 23c is already released, isn't it? Why do you write: will bring this feature ?Brainpan

© 2022 - 2024 — McMap. All rights reserved.