Using distinct on a column and doing order by on another column gives an error
Asked Answered
G

11

21

I have a table: abc_test with columns n_num, k_str.

This query doesnt work:

    select distinct(n_num) from abc_test order by(k_str)

But this one works:

    select n_num from abc_test order by(k_str)

How do DISTINCT and ORDER BY keywords work internally that output of both the queries is changed?

Grande answered 18/1, 2012 at 6:1 Comment(2)
Could you show the results if you run those two queries? The first query should return error ORA-01791: not a SELECTed expression as the k_str column is not being selected. Is this the actual query you ran? (It may be acceptable in 9i, but I can't be sure.)Suzisuzie
Same effect on SQL Server but with a more descriptive error message, "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."Precipitant
O
17

As far as i understood from your question .

distinct :- means select a distinct(all selected values should be unique). order By :- simply means to order the selected rows as per your requirement .

The problem in your first query is For example : I have a table

ID name
01 a
02 b
03 c
04 d 
04 a

now the query select distinct(ID) from table order by (name) is confused which record it should take for ID - 04 (since two values are there,d and a in Name column). So the problem for the DB engine is here when you say order by (name).........

Overeager answered 18/1, 2012 at 8:6 Comment(1)
Why is this the chosen answer? What user is asking is: order all these rows by name, so the ID returned are going to be: 01, 04, 02, 03, 04. Then dedupe the results which gives 01, 04, 02, 03. The usual use case for Distinct + Order by is to select the top/bottom of something. For example: select the names of 100 most recent applicants. Hence, distinct name order by application_time.Ruminate
A
9

You might think about using group by instead:

select n_num
from abc_test
group by n_num
order by min(k_str)
Arnulfo answered 18/1, 2012 at 8:10 Comment(3)
@FlorinGhita What do you mean by that? I just tried the query, and to me it looks like it is working.Arnulfo
heeeei, you edited your question. order by min(k_str) is a different thing that order by k_strColcothar
@FlorinGhita Yes, the min(..) part was missing from my initial answer. Apologize for that.Arnulfo
F
7

How do extended sort key columns

The logical order of operations in SQL for your first query, is (simplified):

  • FROM abc_test
  • SELECT n_num, k_str i.e. add a so called extended sort key column
  • ORDER BY k_str DESC
  • SELECT n_num i.e. remove the extended sort key column again from the result.

Thanks to the SQL standard extended sort key column feature, it is possible to order by something that is not in the SELECT clause, because it is being temporarily added to it behind the scenes prior to ordering, and then removed again after ordering.

So, why doesn't this work with DISTINCT?

If we add the DISTINCT operation, it would need to be added between SELECT and ORDER BY:

  • FROM abc_test
  • SELECT n_num, k_str i.e. add a so called extended sort key column
  • DISTINCT
  • ORDER BY k_str DESC
  • SELECT n_num i.e. remove the extended sort key column again from the result.

But now, with the extended sort key column k_str, the semantics of the DISTINCT operation has been changed, so the result will no longer be the same. This is not what we want, so both the SQL standard, and all reasonable databases forbid this usage.

Workarounds

PostgreSQL has the DISTINCT ON syntax, which can be used here for precisely this job:

SELECT DISTINCT ON (k_str) n_num
FROM abc_test
ORDER BY k_str DESC

It can be emulated with standard syntax as follows, if you're not using PostgreSQL

SELECT n_num
FROM (
  SELECT n_num, MIN(k_str) AS k_str
  FROM abc_test
  GROUP BY n_num
) t
ORDER BY k_str

Or, just simply (in this case)

SELECT n_num, MIN(k_str) AS k_str
FROM abc_test
GROUP BY n_num
ORDER BY k_str

I have blogged about SQL DISTINCT and ORDER BY more in detail here.

Flow answered 20/7, 2018 at 8:34 Comment(0)
C
6

The first query is impossible. Lets explain this by example. we have this test:

n_num k_str
2     a
2     c
1     b

select distinct (n_num) from abc_test is

2
1

Select n_num from abc_test order by k_str is

2
1
2

What do you want to return

select distinct (n_num) from abc_test order by k_str?

it should return only 1 and 2, but how to order them?

Colcothar answered 18/1, 2012 at 8:7 Comment(6)
how about this? select distinct (n_num) as iresult from abc_test ORDER BY iresult ASCBarsac
Sure he can order by first column. The OP was about why this query won't work. What you say can be simply: select distinct n_num from abc_test order by 1Colcothar
"how to order them?" -- I can think of possible approaches: all arbitrarily; order the unambiguous values in order then the ambiguous ones in arbitrarily; etc. But I rather suspect you were making a point that is lost on me!Precipitant
If all values for n_num are distinct in the table, does the error still occur? If so, does this change your conclusions?Precipitant
it is not hard to understand: for 1 you have b. for 2 you have a an c. Wich is first? 1 or 2? :)Colcothar
@Precipitant yes. the query is technicaly wrong. It defies the theory or relation.Colcothar
W
0

You are selecting the collection distinct(n_num) from the resultset from your query. So there is no actual relation with the column k_str anymore. A n_num can be from two rows each having a different value for k_str. So you can't order the collection distinct(n_num) by k_str.

Wheatworm answered 18/1, 2012 at 8:4 Comment(0)
P
0

According to SQL Standards, a SELECT clause may refer either to as clauses ("aliases") in the top level SELECT clause or columns of the resultset by ordinal position, and therefore nether of your queries would be compliant.

It seems Oracle, in common with other SQL implemetations, allows you to refer to columns that existed (logically) immediately prior to being projected away in the SELECT clause. I'm not sure whether such flexibility is such a good thing: IMO it is good practice to expose the sort order to the calling application by including the column/expressions etc in the SELECT clause.

As ever, you need to apply dsicpline to get meaningful results. For your first query, the definition of order is potentially entirely arbitrary.You should be grateful for the error ;)

Precipitant answered 18/1, 2012 at 9:26 Comment(0)
D
0

This approach is available in SQL server 2000, you can select distinct values from a table and order by different column which is not included in Distinct. But in SQL 2012 this will through you an error "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."

So, still if you want to use the same feature as of SQL 2000 you can use the column number for ordering(its not recommended in best practice).

select distinct(n_num) from abc_test order by 1

This will order the first column after fetching the result. If you want the ordering should be done based on different column other than distinct then you have to add that column also in select statement and use column number to order by.

select distinct(n_num), k_str from abc_test order by 2
Diannediannne answered 16/9, 2014 at 14:25 Comment(0)
S
0

When I got same error, I got it resolved by changing it as

SELECT n_num 
FROM(
   SELECT DISTINCT(n_num) AS n_num, k_str
   FROM abc_test
) as tbl
ORDER BY tbl.k_str
Schubert answered 13/7, 2017 at 6:8 Comment(1)
This cannot be right. Are you sure it doesnt result in duplicates?Carisacarissa
Q
0

My query doesn't match yours exactly, but it's pretty close.

select distinct a.character_01 , (select top 1 b.sort_order from LookupData b where a.character_01 = b.character_01 )
from LookupData a 
where 
Dataset_Name = 'Sample' and status = 200
order by 2, 1
Quebec answered 24/5, 2019 at 21:53 Comment(0)
B
-1

did you try this?

SELECT DISTINCT n_num as iResult
FROM abc_test 
ORDER BY iResult
Barsac answered 18/1, 2012 at 8:12 Comment(1)
see the example in my answer. What do you want to return your query?Colcothar
X
-1

you can do

select distinct top 10000 (n_num)  --assuming you won't have more than 10,000 rows                
from abc_test order by(k_str)
Xiaoximena answered 1/11, 2016 at 18:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.