SQL - Subquery in Aggregate Function
Asked Answered
P

4

15

I'm using the northwind database to refresh my SQL skills by creating some more or less complex queries. Unfortunately I could not find a solution for my last use case: "Get the sum of the five greatest orders for every category in year 1997."

The tables involved are:

Orders(OrderId, OrderDate)
Order Details(OrderId, ProductId, Quantity, UnitPrice)
Products(ProductId, CategoryId)
Categories(CategoryId, CategoryName)

I have tried the following query

SELECT c.CategoryName, SUM(
  (SELECT TOP 5 od2.UnitPrice*od2.Quantity 
   FROM [Order Details] od2, Products p2
   WHERE od2.ProductID = p2.ProductID
   AND c.CategoryID = p2.CategoryID
   ORDER BY 1 DESC))
FROM [Order Details] od, Products p, Categories c, Orders o 
WHERE od.ProductID = p. ProductID
AND p.CategoryID = c.CategoryID
AND od.OrderID = o.OrderID
AND YEAR(o.OrderDate) = 1997
GROUP BY c.CategoryName

Well... It turned out that subqueries are not allowed in aggregate functions. I've read other posts about this issue but could not find a solution for my specific use case. Hope you can help me out...

Pulpboard answered 1/5, 2013 at 12:7 Comment(2)
Which version of RDBMS are you using? Also your question is not correct because the order can belong in more than one category.Cypriot
I'm using MSSQL 2012. If you refer to the subquery, there is a WHERE clause - WHERE c.categoryID = p2.CategoryID - that should filter for one category only.Pulpboard
P
35

Subqueries are not generally allowed in aggregate functions. Instead, move the aggregate inside the subquery. In this case, you'll need an extra level of subquery because of the top 5:

SELECT c.CategoryName,
  (select sum(val)
   from (SELECT TOP 5 od2.UnitPrice*od2.Quantity as val
         FROM [Order Details] od2, Products p2
         WHERE od2.ProductID = p2.ProductID
         AND c.CategoryID = p2.CategoryID
         ORDER BY 1 DESC
        ) t
  )
FROM [Order Details] od, Products p, Categories c, Orders o 
WHERE od.ProductID = p. ProductID
AND p.CategoryID = c.CategoryID
AND od.OrderID = o.OrderID
AND YEAR(o.OrderDate) = 1997
GROUP BY c.CategoryName, c.CategoryId
Pearse answered 1/5, 2013 at 13:28 Comment(1)
Thanks that did the job! Just for info... the CategoryID needs to be grouped as well. The last line should look like GROUP BY c.CategoryName, c.CategoryID. But thanks for your time, that query looks really weird to me. Need to analyze it in depth now... :)Pulpboard
D
4

Use CTE with ROW_NUMBER ranking function instead of excessive subquery.

 ;WITH cte AS
 (
  SELECT c.CategoryName, od2.UnitPrice, od2.Quantity,
         ROW_NUMBER() OVER(PARTITION BY c.CategoryName ORDER BY od2.UnitPrice * od2.Quantity DESC) AS rn
  FROM [Order Details] od JOIN Products p ON od.ProductID = p.ProductID
                          JOIN Categories c ON p.CategoryID = c.CategoryID
                          JOIN Orders o ON od.OrderID = o.OrderID
  WHERE o.OrderDate >= DATEADD(YEAR, DATEDIFF(YEAR, 0, '19970101'), 0)
    AND o.OrderDate < DATEADD(YEAR, DATEDIFF(YEAR, 0, '19970101')+1, 0)
  )
  SELECT CategoryName, SUM(UnitPrice * Quantity) AS val
  FROM cte
  WHERE rn < 6
  GROUP BY CategoryName
Demott answered 1/5, 2013 at 14:23 Comment(0)
Z
3

Its definitely a sub query problem here is an excellent article on this (originally written for Access but the syntax is identical), also orderdate = 1997 will give order date for 1 jan 1997' -- you need datepart(year, orderdate) = 1997, once you have the (up to five) rows returned for each category you can then encapsulate the rows returned and agregate them

Zeal answered 1/5, 2013 at 12:16 Comment(4)
Thanks for the link. You're right I edited my post. It should be ofc YEAR(o.OrderDate) = '1997'Pulpboard
Could you provide an example for "encapsulate the rows returned and aggregate them"?Pulpboard
SELECT x.A, x.B, x.C, SUM(x.d) AS D FROM ( Any valid sql select statement containing columns a, b , c, d, e) x GROUP BY x.dZeal
This is also a sub query, however its a very simple one and the top 5 grouped by Category subquery is a more advanced and powerful use. I did not want to confuse by using the same descriptive nown. More important for you to be aware of the full power of sub queries.Zeal
J
0

I ran into a very similar problem with an Access subquery where the records were sorted by date. When I used the "Last" aggregate function I found it passed through all of the subqueries and retrieved the last row of data from the Access table, and not the sorted query as intended. Although I could have rewritten the query to use the aggregate function within the first set of parenthesis (as was previously suggested) I found it easier to save query results as a table in the database sorted in the order I wanted and then use the "Last" aggregate function to retrieve the values I wanted. I'll run an update query in the future to keep results current. Not efficient but effective.

Jaborandi answered 10/4, 2015 at 20:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.