I've got a django model with a lot of fields. I'm trying in a single query to get the avg value of a given field and the average value of the top 5 values of that same field (from my other question regarding pure SQL: Average of top 5 value in a table for a given group by). Not that it should matters but: my database is redshift.
I've found two different ways to achieve this in SQL, but I'm having trouble implementing those queries using django ORM
Here is an example of what I want to do using Cars:
class Cars(models.Model):
manufacturer = models.CharField()
model = models.CharField()
price = models.FloatField()
Data:
manufacturer | model | price
Citroen C1 1
Citroen C2 2
Citroen C3 3
Citroen C4 4
Citroen C5 5
Citroen C6 6
Ford F1 7
Ford F2 8
Ford F3 9
Ford F4 10
Ford F5 11
Ford F6 12
Ford F6 19
GenMotor G1 20
GenMotor G3 25
GenMotor G4 22
Expected Output:
manufacturer | average_price | average_top_5_price
Citroen 3.5 4.0
Ford 10.85 12.2
GenMotor 22.33 22.33
Here are two pure SQL queries achieving the desired effect:
SELECT
main.manufacturer,
AVG(main.price) AS average_price,
AVG(CASE WHEN rank <= 5 THEN main.price END) AS average_top_5_price
FROM (
SELECT
manufacturer,
price,
ROW_NUMBER() OVER (PARTITION BY manufacturer ORDER BY price DESC) AS rank
FROM
cars
) main
GROUP BY
main.manufacturer;
And the second way of doing it:
SELECT A.manufacturer, A.avg_price, B.top5_price
FROM (
SELECT manufacturer, AVG(price) as avg_price
FROM cars
GROUP BY manufacturer
) A
JOIN (
SELECT manufacturer, AVG(psv_99) as top5_price
FROM (
SELECT manufacturer, price, RANK()
OVER (PARTITION BY manufacturer ORDER BY price DESC, id)
FROM cars
)
WHERE rank <= 5
GROUP BY manufacturer
) B
ON A.manufacturer = B.manufacturer
ORDER BY manufacturer
So far I haven't manage to implement either of those queries using django ORM, for the first one I can't find a way to have django do a "select from subquery" for the second one I can't find a good way to force django to "join two subquery"
PS: Keep in mind that I have reduced my table to three fields to simplify solving that particular problem, but I have ~100 column in my real table on which I'm making different calculation in the same queries.
django-cte
that solves exactly your problem, but it supports only Postgres. – Oden