Annotate QuerySet with first value of ordered related model

I have a QuerySet of some objects. I want to annotate each one with the first value of a related model (joined on a few conditions and ordered by date, i.e. the related row with the minimum date). I can express the result I want neatly in SQL, but I'm curious how to translate it to Django's ORM.

Background

Let's say that I have two related models, Book and BlogPost, each with a foreign key to an Author:

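from django.db import models

class Author(models.Model):
    # Not shown in the question; a minimal stand-in so the snippet is complete.
    name = models.CharField(max_length=255)

class Book(models.Model):
    title = models.CharField(max_length=255)
    genre = models.CharField(max_length=63)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    date_published = models.DateField()

class BlogPost(models.Model):
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    date_published = models.DateField()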

I'm trying to find the first mystery book that a given author published after each blog post they write. In SQL, this can be achieved neatly with a window function (ROW_NUMBER).

Working solution in PostgreSQL 9.6

WITH ordered AS (
  SELECT blog_post.id,
         book.title,
         ROW_NUMBER() OVER (
            PARTITION BY blog_post.id ORDER BY book.date_published
         ) AS rn
    FROM blog_post
         LEFT JOIN book ON book.author_id = blog_post.author_id
                       AND book.genre = 'mystery'
                       AND book.date_published >= blog_post.date_published
)
SELECT id,
       title
  FROM ordered
 WHERE rn = 1;
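To check the query's behaviour without a Postgres server, here is a minimal sketch that runs the same window query against an in-memory SQLite database through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions); the sample rows are hypothetical, and the only change to the query is a final ORDER BY for stable output. Note that a blog post with no matching book still appears once, with a NULL (None) title, because of the LEFT JOIN.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the SQL above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_post (id INTEGER PRIMARY KEY, author_id INTEGER, date_published TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
                   author_id INTEGER, date_published TEXT);
INSERT INTO blog_post VALUES (1, 1, '2017-01-01'), (2, 1, '2017-06-01'), (3, 2, '2017-03-01');
INSERT INTO book VALUES
  (1, 'Early Mystery', 'mystery', 1, '2017-02-01'),
  (2, 'Late Mystery',  'mystery', 1, '2017-07-01'),
  (3, 'Some Romance',  'romance', 1, '2017-01-15');
""")

# The window query from the question; ISO date strings sort chronologically.
rows = conn.execute("""
WITH ordered AS (
  SELECT blog_post.id,
         book.title,
         ROW_NUMBER() OVER (
            PARTITION BY blog_post.id ORDER BY book.date_published
         ) AS rn
    FROM blog_post
         LEFT JOIN book ON book.author_id = blog_post.author_id
                       AND book.genre = 'mystery'
                       AND book.date_published >= blog_post.date_published
)
SELECT id, title
  FROM ordered
 WHERE rn = 1
 ORDER BY id;  -- added only to make the output order deterministic
""").fetchall()

print(rows)
```

With this data, post 1 gets "Early Mystery", post 2 gets "Late Mystery" (the earlier book predates it), and post 3, whose author has no mystery books, gets None.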

Translating to Django's ORM

While the above SQL suits my needs well (and I could use raw SQL if needed), I'm curious how one would express this with a QuerySet. I have an existing QuerySet that I'd like to annotate further:

posts = models.BlogPost.objects.filter(...).select_related(...).prefetch_related(...)
annotated_posts = posts.annotate(
    first_mystery_title=...
)


Attempted solution

I first built a Q object to filter down to mystery books published on or after the blog post's date:

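from django.db.models import F, Q

# Mystery books by the post's author, published on or after the post itself.
published_after = Q(
    author__book__date_published__gte=F('date_published'),
    author__book__genre='mystery'
)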

From there, I tried to piece together django.db.models.Min and additional F objects to achieve the result I want, but with no success.

Note: Django 2.0 introduces window expressions, but I'm currently on Django 1.10, and curious how one would do this with the QuerySet features available there.

Pinckney asked 4/1, 2018 at 2:47 · Comments (2):
Honestly (and without being an expert at all), I don't think you can do this in a single query with Django 1.10's ORM... Subqueries were introduced in Django 1.11 and (as you said) window expressions in 2.0 :-( – Babcock
Did you try annotate(first=Min(<related_model__id>)).filter(<conditions with timestamps>)? – Eteocles
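The Min suggestion runs into a basic limit: an aggregate returns the minimum value of one column (here, the date), not the other columns of the row that holds it. One common workaround is to aggregate first and then join back to book on the earliest date to recover the title. The sketch below shows that shape in hand-written SQL, run through Python's sqlite3 with hypothetical data; it is not the SQL Django itself would generate. If two mystery books share the earliest date, a post will get more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_post (id INTEGER PRIMARY KEY, author_id INTEGER, date_published TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
                   author_id INTEGER, date_published TEXT);
INSERT INTO blog_post VALUES (1, 1, '2017-01-01'), (2, 1, '2017-06-01'), (3, 2, '2017-03-01');
INSERT INTO book VALUES
  (1, 'Early Mystery', 'mystery', 1, '2017-02-01'),
  (2, 'Late Mystery',  'mystery', 1, '2017-07-01'),
  (3, 'Some Romance',  'romance', 1, '2017-01-15');
""")

titles = conn.execute("""
SELECT f.id, f.first_date, b.title
  FROM (-- Step 1: MIN() can only tell us the earliest qualifying date.
        SELECT p.id AS id,
               p.author_id AS author_id,
               MIN(b.date_published) AS first_date
          FROM blog_post p
               LEFT JOIN book b ON b.author_id = p.author_id
                               AND b.genre = 'mystery'
                               AND b.date_published >= p.date_published
         GROUP BY p.id, p.author_id) AS f
       -- Step 2: join back on that date to recover the book's title.
       LEFT JOIN book b ON b.author_id = f.author_id
                       AND b.genre = 'mystery'
                       AND b.date_published = f.first_date
 ORDER BY f.id;
""").fetchall()

print(titles)
```

Post 3 has no qualifying book, so its first_date is NULL and the join-back finds nothing, leaving the title as None.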

Perhaps using .raw isn't such a bad idea. Looking at the source of Django 2.0's Window class, we can see that it essentially composes an SQL fragment to achieve the windowing.

An easy way out may be the architect module, which, according to its documentation, can add partitioning functionality for PostgreSQL.

Another module that claims to bring window functionality to Django < 2.0 is django-query-builder, which provides a partition_by() method that can be combined with order_by() when defining a window:

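# Import paths as given in the django-query-builder documentation.
from querybuilder.fields import RowNumberField
from querybuilder.query import Query, QueryWindow

# Order is the example model from the library's own tests (table tests_order).
query = Query().from_table(
    Order,
    ['*', RowNumberField(
              'revenue', 
              over=QueryWindow().order_by('margin')
                                .partition_by('account_id')
          )
    ]
)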
query.get_sql()
# SELECT tests_order.*, ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY margin ASC) AS revenue_row_number
# FROM tests_order

Finally, you can always copy the Window class source code into your project, or use this alternate Window class code.

Highpressure answered 11/1, 2018 at 9:29 Comment(0)

Your apparent problem is that Django 1.10 is too old to handle window functions properly (even though Postgres has supported them since version 8.4).

That problem goes away if you rewrite your query without a window function.

3 equivalent queries

Which of them is fastest depends on available indexes and data distribution. But each of them should be faster than your original.

1. With DISTINCT ON:

SELECT DISTINCT ON (p.id)
       p.id, b.title
FROM   blog_post p
LEFT   JOIN book b ON b.author_id = p.author_id
                  AND b.genre = 'mystery'
                  AND b.date_published >= p.date_published
ORDER  BY p.id, b.date_published;
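DISTINCT ON (p.id) is a Postgres extension that keeps only the first row of each group of equal p.id values, where "first" is decided by the ORDER BY (whose leading expressions must match the DISTINCT ON expressions). SQLite has no DISTINCT ON, so here is a plain-Python sketch of those semantics applied to hypothetical rows that the LEFT JOIN might produce: sort by (post id, book date) with NULLs last, as Postgres does for ascending order, then take the first row of each id group.

```python
from itertools import groupby

# Hypothetical joined rows (post_id, book_date, title), i.e. the LEFT JOIN
# output before DISTINCT ON is applied. ISO dates sort chronologically.
joined = [
    (2, "2017-07-01", "Late Mystery"),
    (1, "2017-07-01", "Late Mystery"),
    (1, "2017-02-01", "Early Mystery"),
    (3, None, None),  # post with no matching book: one NULL-extended row
]


def distinct_on_first(rows):
    """Emulate DISTINCT ON (post_id) ... ORDER BY post_id, book_date."""
    # Postgres sorts NULLs last in ascending order, so None dates go last.
    ordered = sorted(rows, key=lambda r: (r[0], r[1] is None, r[1] or ""))
    # Keep only the first row of each post_id group.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


first_rows = distinct_on_first(joined)
print(first_rows)
```

This also shows why the ORDER BY matters: without the secondary sort on the book date, "first" per post would be arbitrary.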


2. With a LATERAL subquery (requires Postgres 9.3 or later):

SELECT p.id, b.title
FROM   blog_post p
LEFT   JOIN LATERAL (
   SELECT title
   FROM   book 
   WHERE  author_id = p.author_id
   AND    genre = 'mystery'
   AND    date_published >= p.date_published
   ORDER  BY date_published
   LIMIT  1
   ) b ON true;
-- ORDER BY p.id  -- optional
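LATERAL lets the subquery reference p from the outer query, so conceptually the subquery is evaluated once per blog post and returns at most one row. The plain-Python sketch below mirrors that per-row evaluation on hypothetical rows; it illustrates the semantics only, not how Postgres executes the plan.

```python
# Hypothetical rows standing in for the blog_post and book tables.
posts = [
    {"id": 1, "author_id": 1, "date_published": "2017-01-01"},
    {"id": 2, "author_id": 1, "date_published": "2017-06-01"},
    {"id": 3, "author_id": 2, "date_published": "2017-03-01"},
]
books = [
    {"title": "Early Mystery", "genre": "mystery", "author_id": 1, "date_published": "2017-02-01"},
    {"title": "Late Mystery", "genre": "mystery", "author_id": 1, "date_published": "2017-07-01"},
    {"title": "Some Romance", "genre": "romance", "author_id": 1, "date_published": "2017-01-15"},
]


def lateral_first_title(post):
    """The LATERAL subquery, evaluated once for a single outer row."""
    matches = [
        b for b in books
        if b["author_id"] == post["author_id"]
        and b["genre"] == "mystery"
        and b["date_published"] >= post["date_published"]
    ]
    # ORDER BY date_published LIMIT 1 -> the earliest match, if any.
    # (On ties Postgres may return any tied row; min() returns the first.)
    return min(matches, key=lambda b: b["date_published"])["title"] if matches else None


# LEFT JOIN LATERAL (...) ON true keeps every post; no match yields NULL (None).
result = [(p["id"], lateral_first_title(p)) for p in posts]
print(result)
```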


3. Or, simpler yet, with a correlated subquery:

SELECT p.id
     ,(SELECT title
       FROM   book 
       WHERE  author_id = p.author_id
       AND    genre = 'mystery'
       AND    date_published >= p.date_published
       ORDER  BY date_published
       LIMIT  1)
FROM   blog_post p;
-- ORDER BY p.id  -- optional
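Unlike DISTINCT ON and LATERAL, query 3 uses only standard SQL, so it also runs on SQLite. A minimal sketch with Python's built-in sqlite3 and hypothetical rows; the only additions are an AS title alias on the subquery column and the optional ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_post (id INTEGER PRIMARY KEY, author_id INTEGER, date_published TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
                   author_id INTEGER, date_published TEXT);
INSERT INTO blog_post VALUES (1, 1, '2017-01-01'), (2, 1, '2017-06-01'), (3, 2, '2017-03-01');
INSERT INTO book VALUES
  (1, 'Early Mystery', 'mystery', 1, '2017-02-01'),
  (2, 'Late Mystery',  'mystery', 1, '2017-07-01'),
  (3, 'Some Romance',  'romance', 1, '2017-01-15');
""")

# Unqualified columns inside the subquery resolve to book; p.* to the outer row.
rows = conn.execute("""
SELECT p.id
     ,(SELECT title
       FROM   book
       WHERE  author_id = p.author_id
       AND    genre = 'mystery'
       AND    date_published >= p.date_published
       ORDER  BY date_published
       LIMIT  1) AS title
FROM   blog_post p
ORDER  BY p.id;
""").fetchall()

print(rows)
```

A scalar subquery that finds no row yields NULL, so post 3 comes back with None, matching the LEFT JOIN behaviour of the other two queries.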

Each of these should translate easily to Django syntax. You could also just use the raw SQL; that's what gets sent to the Postgres server anyway.

Tempered answered 11/1, 2018 at 16:41 Comment(0)
