aggregate() vs annotate() in Django
Asked Answered
S

4

193

Django's QuerySet has two methods, annotate and aggregate. The documentation says that:

Unlike aggregate(), annotate() is not a terminal clause. The output of the annotate() clause is a QuerySet. https://docs.djangoproject.com/en/4.1/topics/db/aggregation/#generating-aggregates-for-each-item-in-a-queryset

Is there any other difference between them? If not, then why does aggregate exist?

Sines answered 2/11, 2011 at 14:18 Comment(2)
Can you please add the link to this statement in the docs?Tommy
added link to docsJehoshaphat
S
292

I would focus on the example queries rather than your quote from the documentation. Aggregate calculates values for the entire queryset. Annotate calculates summary values for each item in the queryset.

Aggregation

>>> Book.objects.aggregate(average_price=Avg('price'))
{'average_price': 34.35}

Returns a dictionary containing the average price of all books in the queryset.

Annotation

>>> q = Book.objects.annotate(num_authors=Count('authors'))
>>> q[0].num_authors
2
>>> q[1].num_authors
1

q is the queryset of books, but each book has been annotated with the number of authors.

Soapstone answered 2/11, 2011 at 17:20 Comment(2)
Am I correct that .annotate() on a qs alone does not hit the db, but calling q[0].num_authors does? I assume aggregate must always hit the db as it is a terminal clause?Procurer
@Procurer that's really related to the original question, so I don't think the comments on an eight-year-old question is the best place to ask. If you want to check when the queries run, then you can check connection.queries. Hint: check whether it's the book = q[0] or ` book.num_authors` that causes the query.Soapstone
A
44

Aggregate Aggregate generate result (summary) values over an entire QuerySet. Aggregate operate over the rowset to get a single value from the rowset.(For example sum of all prices in the rowset). Aggregate is applied on entire QuerySet and it generate result (summary) values over an entire QuerySet.

In Model:

class Books(models.Model):
    name = models.CharField(max_length=100)
    pages = models.IntegerField()
    price = models.DecimalField(max_digits=5, decimal_places=3)

In Shell:

>>> Books.objects.all().aggregate(Avg('price'))
# Above code will give the Average of the price Column 
>>> {'price__avg': 34.35}

Annotate Annotate generate an independent summary for each object in a QuerySet.(We can say it iterate each object in a QuerySet and apply operation)

In Model:

class Video(models.Model):
    name = models.CharField(max_length=52, verbose_name='Name')
    video = models.FileField(upload_to=document_path, verbose_name='Upload 
               video')
    created_by = models.ForeignKey(User, verbose_name='Created by', 
                       related_name="create_%(class)s")
    user_likes = models.ManyToManyField(UserProfile, null=True, 
                  blank=True, help_text='User can like once', 
                         verbose_name='Like by')

In View:

videos = Video.objects.values('id', 'name','video').annotate(Count('user_likes',distinct=True)

In view it will count the likes for each video

Armoury answered 31/8, 2017 at 14:20 Comment(2)
why distinct=True is required in last example?Tarriance
@YuriyLeonov distinct=True used for that the operation perform on distinct value. It's not related to the current question asked. Sorry for that Actually I have used on my code.Armoury
P
23

That's the main difference, but aggregates also work on a grander scale than annotations. Annotations are inherently related to individual items in a queryset. If you run an Count annotation on a something like a many-to-many field, you'll get a separate count for each member of the queryset (as an added attribute). If you were to do the same with an aggregation, however, it would attempt to count every relationship on every member of the queryset, even duplicates, and return that as just one value.

Pentothal answered 2/11, 2011 at 14:26 Comment(2)
Am I correct that .annotate() on a qs alone does not hit the db, but calling the result of an annotation such as q[0].num_authors does? I assume aggregate must always hit the db as it is a terminal clause?Procurer
@Procurer yes you are correctBrewage
P
7
  • aggregate() can calculates all values of the current model's column. *A dictionary is returned.

  • annotate() can calculates all values of the child model's column accessed by foreign key. *A QuerySet is returned.

*Avg(), Count(), Max(), Min(), Sum() and so on can be used with aggregate() and annotate().

For example, there are Category and Product models below:

# "models.py"

from django.db import models

class Category(models.Model):
    name = models.CharField(max_length=20)

class Product(models.Model):
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    name = models.CharField(max_length=50)
    price = models.DecimalField(decimal_places=2, max_digits=5)

And, there are Category and Product admins below:

# "admin.py"

from django.contrib import admin
from .models import Category, Product 

@admin.register(Category)
class CategoryAdmin(admin.ModelAdmin):
    list_display = ('id', 'name')
    ordering = ('id',)

@admin.register(Product)
class ProductAdmin(admin.ModelAdmin):
    list_display = ('id', 'category_id', 'category', 'name', 'price')
    ordering = ('id',)

And, there are 2 categories below:

enter image description here

And, there are 5 products below:

enter image description here

And, there is test view below:

# "views.py"

from .models import Category, Product
from django.http import HttpResponse
from django.db.models import Avg
from django.db.models import Count
from django.db.models import Max
from django.db.models import Min
from django.db.models import Sum

def test(request):
    return HttpResponse("Test")

First, I explain about aggregate().

aggregate():

Now, I run test view which has the columns id, category and price in Avg(), Count(), Max(), Min() and Sum() in aggregate() as shown below:

# "views.py"

# ...

def test(request):

    print(Product.objects.aggregate(Avg('id')))
    print(Product.objects.aggregate(Count('id')))
    print(Product.objects.aggregate(Max('id')))
    print(Product.objects.aggregate(Min('id')))
    print(Product.objects.aggregate(Sum('id')))
    print()
    print(Product.objects.aggregate(Avg('category')))
    print(Product.objects.aggregate(Count('category')))
    print(Product.objects.aggregate(Max('category')))
    print(Product.objects.aggregate(Min('category')))
    print(Product.objects.aggregate(Sum('category')))
    print()
    print(Product.objects.aggregate(Avg('price')))
    print(Product.objects.aggregate(Count('price')))
    print(Product.objects.aggregate(Max('price')))
    print(Product.objects.aggregate(Min('price')))
    print(Product.objects.aggregate(Sum('price')))
    
    return HttpResponse("Test")

Then, these dictionaries below are outputted on console:

{'id__avg': 3.0}
{'id__count': 5}
{'id__max': 5}
{'id__min': 1}
{'id__sum': 15}

{'category__avg': 1.4}
{'category__count': 5}
{'category__max': 2}
{'category__min': 1}
{'category__sum': 7}

{'price__avg': Decimal('30.0000000000000000')}
{'price__count': 5}
{'price__max': Decimal('50.00')}
{'price__min': Decimal('10.00')}
{'price__sum': Decimal('150.00')}

And, aggregate() can accept multiple kinds of columns and functions in any order, the multiple same kind of columns and functions and no columns and functions as shown below. *The multiple same kind of columns and functions are made into one and no columns and functions get an empty dictionary:

# "views.py"

# ...

def test(request):
    # Multiple kinds of columns and functions in any order
    print(
        Product.objects.aggregate(
            Max('price'), Max('category'), Sum('id'), Min('id')
        )
    )

    # The multiple same kind of columns and functions
    print(
        Product.objects.aggregate(
            Sum('price'), Sum('price'), Sum('price')
        )
    )

    # No columns and functions
    print(Product.objects.aggregate())
        
    return HttpResponse("Test")

Then, these dictionaries below are outputted on console:

{'price__max': Decimal('50.00'), 'category__max': 2, 'id__sum': 15, 'id__min': 1}
{'price__sum': Decimal('150.00')}
{}

And, Max() and Min() below can accept non-numeric types:

# "views.py"

# ...

def test(request):

    print(Product.objects.aggregate(Count('name')))
    print(Product.objects.aggregate(Max('name')))
    print(Product.objects.aggregate(Min('name')))
        
    return HttpResponse("Test")

Then, these dictionaries below are outputted on console:

{'name__count': 5}
{'name__max': 'Tea'}
{'name__min': 'Apple'}

But, Avg() and Sum() below cannot accept non-numeric types:

# "views.py"

# ...

def test(request):

    print(Product.objects.aggregate(Avg('name')))
    print(Product.objects.aggregate(Sum('name')))
        
    return HttpResponse("Test")

So, the errors below occur:

django.db.utils.ProgrammingError: function avg(character varying) does not exist

django.db.utils.ProgrammingError: function sum(character varying) does not exist

And, you can change the default key names as shown below:

# "views.py"

# ...

def test(request):

    print(Product.objects.aggregate(priceAve=Avg('price')))
    print(Product.objects.aggregate(priceCount=Count('price')))
    print(Product.objects.aggregate(priceMax=Max('price')))
    print(Product.objects.aggregate(priceMin=Min('price')))
    print(Product.objects.aggregate(priceSum=Sum('price')))
        
    return HttpResponse("Test")

Then, the default key names are changed as shown below:

{'priceAve': Decimal('30.0000000000000000')}
{'priceCount': 5}
{'priceMax': Decimal('50.00')}
{'priceMin': Decimal('10.00')}
{'priceSum': Decimal('150.00')}

Next, I explain about annotate().

annotate():

Now, I run test view which has the columns product__id, product__category and product__price in Avg(), Count(), Max(), Min() and Sum() in annotate() as shown below. *You need to put __avg, __count, __max, __min and __sum to product__id, product__category and product__price for Avg(), Count(), Max(), Min() and Sum() respectively:

# "views.py"

# ...

def test(request):

    qs = Category.objects.annotate(
        Avg('product__id'), 
        Count('product__id'), 
        Max('product__id'), 
        Min('product__id'), 
        Sum('product__id')
    ).order_by('pk')

    for obj in qs:
        print(
            obj.id, 
            obj.name, 
            obj.product__id__avg, 
            obj.product__id__count, 
            obj.product__id__max, 
            obj.product__id__min, 
            obj.product__id__sum
        )

    print()

    qs = Category.objects.annotate(
        Avg('product__category'), 
        Count('product__category'), 
        Max('product__category'), 
        Min('product__category'), 
        Sum('product__category')
    ).order_by('pk')

    for obj in qs:
        print(
            obj.id, 
            obj.name, 
            obj.product__category__avg, 
            obj.product__category__count, 
            obj.product__category__max, 
            obj.product__category__min, 
            obj.product__category__sum
        )

    print()

    qs = Category.objects.annotate(
        Avg('product__price'), 
        Count('product__price'), 
        Max('product__price'), 
        Min('product__price'), 
        Sum('product__price')
    ).order_by('pk')

    for obj in qs:
        print(
            obj.id, 
            obj.name, 
            obj.product__price__avg, 
            obj.product__price__count, 
            obj.product__price__max, 
            obj.product__price__min, 
            obj.product__price__sum
        )
    
    return HttpResponse("Test")

Then, these below are outputted on console:

1 Food 2.0 3 3 1 6
2 Drink 4.5 2 5 4 9

1 Food 1.0 3 1 1 3
2 Drink 2.0 2 2 2 4

1 Food 20.0000000000000000 3 30.00 10.00 60.00
2 Drink 45.0000000000000000 2 50.00 40.00 90.00

And, the query without order_by('pk') below makes the order descendant:

# "views.py"

# ...

def test(request):

    qs = Category.objects.annotate(
        Avg('product__price'), 
        Count('product__price'), 
        Max('product__price'), 
        Min('product__price'), 
        Sum('product__price')
    ) # Without ".order_by('pk')"

    for obj in qs:
        print(
            obj.id, 
            obj.name, 
            obj.product__price__avg, 
            obj.product__price__count, 
            obj.product__price__max, 
            obj.product__price__min, 
            obj.product__price__sum
        )
    
    return HttpResponse("Test")

Then, the order is descendant as shown below:

2 Drink 4.5 2 5 4 9
1 Food 2.0 3 3 1 6

And, empty annotate() below has id and name attributes:

# "views.py"

# ...

def test(request): 
                         # Empty "annotate()"
    qs = Category.objects.annotate().order_by('pk')

    for obj in qs:
        print(obj.id, obj.name)
        
    return HttpResponse("Test")

Then, these below are outputted on console:

1 Food
2 Drink

But, empty annotate() below doesn't have __avg, __count, __max, __min and __sum attributes as shown below:

# "views.py"

# ...

def test(request): 
                         # Empty "annotate()"
    qs = Category.objects.annotate().order_by('pk')
    
    for obj in qs:
        print(
            obj.product__price__avg,
            obj.product__price__count,
            obj.product__price__max,
            obj.product__price__min,
            obj.product__price__sum,
        )
        
    return HttpResponse("Test")

So, the errors below occur:

AttributeError: 'Category' object has no attribute 'product__price__avg'

AttributeError: 'Category' object has no attribute 'product__price__count'

AttributeError: 'Category' object has no attribute 'product__price__max'

AttributeError: 'Category' object has no attribute 'product__price__min'

AttributeError: 'Category' object has no attribute 'product__price__sum'

And, Max() and Min() below can accept non-numeric types:

# "views.py"

# ...

def test(request):

    qs = Category.objects.annotate(
        Count('product__name'), 
        Max('product__name'), 
        Min('product__name'), 
    ).order_by('pk')

    for obj in qs:
        print( 
            obj.product__name__count, 
            obj.product__name__max, 
            obj.product__name__min, 
        )

    return HttpResponse("Test")

Then, these below are outputted on console:

3 Orange Apple
2 Tea Milk

But, Avg() and Sum() below cannot accept non-numeric types:

# "views.py"

# ...

def test(request):

    qs = Category.objects.annotate(
        Avg('product__name'),
        Sum('product__name')
    ).order_by('pk')

    for obj in qs:
        print( 
            obj.product__name__avg, 
            obj.product__name__sum
        )
        
    return HttpResponse("Test")

So, the errors below occur:

django.db.utils.ProgrammingError: function avg(character varying) does not exist

django.db.utils.ProgrammingError: function sum(character varying) does not exist

And, you can change the default attribute names as shown below:

# "views.py"

# ...

def test(request):

    qs = Category.objects.annotate(
        productPriceAvg=Avg('product__price'), 
        productPriceCount=Count('product__price'), 
        productPriceMax=Max('product__price'), 
        productPriceMin=Min('product__price'), 
        productPriceSum=Sum('product__price')
    ).order_by('pk')

    for obj in qs:
        print(
            obj.id, 
            obj.name, 
            obj.productPriceAvg, 
            obj.productPriceCount, 
            obj.productPriceMax, 
            obj.productPriceMin, 
            obj.productPriceSum
        )
    
    return HttpResponse("Test")

Then, these below are outputted on console:

1 Food 20.0000000000000000 3 30.00 10.00 60.00
2 Drink 45.0000000000000000 2 50.00 40.00 90.00
Paraglider answered 24/1, 2023 at 10:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.