I have a not-so-complex (imho) filtering logic based on several conditions in my Django models. There is one particular query which takes an inusual amount of time to finish.
The query is built based on those two querysets:
queryset = self.serializer_class.Meta.model.valid_pricelist_objects.filter(
Q(drug_prices__pricelist__price_destination__to_all_insurances=True
) |
# pylint: disable=line-too-long
Q(
drug_prices__pricelist__price_destination__to_organization_data__organization__uuid
=self.kwargs.get('i_uuid')))
return queryset
and
return super().get_queryset().filter(
Q(active=True),
Q(drug_prices__pricelist__active=True), # Lista de precios activa
# Q(drug_pictures__is_main=True), # Que tenga una imagen
# TODO: Hacer filtros por pais PriceListCountries
Q(
Q(drug_prices__pricelist__expires=False) | # Que tenga precios que no caducan o
Q(
Q(drug_prices__pricelist__expires=True), # Que tenga precios que si caducan Y
Q(drug_prices__pricelist__datestart__date__lte=timezone.now()), # Fecha de inicio menor que hoy Y
Q(drug_prices__pricelist__dateend__date__gte=timezone.now()) # Fecha final mayor que hoy
)
)
).distinct().prefetch_related(
'categories__contexts',
'brands',
'drug_prices__pricelist',
'drug_pictures',
'drug_prices__pricelist__price_destination',
)
The second querysets wraps the first one (via the super()
call). The resulting query looks like this:
SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty_id"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "monetary_pricelist" ON ( "monetary_drugprice"."pricelist_id" = "monetary_pricelist"."id" )
INNER JOIN "monetary_drugprice" T4 ON ( "phdrug_phdrug"."id" = T4."drug_id" )
INNER JOIN "monetary_pricelist" T5 ON ( T4."pricelist_id" = T5."id" )
INNER JOIN "monetary_pricelistdestinations" ON ( T5."id" = "monetary_pricelistdestinations"."pricelist_id" )
LEFT OUTER JOIN "organization_organizationdata" ON ( "monetary_pricelistdestinations"."to_organization_data_id" = "organization_organizationdata"."id" )
LEFT OUTER JOIN "organization_organization" ON ( "organization_organizationdata"."organization_id" = "organization_organization"."id" )
WHERE
(
"phdrug_phdrug"."active" = TRUE
AND "monetary_pricelist"."active" = TRUE
AND (
"monetary_pricelist"."expires" = FALSE
OR (
"monetary_pricelist"."expires" = TRUE
AND ( "monetary_pricelist"."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2019-01-22'
AND ( "monetary_pricelist"."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2019-01-22'
))
AND (
"monetary_pricelistdestinations"."to_all_insurances" = TRUE
OR "organization_organization"."uuid" = 'b51773d4-05f8-43a2-86ef-0098b31725d8'
))
ORDER BY
"phdrug_phdrug"."default_description" ASC
Running the query with EXPLAIN ANALYZE
I get this:
Unique (cost=10412.31..12666.32 rows=29084 width=143) (actual time=3373.496..3620.090 rows=6442 loops=1)
-> Sort (cost=10412.31..10485.02 rows=29084 width=143) (actual time=3373.494..3460.790 rows=228667 loops=1)
Sort Key: phdrug_phdrug.default_description, phdrug_phdrug.id, phdrug_phdrug.uuid, phdrug_phdrug.ean, phdrug_phdrug.parent_ean, phdrug_phdrug.reg_num, phdrug_phdrug.medika_code, phdrug_phdrug.atc_iv, phdrug_phdrug.product_type, phdrug_phdrug.fraction, phdrug_phdrug.active, phdrug_phdrug.loyal, phdrug_phdrug.patent, phdrug_phdrug.chronics, phdrug_phdrug.recipe, phdrug_phdrug.deal, phdrug_phdrug.specialized, phdrug_phdrug.armored, phdrug_phdrug.hight_speciality, phdrug_phdrug.temp_8_15, phdrug_phdrug.temp_15_25, phdrug_phdrug.temp_2_8, phdrug_phdrug.temp_less_15, phdrug_phdrug.new, phdrug_phdrug.mdk_internal_code, phdrug_phdrug.mdk_single_id, phdrug_phdrug.is_from_mdk_db, phdrug_phdrug.top, phdrug_phdrug.laboratory_id, phdrug_phdrug.specialty_id
Sort Method: external merge Disk: 31192kB
-> Hash Join (cost=704.51..6166.54 rows=29084 width=143) (actual time=23.648..507.099 rows=228667 loops=1)
Hash Cond: (monetary_drugprice.pricelist_id = monetary_pricelist.id)
-> Nested Loop (cost=696.92..5604.95 rows=44105 width=147) (actual time=22.881..416.630 rows=457692 loops=1)
Join Filter: (phdrug_phdrug.id = monetary_drugprice.drug_id)
-> Hash Join (cost=696.51..1177.21 rows=4583 width=147) (actual time=22.864..38.841 rows=23577 loops=1)
Hash Cond: (phdrug_phdrug.id = t4.drug_id)
-> Seq Scan on phdrug_phdrug (cost=0.00..359.94 rows=11992 width=143) (actual time=0.438..3.593 rows=11992 loops=1)
Filter: active
Rows Removed by Filter: 2
-> Hash (cost=639.21..639.21 rows=4584 width=4) (actual time=22.339..22.339 rows=23577 loops=1)
Buckets: 32768 (originally 8192) Batches: 1 (originally 1) Memory Usage: 1085kB
-> Nested Loop (cost=3.99..639.21 rows=4584 width=4) (actual time=1.785..16.702 rows=23577 loops=1)
-> Nested Loop (cost=3.58..9.11 rows=5 width=8) (actual time=1.756..1.874 rows=7 loops=1)
-> Hash Left Join (cost=3.43..7.57 rows=5 width=4) (actual time=1.733..1.797 rows=7 loops=1)
Hash Cond: (monetary_pricelistdestinations.to_organization_data_id = organization_organizationdata.id)
Filter: (monetary_pricelistdestinations.to_all_insurances OR (organization_organization.uuid = 'b51773d4-05f8-43a2-86ef-0098b31725d8'::uuid))
Rows Removed by Filter: 130
-> Seq Scan on monetary_pricelistdestinations (cost=0.00..3.37 rows=137 width=9) (actual time=0.626..0.643 rows=137 loops=1)
-> Hash (cost=3.12..3.12 rows=25 width=20) (actual time=1.076..1.076 rows=25 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Hash Left Join (cost=1.56..3.12 rows=25 width=20) (actual time=1.040..1.053 rows=25 loops=1)
Hash Cond: (organization_organizationdata.organization_id = organization_organization.id)
-> Seq Scan on organization_organizationdata (cost=0.00..1.25 rows=25 width=8) (actual time=0.501..0.504 rows=25 loops=1)
-> Hash (cost=1.25..1.25 rows=25 width=20) (actual time=0.513..0.513 rows=25 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on organization_organization (cost=0.00..1.25 rows=25 width=20) (actual time=0.484..0.501 rows=25 loops=1)
-> Index Only Scan using monetary_pricelist_pkey on monetary_pricelist t5 (cost=0.14..0.31 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=7)
Index Cond: (id = monetary_pricelistdestinations.pricelist_id)
Heap Fetches: 7
-> Index Scan using monetary_drugprice_pricelist_id_1ce160ce on monetary_drugprice t4 (cost=0.42..110.21 rows=1581 width=8) (actual time=0.010..1.236 rows=3368 loops=7)
Index Cond: (pricelist_id = t5.id)
-> Index Scan using monetary_drugprice_drug_id_c2f278e5 on monetary_drugprice (cost=0.42..0.78 rows=15 width=8) (actual time=0.002..0.009 rows=19 loops=23577)
Index Cond: (drug_id = t4.drug_id)
-> Hash (cost=6.45..6.45 rows=91 width=4) (actual time=0.757..0.757 rows=93 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on monetary_pricelist (cost=0.00..6.45 rows=91 width=4) (actual time=0.655..0.713 rows=93 loops=1)
Filter: (active AND ((NOT expires) OR (expires AND ((timezone('UTC'::text, datestart))::date <= '2019-01-22'::date) AND ((timezone('UTC'::text, dateend))::date >= '2019-01-22'::date))))
Rows Removed by Filter: 45
Planning time: 25.871 ms
Execution time: 3638.544 ms
If I paste this in explain.depesz.com
, I get this result
Almost the entire time that the query takes to finish is spent sorting, as you can see here:
...and this is where I'm completely lost. What is it sorting? It's not the ORDER BY
at the end of my query, I already tried removing that.
How can I improve the performance of this query?
EDIT: I'm adding the EXPLAIN ANALYZE
after removing the ORDER BY
. The query takes longer to finish, consistently. Which is adding another layer of wtf
to the problem.
Unique (cost=10412.31..12666.32 rows=29084 width=143) (actual time=4229.508..4427.560 rows=6442 loops=1)
-> Sort (cost=10412.31..10485.02 rows=29084 width=143) (actual time=4229.506..4274.698 rows=228667 loops=1)
Sort Key: phdrug_phdrug.id, phdrug_phdrug.uuid, phdrug_phdrug.default_description, phdrug_phdrug.ean, phdrug_phdrug.parent_ean, phdrug_phdrug.reg_num, phdrug_phdrug.medika_code, phdrug_phdrug.atc_iv, phdrug_phdrug.product_type, phdrug_phdrug.fraction, phdrug_phdrug.active, phdrug_phdrug.loyal, phdrug_phdrug.patent, phdrug_phdrug.chronics, phdrug_phdrug.recipe, phdrug_phdrug.deal, phdrug_phdrug.specialized, phdrug_phdrug.armored, phdrug_phdrug.hight_speciality, phdrug_phdrug.temp_8_15, phdrug_phdrug.temp_15_25, phdrug_phdrug.temp_2_8, phdrug_phdrug.temp_less_15, phdrug_phdrug.new, phdrug_phdrug.mdk_internal_code, phdrug_phdrug.mdk_single_id, phdrug_phdrug.is_from_mdk_db, phdrug_phdrug.top, phdrug_phdrug.laboratory_id, phdrug_phdrug.specialty_id
Sort Method: external merge Disk: 31160kB
-> Hash Join (cost=704.51..6166.54 rows=29084 width=143) (actual time=21.814..605.830 rows=228667 loops=1)
Hash Cond: (monetary_drugprice.pricelist_id = monetary_pricelist.id)
-> Nested Loop (cost=696.92..5604.95 rows=44105 width=147) (actual time=21.195..501.337 rows=457692 loops=1)
Join Filter: (phdrug_phdrug.id = monetary_drugprice.drug_id)
-> Hash Join (cost=696.51..1177.21 rows=4583 width=147) (actual time=21.180..43.205 rows=23577 loops=1)
Hash Cond: (phdrug_phdrug.id = t4.drug_id)
-> Seq Scan on phdrug_phdrug (cost=0.00..359.94 rows=11992 width=143) (actual time=0.491..5.225 rows=11992 loops=1)
Filter: active
Rows Removed by Filter: 2
-> Hash (cost=639.21..639.21 rows=4584 width=4) (actual time=20.589..20.589 rows=23577 loops=1)
Buckets: 32768 (originally 8192) Batches: 1 (originally 1) Memory Usage: 1085kB
-> Nested Loop (cost=3.99..639.21 rows=4584 width=4) (actual time=1.252..15.098 rows=23577 loops=1)
-> Nested Loop (cost=3.58..9.11 rows=5 width=8) (actual time=1.182..1.292 rows=7 loops=1)
-> Hash Left Join (cost=3.43..7.57 rows=5 width=4) (actual time=1.164..1.226 rows=7 loops=1)
Hash Cond: (monetary_pricelistdestinations.to_organization_data_id = organization_organizationdata.id)
Filter: (monetary_pricelistdestinations.to_all_insurances OR (organization_organization.uuid = 'b51773d4-05f8-43a2-86ef-0098b31725d8'::uuid))
Rows Removed by Filter: 130
-> Seq Scan on monetary_pricelistdestinations (cost=0.00..3.37 rows=137 width=9) (actual time=0.347..0.364 rows=137 loops=1)
-> Hash (cost=3.12..3.12 rows=25 width=20) (actual time=0.794..0.794 rows=25 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Hash Left Join (cost=1.56..3.12 rows=25 width=20) (actual time=0.774..0.786 rows=25 loops=1)
Hash Cond: (organization_organizationdata.organization_id = organization_organization.id)
-> Seq Scan on organization_organizationdata (cost=0.00..1.25 rows=25 width=8) (actual time=0.317..0.319 rows=25 loops=1)
-> Hash (cost=1.25..1.25 rows=25 width=20) (actual time=0.432..0.432 rows=25 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on organization_organization (cost=0.00..1.25 rows=25 width=20) (actual time=0.411..0.415 rows=25 loops=1)
-> Index Only Scan using monetary_pricelist_pkey on monetary_pricelist t5 (cost=0.14..0.31 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=7)
Index Cond: (id = monetary_pricelistdestinations.pricelist_id)
Heap Fetches: 7
-> Index Scan using monetary_drugprice_pricelist_id_1ce160ce on monetary_drugprice t4 (cost=0.42..110.21 rows=1581 width=8) (actual time=0.012..1.127 rows=3368 loops=7)
Index Cond: (pricelist_id = t5.id)
-> Index Scan using monetary_drugprice_drug_id_c2f278e5 on monetary_drugprice (cost=0.42..0.78 rows=15 width=8) (actual time=0.002..0.012 rows=19 loops=23577)
Index Cond: (drug_id = t4.drug_id)
-> Hash (cost=6.45..6.45 rows=91 width=4) (actual time=0.609..0.609 rows=93 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on monetary_pricelist (cost=0.00..6.45 rows=91 width=4) (actual time=0.539..0.582 rows=93 loops=1)
Filter: (active AND ((NOT expires) OR (expires AND ((timezone('UTC'::text, datestart))::date <= '2019-01-22'::date) AND ((timezone('UTC'::text, dateend))::date >= '2019-01-22'::date))))
Rows Removed by Filter: 45
Planning time: 25.288 ms
Execution time: 4440.406 ms
EXPLAIN ANALYZE
output of the query withoutORDER BY
looks like? You said that you remove it from the query, can you show that? – Selfgoverned