Improve PostgreSQL aggregation query performance
I am aggregating data from a Postgres table. The query takes approximately 2 seconds, and I want to reduce that to under a second.

Please find below the execution details:


Query

select
    a.search_keyword,
    hll_cardinality( hll_union_agg(a.users) ):: int as user_count,
    hll_cardinality( hll_union_agg(a.sessions) ):: int as session_count,
    sum(a.total) as keyword_count
from
    rollup_day a
where
    a.created_date between '2018-09-01' and '2019-09-30'
    and a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'
group by
    a.search_keyword
order by
    session_count desc
limit 100;

Table metadata

  1. Total number of rows: 506,527
  2. Composite index on columns: tenant_id and created_date (see the DDL sketch below)
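
For reference, the composite index above was created roughly like this (the index name here is just a placeholder):

create index tenant_date_idx on rollup_day(tenant_id, created_date);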


Query plan

Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1722.685..1722.694 rows=100 loops=1)
  Task Count: 1
  Tasks Shown: All
  ->  Task
        Node: host=localhost port=5454 dbname=postgres
        ->  Limit  (cost=64250.24..64250.49 rows=100 width=42) (actual time=1783.087..1783.106 rows=100 loops=1)
              ->  Sort  (cost=64250.24..64558.81 rows=123430 width=42) (actual time=1783.085..1783.093 rows=100 loops=1)
                    Sort Key: ((hll_cardinality(hll_union_agg(sessions)))::integer) DESC
                    Sort Method: top-N heapsort  Memory: 33kB
                    ->  GroupAggregate  (cost=52933.89..59532.83 rows=123430 width=42) (actual time=905.502..1724.363 rows=212633 loops=1)
                          Group Key: search_keyword
                          ->  Sort  (cost=52933.89..53636.53 rows=281055 width=54) (actual time=905.483..1351.212 rows=280981 loops=1)
                                Sort Key: search_keyword
                                Sort Method: external merge  Disk: 18496kB
                                ->  Seq Scan on rollup_day a  (cost=0.00..17890.22 rows=281055 width=54) (actual time=29.720..112.161 rows=280981 loops=1)
                                      Filter: ((created_date >= '2018-09-01'::date) AND (created_date <= '2019-09-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
                                      Rows Removed by Filter: 225546
            Planning Time: 0.129 ms
            Execution Time: 1786.222 ms
Planning Time: 0.103 ms
Execution Time: 1722.718 ms

What I've tried

  1. I've tried indexes on tenant_id and created_date, but because the filtered data set is large, the planner always chooses a sequential scan rather than an index scan for these filters. I've read that the Postgres planner switches to a sequential scan once a query returns more than roughly 5-10% of the table's rows.
  2. I've increased work_mem to 100MB, but it only improved performance slightly (see the snippet below for how I set it).
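
For reference, this is roughly how I change work_mem while testing; the value shown is just what I tried, not a recommendation:

-- session-level setting only; the server-wide default stays untouched
SET work_mem = '100MB';
SHOW work_mem;   -- confirm the new value is active
-- then re-run EXPLAIN (ANALYZE, BUFFERS) on the query above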

Any help would be really appreciated.


Update

Query plan after setting work_mem to 100MB

Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1375.926..1375.935 rows=100 loops=1)
  Task Count: 1
  Tasks Shown: All
  ->  Task
        Node: host=localhost port=5454 dbname=postgres
        ->  Limit  (cost=48348.85..48349.10 rows=100 width=42) (actual time=1307.072..1307.093 rows=100 loops=1)
              ->  Sort  (cost=48348.85..48633.55 rows=113880 width=42) (actual time=1307.071..1307.080 rows=100 loops=1)
                    Sort Key: (sum(total)) DESC
                    Sort Method: top-N heapsort  Memory: 35kB
                    ->  GroupAggregate  (cost=38285.79..43996.44 rows=113880 width=42) (actual time=941.504..1261.177 rows=172945 loops=1)
                          Group Key: search_keyword
                          ->  Sort  (cost=38285.79..38858.52 rows=229092 width=54) (actual time=941.484..963.061 rows=227261 loops=1)
                                Sort Key: search_keyword
                                Sort Method: quicksort  Memory: 32982kB
                                ->  Seq Scan on rollup_day_104290 a  (cost=0.00..17890.22 rows=229092 width=54) (actual time=38.803..104.350 rows=227261 loops=1)
                                      Filter: ((created_date >= '2019-01-01'::date) AND (created_date <= '2019-12-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
                                      Rows Removed by Filter: 279266
            Planning Time: 0.131 ms
            Execution Time: 1308.814 ms
Planning Time: 0.112 ms
Execution Time: 1375.961 ms

Update 2

After creating an index on created_date and increasing work_mem to 120MB

create index date_idx on rollup_day(created_date);

The total number of rows is: 12,124,608

Query plan:

Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=2635.530..2635.540 rows=100 loops=1)
  Task Count: 1
  Tasks Shown: All
  ->  Task
        Node: host=localhost port=9702 dbname=postgres
        ->  Limit  (cost=73545.19..73545.44 rows=100 width=51) (actual time=2755.849..2755.873 rows=100 loops=1)
              ->  Sort  (cost=73545.19..73911.25 rows=146424 width=51) (actual time=2755.847..2755.858 rows=100 loops=1)
                    Sort Key: (sum(total)) DESC
                    Sort Method: top-N heapsort  Memory: 35kB
                    ->  GroupAggregate  (cost=59173.97..67948.97 rows=146424 width=51) (actual time=2014.260..2670.732 rows=296537 loops=1)
                          Group Key: search_keyword
                          ->  Sort  (cost=59173.97..60196.85 rows=409152 width=55) (actual time=2013.885..2064.775 rows=410618 loops=1)
                                Sort Key: search_keyword
                                Sort Method: quicksort  Memory: 61381kB
                                ->  Index Scan using date_idx_102913 on rollup_day_102913 a  (cost=0.42..21036.35 rows=409152 width=55) (actual time=0.026..183.370 rows=410618 loops=1)
                                      Index Cond: ((created_date >= '2018-01-01'::date) AND (created_date <= '2018-12-31'::date))
                                      Filter: (tenant_id = '12850a62-19ac-477d-9cd7-837f3d716885'::uuid)
            Planning Time: 0.135 ms
            Execution Time: 2760.667 ms
Planning Time: 0.090 ms
Execution Time: 2635.568 ms
Kipp answered 20/1, 2020 at 7:30 Comment(14)
This "Sort Method: external merge Disk: 18496kB" takes the majority of the time. You probably need to increase work_mem way beyond 100MB until that goes away.Adieu
@a_horse_with_no_name, thanks for your reply. This is only taking 18MB of memory and my work_mem is 64 MB. Why is it still using the disk for the sort operation?Kipp
The size on disk is way smaller than the size in memory (the disk operations are optimized for small size to make them at least somewhat acceptable in performance). The memory needed for an in-memory sort is typically much bigger than that. Maybe hll_union_agg needs that much memory.Adieu
@a_horse_with_no_name, appreciated. I have a 4-core 16GB EC2 machine. Could you please suggest some recommended settings for this system?Kipp
I mean, how much work_mem do I need for a 4-core 16GB EC2 machine?Kipp
The amount of memory it needs to sort in memory does not depend on the number of cores. It also doesn't depend on the amount of RAM available, though obviously setting work_mem too high with too low of RAM is a poor idea for normal operations.Terce
@jjanes, please check, I've updated the question.Kipp
Over 850ms (963-104) to sort 227261 rows of 54 bytes each in memory seems quite slow. My crummy laptop is about 3 times faster. What version of PostgreSQL is this, and what is the EC2 machine type?Terce
LOL @Terce, I am using Postgres 11.6 on a t3a.xlarge (4 cores, 16GB)Kipp
t3a is a burstable instance. Perhaps you have exceeded your CPU credits and are now being throttled. Also I think that 'a' version is slower per core.Terce
@Terce I checked the resource utilization under peak load and it's not consuming more than 50%. Under peak load RAM utilization is only 9GB/16GB. Maybe I am wrong, but I don't think upgrading the CPU will improve the performance.Kipp
Are you seeing 50% of a single CPU, or 50% of all your CPU?Terce
less than 50% of all CPU.Kipp
Since it is not parallelizable, less than 50% of all CPU is to be expected.Terce

You should experiment with higher settings of work_mem until you get an in-memory sort. Of course you can only be generous with memory if your machine has enough of it.

What would make your query much faster is storing pre-aggregated data, either in a materialized view or in a second table kept up to date by a trigger on your original table. I don't know if that is possible with your data, as I don't know what hll_cardinality and hll_union_agg are.
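
As a minimal sketch of the materialized-view idea, assuming the columns shown in the question and that users/sessions are of the hll type (from the postgresql-hll extension mentioned in the comments), whose pre-combined values can be unioned again at query time; the view name is just an example:

-- hypothetical monthly pre-aggregation of rollup_day
create materialized view rollup_month as
select
    tenant_id,
    date_trunc('month', created_date)::date as month,
    search_keyword,
    hll_union_agg(users)    as users,
    hll_union_agg(sessions) as sessions,
    sum(total)              as total
from rollup_day
group by tenant_id, date_trunc('month', created_date), search_keyword;

-- refresh periodically, e.g. once a day after the rollup table is loaded
refresh materialized view rollup_month;

The original query would then aggregate far fewer rows by reading rollup_month instead of rollup_day, at the cost of results only being as fresh as the last refresh.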

Dislimn answered 20/1, 2020 at 8:7 Comment(3)
Thanks for your reply. hll is a Postgres extension for approximate cardinality aggregation. Refer to: github.com/citusdata/postgresql-hllKipp
This table is already pre-aggregated and stores the sums per day, but even so the dataset is huge for a pre-aggregated table, and the query is not performing well.Kipp
Then there is no improvement possible except more work_mem for the sort.Dislimn

Have you tried a covering index, so the optimizer will use the index and not do a sequential scan?

create index covering on rollup_day(tenant_id, created_date, search_keyword, users, sessions, total);

If you are on Postgres 11:

create index covering on rollup_day(tenant_id, created_date) INCLUDE (search_keyword, users, sessions, total);

But since you also sort/group by search_keyword, maybe:

create index covering on rollup_day(tenant_id, created_date, search_keyword);
create index covering on rollup_day(tenant_id, search_keyword, created_date);

Or:

create index covering on rollup_day(tenant_id, created_date, search_keyword) INCLUDE (users, sessions, total);
create index covering on rollup_day(tenant_id, search_keyword, created_date) INCLUDE (users, sessions, total);

One of these indexes should make the query faster; you should only add one of them.

Even if it makes this query faster, having big indexes will/might make your write operations slower (in particular, HOT updates are not possible when an indexed column is updated), and you will use more storage.
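
To verify whether the planner actually picks one of these indexes (and whether it can do an index-only scan), the query from the question can be re-run with EXPLAIN after creating it; this is plain PostgreSQL syntax, nothing specific to this setup:

-- check which plan the optimizer chooses once the index exists
explain (analyze, buffers)
select
    a.search_keyword,
    hll_cardinality( hll_union_agg(a.users) ):: int as user_count,
    hll_cardinality( hll_union_agg(a.sessions) ):: int as session_count,
    sum(a.total) as keyword_count
from rollup_day a
where a.created_date between '2018-09-01' and '2019-09-30'
  and a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'
group by a.search_keyword
order by session_count desc
limit 100;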

Monda answered 1/2, 2020 at 0:36 Comment(5)
Thanks for your answer, I'll surely try the indexes you mentioned.Kipp
Ok, let me know which one works best, or post the EXPLAIN output if you have it. I also thought of putting search_keyword in the first position, but I don't think it will work.Monda
Only one single-column index is working in my case: 'create index s_k_idx on rollup_day(search_keyword desc);'. I've tried the rest but no luck.Kipp
Do you mean that none of the other indexes are used when you add them and check the query plan? Even create index covering on rollup_day(search_keyword, tenant_id, created_date) INCLUDE (users, sessions, total); OR create index covering on rollup_day(search_keyword, tenant_id, created_date, users, sessions, total); OR create index covering on rollup_day(search_keyword, tenant_id, created_date);?Monda
Yes, I'll share the query plan with all the indexes you mentioned. I think none of the indexes are being used because of the group by clause.Kipp

Use table partitioning and create a composite index; it will bring down the total cost because:

  • it will save you a huge amount of scan cost.
  • partitions will segregate data and will be very helpful in future purge operations as well.
  • I have personally tried and tested table partitions with such cases and the throughput is amazing with the combination of partitions & composite indexes.

  • Partitioning can be done on a range of created_date, with composite indexes on date & tenant (a sketch is below, after this list).

  • Remember that you can always have a partial composite index (one with a condition in it) if your query has a very specific filter requirement. That way the data is already sorted in the index, which saves a huge amount of sort cost as well.

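A minimal sketch of what I mean, assuming PostgreSQL 11+ declarative partitioning, the column layout implied by the question, and example table/partition names:

-- hypothetical partitioned version of rollup_day, partitioned by created_date
create table rollup_day_part (
    tenant_id      uuid,
    created_date   date,
    search_keyword text,
    users          hll,      -- hll type from the postgresql-hll extension (assumed)
    sessions       hll,
    total          bigint    -- numeric type assumed
) partition by range (created_date);

-- one partition per year; monthly partitions work the same way
create table rollup_day_2018 partition of rollup_day_part
    for values from ('2018-01-01') to ('2019-01-01');
create table rollup_day_2019 partition of rollup_day_part
    for values from ('2019-01-01') to ('2020-01-01');

-- composite index on tenant & date; created on the parent, it cascades to every partition
create index on rollup_day_part (tenant_id, created_date);
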
Hope this helps.

PS: Also, is it possible to share some sample test data?

Mccallion answered 3/2, 2020 at 6:48 Comment(5)
Thanks, Raj. I know about partitioning, and it only works well when the queried data set is limited to or fits within a particular range, but in my case the data set is really huge. I optimised the query to 900ms when the created_date fits inside one partition range; once the query goes beyond that range it needs to combine partitions, which is a heavy operation.Kipp
I do understand your concern, but what the query optimiser is going to do in that case is skip the sort and the top-N heapsort. Give it a shot; in case it doesn't work you can always fall back to tuning work_mem. Using this approach I have in the past been able to bring queries down from 84000ms to 10ms, true story.Mccallion
Appreciated, great to hear that. work_mem is only needed to move the sort or join computation into memory rather than onto disk, which I have already achieved, so I don't think increasing work_mem further will help the query much.Kipp
Let me know in case you need more detailed info; if you can share a sample, maybe I can look at it. Good luck! :)Mccallion
sure my linkedin is in my profile, feel free to connect!Mccallion

My suggestion would be to break up the select. What I would also try, in combination with this, is to set up two indexes on the table: one on the dates, the other on the ID. One of the problems with long UUID-style IDs is that they take time to compare and can effectively be treated as string comparisons in the background. That's why I suggest the break-up: to pre-filter the data before the BETWEEN condition is evaluated, since a BETWEEN condition can make a select slow. Here I would suggest breaking it up into two selects and an inner join (I know the memory consumption is a problem).

Here is an example of what I mean. I hope the optimizer is smart enough to restructure your query.

SELECT
    t1.search_keyword,
    hll_cardinality( hll_union_agg(t1.users) ):: int as user_count,
    hll_cardinality( hll_union_agg(t1.sessions) ):: int as session_count,
    sum(t1.total) as keyword_count
FROM
    (SELECT
        *
    FROM
        rollup_day a
    WHERE
        a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885') t1
WHERE
    t1.created_date between '2018-09-01' and '2019-09-30'
group by
    t1.search_keyword
order by
    session_count desc
Now if this does not work, then you need more specific optimizations. For example: can total be equal to 0? If so, you would want a filtered (partial) index on the rows where total > 0 (a sketch is below). Are there any other criteria that make it easy to exclude rows from the select?
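
A minimal sketch of such a partial index, assuming the column names from the question; the index name is only an example:

-- hypothetical partial index: rows with total = 0 are not indexed at all
create index rollup_day_nonzero_idx
    on rollup_day (tenant_id, created_date)
    where total > 0;

Note that the planner will only consider this index when the query's WHERE clause itself implies total > 0, so that condition would also have to be added to the query.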

The next consideration would be to add a column with a short ID (instead of 62850a62-19ac-477d-9cd7-837f3d716885, something like 62850) that can be a number; that would make pre-selection very easy and reduce memory consumption. A sketch of that idea is below.
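
A rough sketch of the short-ID idea; the table and column names here are purely illustrative:

-- hypothetical mapping of each uuid tenant to a small integer surrogate key
create table tenant_map (
    tenant_short_id serial primary key,
    tenant_id       uuid not null unique
);

-- add the short id to the rollup table and backfill it from the mapping
alter table rollup_day add column tenant_short_id integer;
update rollup_day r
set    tenant_short_id = m.tenant_short_id
from   tenant_map m
where  m.tenant_id = r.tenant_id;

-- then index and filter on the integer column instead of the uuid
create index on rollup_day (tenant_short_id, created_date);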

Adon answered 2/2, 2020 at 13:37 Comment(3)
This query is performing even worse. Your query is taking more than 3 seconds.Kipp
I think you have to re-index; the composite indexes will not work with that query. Can you post the execution plan? It will show what needs to be indexed. Just edit my post, that will give me a picture of what the optimizer does.Adon
One tiny thing: you need to get rid of your UUID; best make a table that relates integers to UUIDs. percona.com/blog/2019/11/22/…Adon
