CloudSQL with PostgreSQL very slow performance
I wanted to migrate from BigQuery to Cloud SQL to save cost. My problem is that Cloud SQL with PostgreSQL is very slow compared to BigQuery: a query that takes 1.5 seconds in BigQuery takes almost 4.5 minutes(!) on Cloud SQL with PostgreSQL.

I have a Cloud SQL PostgreSQL server with the following configuration:

[image: instance configuration]

My database has a main table with 16M rows (around 14 GB in RAM).

An example query:

EXPLAIN ANALYZE
SELECT "title"
FROM public.videos
WHERE EXISTS (SELECT *
              FROM (SELECT COUNT(DISTINCT CASE WHEN LOWER(param) LIKE '%thriller%' THEN '0'
                                               WHEN LOWER(param) LIKE '%crime%'    THEN '1' END) AS count
                    FROM UNNEST(categories) AS param) alias
              WHERE count = 2)
ORDER BY views DESC
LIMIT 12 OFFSET 0

The table is a videos table with a categories column of type text[]. The search condition requires that the array contains both a category matching '%thriller%' and a category matching '%crime%' (the distinct count of the two CASE markers must be exactly 2).
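For reference, the same intent can be written as two independent EXISTS checks over the unnested array. This is a sketch, not the original query: it differs in the edge case where a single element matches both patterns (the CASE version counts such an element only as the thriller marker):

-- Require one element matching 'thriller' and one matching 'crime'.
SELECT "title"
FROM public.videos
WHERE EXISTS (SELECT 1 FROM UNNEST(categories) AS param
              WHERE LOWER(param) LIKE '%thriller%')
  AND EXISTS (SELECT 1 FROM UNNEST(categories) AS param
              WHERE LOWER(param) LIKE '%crime%')
ORDER BY views DESC
LIMIT 12 OFFSET 0;

Either way, every row still unnests and scans its array, so this mostly helps readability rather than runtime.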

The EXPLAIN ANALYZE of this query gives this output (CSV): link. The EXPLAIN (BUFFERS) of this query gives this output (CSV): link.

Query Insights graph: [image]

Memory profile: [image]

BigQuery reference for the same query on the same table size: [image]

Server config: link.

Table definition: link.

My goal is to get the same query speed on Cloud SQL as on BigQuery.

Nurserymaid answered 10/11, 2021 at 22:49 Comment(10)
Please show us the table definition, including the indexes. Is there any chance you could normalize your categories?Outstrip
Sorry, I can't show it due to IP issues. No indexes.Nurserymaid
Does this answer your question? https://mcmap.net/q/575076/-bigquery-vs-cloud-sql-for-dashboards-backendTajo
You should turn on track_io_timing, then run EXPLAIN (ANALYZE, BUFFERS) and post the results as properly formatted text (not as images), preserving indentation. Does BigQuery also have something equivalent to the execution plan you could show?Keesee
@Tajo thanks for the link. My database is not that big, only 16M rows. I agree that BQ is suited for really big tables, but I don't think that's my caseNurserymaid
@Keesee I will post more details. Yes, BQ has an execution plan; I can show it. But how will it help you?Nurserymaid
My main problem, as I see it, is that I can't tune the server up because Google does not allow itNurserymaid
How to tune the db: cloud.google.com/sql/docs/postgres/flags#consoleNurserymaid
All new data is now posted: table definition, EXPLAIN (ANALYZE, BUFFERS)Nurserymaid
@Nurserymaid On your "No indexes." statement: Do you mean there are none, but you do consider adding some if they would get you closer to BigQuery performance, or that they are not allowed in your setup? Also, are you free to change the table structure?Gapin
For anyone coming here wondering how to tune their Postgres machine on Cloud SQL: they call it flags, and you can do it from the UI, although not all the config options are editable.

https://cloud.google.com/sql/docs/postgres/flags#console
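Once a flag is applied and the instance has restarted, the running value can be confirmed from any SQL session (standard PostgreSQL, nothing Cloud SQL specific):

-- Show the current value of a single setting.
SHOW work_mem;

-- Or list several settings at once, with where each value came from.
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('work_mem', 'max_parallel_workers_per_gather');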

Nurserymaid answered 12/11, 2021 at 0:35 Comment(0)
The initial query looks overcomplicated. It could be rewritten as:

SELECT  v."title"
FROM public.videos v
WHERE array_to_string(v.categories, '^') ILIKE ALL (ARRAY['%thriller%', '%crime%'])
ORDER BY views DESC 
LIMIT 12 OFFSET 0;

db<>fiddle demo
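If substring matching must be kept, one option (not part of this answer; a sketch assuming the pg_trgm extension is available, which Cloud SQL lists among its supported extensions) is a trigram index over the joined string, so the ILIKE predicates become indexable. array_to_string is not IMMUTABLE, so it needs a wrapper; all names here are illustrative:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Immutable wrapper so the expression can be indexed; the separator
-- must match the one used in queries.
CREATE OR REPLACE FUNCTION immutable_array_to_string(text[]) RETURNS text AS $$
  SELECT array_to_string($1, '^');
$$ LANGUAGE sql IMMUTABLE;

CREATE INDEX videos_cats_trgm_idx ON public.videos
  USING gin (immutable_array_to_string(categories) gin_trgm_ops);

-- Spelled as two ANDed ILIKE predicates (rather than ILIKE ALL) so the
-- planner can turn each one into an index condition; the WHERE expression
-- must match the index expression exactly.
SELECT v."title"
FROM public.videos v
WHERE immutable_array_to_string(v.categories) ILIKE '%thriller%'
  AND immutable_array_to_string(v.categories) ILIKE '%crime%'
ORDER BY v.views DESC
LIMIT 12 OFFSET 0;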

Muscadel answered 14/11, 2021 at 11:15 Comment(1)
Thanks for the answer. I have tried your solution on my db and it's still very slow: 2.5 minutes to finish. This is the EXPLAIN ANALYZE: ufile.io/ahk9qoem I want it to run in under 2 seconds like in BigQueryNurserymaid
PostgreSQL is slow by design on every query involving the COUNT aggregate function, and there is absolutely nothing to do about it except use a materialized view to recover the performance.

The tests I have made on my 48-core machine comparing COUNT performance between PostgreSQL and MS SQL Server are clear: SQL Server is between 61 and 561 times faster in all situations, and with a columnstore index SQL Server can be 1,533 times faster…

The same speed is reached with any other RDBMS. The explanation is clearly PG's MVCC, which keeps ghost rows inside table and index pages and therefore has to visit every row to know whether it is an active or a ghost row... In all the other RDBMSs, the count is done by reading a single piece of information at the top of the page (the number of rows in the page), by using parallelized access, or in SQL Server by batch access rather than row access...

There is nothing to be done to speed up COUNT in PG until the storage engine is entirely rewritten to avoid ghost slots inside pages...
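Taking the answer's own suggestion, a materialized view that precomputes the expensive category match would look roughly like this (a sketch; names are illustrative, and the view trades freshness for speed since it must be refreshed explicitly):

-- Precompute the matching rows once instead of scanning 16M rows per query.
CREATE MATERIALIZED VIEW videos_thriller_crime AS
SELECT title, views
FROM public.videos
WHERE EXISTS (SELECT 1 FROM UNNEST(categories) p WHERE LOWER(p) LIKE '%thriller%')
  AND EXISTS (SELECT 1 FROM UNNEST(categories) p WHERE LOWER(p) LIKE '%crime%');

CREATE INDEX ON videos_thriller_crime (views DESC);

-- Serve reads from the view; refresh on whatever schedule staleness allows.
REFRESH MATERIALIZED VIEW videos_thriller_crime;
SELECT title FROM videos_thriller_crime ORDER BY views DESC LIMIT 12;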

Volney answered 16/11, 2021 at 15:54 Comment(2)
Thanks for the answer, I tried MySQL and it's still very slow. I was using a JSON object because there is no array type in MySQLNurserymaid
Arrays and array-like structures (JSON, XML, records, multisets...) run counter to the basic principles of the relational model, which require data to be atomic. When data is atomic, it can be sped up with indexes. Except in specific databases like Oracle or Microsoft SQL Server that can index XML, no other RDBMS knows how to speed up access with indexes on composite structures like ARRAY! By the way, a table is very close to an array and can be indexed...Volney
I believe you need to use full-text search and a GIN index. The steps:

  1. Create the helper function for the index:

     CREATE OR REPLACE FUNCTION immutable_array_to_string(text[]) RETURNS text AS $$
       SELECT array_to_string($1, ',');
     $$ LANGUAGE sql IMMUTABLE;

  2. Create the index itself (to_tsvector already lower-cases its input, so the LOWER call is dropped to keep the expression identical to the one used in queries):

     CREATE INDEX videos_cats_fts_idx ON videos
       USING gin (to_tsvector('english', immutable_array_to_string(categories)));

Use the following query:

SELECT title
FROM videos
WHERE to_tsvector('english', immutable_array_to_string(categories))
      @@ to_tsquery('english', 'thriller & crime')
LIMIT 12 OFFSET 0;

Be aware that this query has a different meaning for 'crime' and 'thriller': they are no longer just substrings but tokens in English text. It looks like that is actually better for your task, though. Also, this index is not good for frequently changed data; it should work fine when your data is mostly read-only.
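A quick way to sanity-check that the planner actually picks up the GIN index (echoing the EXPLAIN (ANALYZE, BUFFERS) suggestion from the question's comments):

-- The WHERE expression must match the index expression exactly;
-- otherwise the planner falls back to a sequential scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT title
FROM videos
WHERE to_tsvector('english', immutable_array_to_string(categories))
      @@ to_tsquery('english', 'thriller & crime')
LIMIT 12 OFFSET 0;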

PS This answer is inspired by answer & comments: https://mcmap.net/q/745822/-postgres-full-text-search-on-array-column

Gargantua answered 27/11, 2021 at 13:17 Comment(0)
Apart from the SQL syntax optimization, have you tried tuning PostgreSQL?

Checking the plan, I found only two parallel workers and just 25 kB of memory used for sorting:

Workers Planned: 2
Sort Method: quicksort  Memory: 25kB

Your query is a typical OLAP query. Its performance is usually tied to memory and the number of CPU cores used (workers). By default Postgres uses KB-level work memory and few workers. You can tune your postgresql.conf so it behaves like an OLAP-type database.

Here is my suggestion: use more memory (about 9 MB of work_mem) and more CPUs (up to 16):

# DB Version: 13
# OS Type: linux
# DB Type: dw
# Total Memory (RAM): 24 GB
# CPUs num: 16
# Data Storage: ssd

max_connections = 40
shared_buffers = 6GB
effective_cache_size = 18GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 9830kB
min_wal_size = 4GB
max_wal_size = 16GB
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16
max_parallel_maintenance_workers = 4

You can append these lines to the end of your postgresql.conf and restart your PostgreSQL server for them to take effect.

For further optimization:

  1. Reduce connections and increase work_mem. 200 × 9830 kB is about 2 GB of memory across all connections. With fewer connections (for example, 100), each query can get more working memory; see the sketch below.
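On Cloud SQL, where postgresql.conf cannot be edited directly, work_mem can also be raised per session rather than globally (a sketch; the value is illustrative):

-- Raise work_mem only for the current session, run the heavy query, then reset.
SET work_mem = '256MB';
-- ... run the reporting query here ...
RESET work_mem;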

Regarding the text array type and unnest: you can try adding a proper index.

That's all, good luck.

WangYong

Linn answered 28/11, 2021 at 4:6 Comment(1)
The question is about PostgreSQL on Google Cloud SQL - there is no direct access to postgresql.conf. effective_io_concurrency, max_parallel_workers_per_gather, max_parallel_workers, max_parallel_maintenance_workers settings are not made available through instance customisation flags.Gapin
You can modify the performance parameter settings under the Flags section. Ex: [image]

Cite answered 17/4 at 13:19 Comment(0)
