Postgres - Run same query on all databases (same schemas)
Asked Answered
O

2

8

hoping this is a pretty straightforward question.

I have a straightforward SELECT query (with a few sub-queries built in). I have over 40 DBs and I need to run this query for all DBs (all have same schema) and return the result in a big table.

I'm imagining a loop sequence (like with javascript's i=0; i < 40; i++) with a defined variable that will automatically stop once it's run all the DBs.

(I am working in Navicat, tho that probably doesn't matter)

Thank you!

Overbid answered 24/2, 2017 at 19:49 Comment(5)
I have to ask... why do you have 40 databases all with the same schema? And, just to make sure, that's server side Javascript doing the query, right?Piebald
I'm just using the Javascript thing as an example, this post has nothing to with Javascript, sorry for the confusion. 40 DBs are for different clients that all have tons of data from multiple sources (various media sites + marketing data etc.)Overbid
When you say "different databases", they're still on the same server?Piebald
Possible duplicate of Possible to perform cross-database queries with postgres?Piebald
Yeah, Redshift. I'm not sure why multiple DBs were set up, I didn't design it. I'll check out that other thread. ThanksOverbid
H
2

In case someone needs a more involved example on how to do cross-database queries, here's an example that cleans up the databasechangeloglock table on every database that has it:

CREATE EXTENSION IF NOT EXISTS dblink;

DO 
$$
DECLARE database_name TEXT;
DECLARE conn_template TEXT;
DECLARE conn_string TEXT;
DECLARE table_exists Boolean;
BEGIN
    conn_template = 'user=myuser password=mypass dbname=';

    FOR database_name IN
        SELECT datname FROM pg_database
        WHERE datistemplate = false
    LOOP
        conn_string = conn_template || database_name;

        table_exists = (select table_exists_ from dblink(conn_string, '(select Count(*) > 0 from information_schema.tables where table_name = ''databasechangeloglock'')') as (table_exists_ Boolean));
        IF table_exists THEN
            perform dblink_exec(conn_string, 'delete from databasechangeloglock');
        END IF;     
    END LOOP;

END
$$
Heather answered 12/4, 2019 at 13:55 Comment(0)
C
2

It is possible to accomplish this in a single query using an Postgres' extension called dblink. This extension becomes available after you install postgresql-contrib package.

To be able to access it, you must add it in one of your databases.

CREATE EXTENSION IF NOT EXISTS dblink;
-- or
CREATE EXTENSION IF NOT EXISTS dblink WITH SCHEMA schema_name_here;

Using dblink function dblink(conn_str, sql_query) you can execute dynamically generated SQL statements. The user you will use in the connection string matters, so, choose a user that can access all databases, schemas and tables involved.

As an example, the following SQL queries all databases for the table names from schemata table, in information_schema schema, filtering by columns named data_type.

select datname,
       schema_name,
       unnest(table_names) as table_name
from (select datname,
             schema_name,
             (select table_names
              from dblink(
                  'dbname=' || datname || ' user=postgres password=postgres',
                  'select array_agg(table_name) as table_names from ' || schema_name || '.columns where column_name = ''data_type''')
                  as p(table_names character varying array)) as table_names
      from (select datname,
                   unnest(schema_name_arr) as schema_name
            from (select datname,
                         (select schema_name_arr
                          from dblink(
                              'dbname=' || datname || ' user=postgres password=postgres',
                              'select array_agg(distinct nspname) as schema_name_arr from pg_catalog.pg_namespace where nspname like ''information_schema''')
                              as t(schema_name_arr character varying array)) as schema_name_arr
                  from pg_catalog.pg_database
                  where datistemplate is false) q
            where schema_name_arr is not null
            ) r
    ) s;

The main query here is this: select array_agg(table_name) as table_names from ' || schema_name || '.columns where column_name = ''data_type''.

Since dblink is being used in the SELECT clause, it is restricted to return only one column. That's why I'm using the combo ARRAY_AGG + UNNEST.

If you added dblink module into a schema, remember to use schema_name.dblink when calling that function.

Hope it helps. Happy coding! :)

Cath answered 9/6, 2022 at 15:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.