Duplicate postgresql schema including sequences
Asked Answered
C

3

9

My database layout needs to create new schema for each new customer. Currently I use internal function I found on the net and modified a little bit.

CREATE FUNCTION copy_schema(
    source_schema character varying, 
    target_schema character varying, 
    copy_data boolean)
RETURNS integer AS
$BODY$
DECLARE
    t_ex integer := 0;
    s_ex integer := 0;
    src_table character varying;
    trg_table character varying;
BEGIN
    if (select 1 from pg_namespace where nspname = source_schema) THEN
        -- we have defined target schema
        s_ex := 1;
    END IF;

    IF (s_ex = 0) THEN
        -- no source schema exist
        RETURN 0;
    END IF;

    if (select 1 from pg_namespace where nspname = target_schema) THEN
        -- we have defined target schema need to sync all table layout
        t_ex := 1;
    ELSE
        EXECUTE 'CREATE SCHEMA '||target_schema||' AUTHORIZATION user';
    END IF;

    FOR src_table IN 
       SELECT table_name 
       FROM information_schema.TABLES 
       WHERE table_schema = source_schema
    LOOP
        trg_table := target_schema||'.'||src_table;
        EXECUTE 
            'CREATE TABLE ' || trg_table || ' (LIKE ' || source_schema || '.' || src_table || ' INCLUDING ALL)';
        IF (copy_data = true) THEN
            EXECUTE 'INSERT INTO ' || trg_table || '(SELECT * FROM ' || source_schema || '.' || src_table || ')';
        END IF;
    END LOOP;

    return t_ex;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

The problem with this script is that tables in new schema continue to use source schema's sequences. Is there way using sql statements (or other reliable way) to get fresh copy of sequences (or even another reliable way to duplicate entire schema) for the newly created tables?

Coprolalia answered 24/9, 2012 at 20:3 Comment(4)
schema-per-customer may not be an ideal pattern to use. At any rate my gut reaction would be to try to cobble something together based on CREATE DATABASE ... TEMPLATE ..., or pg_dump. In fact, perhaps you should just simplify, and commit a cleaned up .sql template of the schema, and load that up. Trying to templatize a live schema seems like a kludge.Casefy
This is the best thing for database layout i found, as each customer can have sensitive data there. It would be quite big security issue store all data in single database. All data requests are standardized and schema contains only customer relevant data.Coprolalia
If you are using different databases anyway it will be much easier if you create a template database (i.e. an empty database with just the tables) and use create database ... template ... as suggested by FrankGazetteer
There is also common data stored in public schema that is used across application.Coprolalia
C
6

And so after some thinking I went along with updating sql function mentioned in my first post so now it looks like this:

CREATE FUNCTION copy_schema(
    source_schema character varying, 
    target_schema character varying, 
    copy_data boolean)
RETURNS integer AS
$BODY$
DECLARE
    t_ex integer := 0;
    s_ex integer := 0;
    src_table character varying;
    trg_table character varying;
BEGIN
    if (select 1 from pg_namespace where nspname = source_schema) THEN
        -- we have defined target schema
        s_ex := 1;
    END IF;

    IF (s_ex = 0) THEN
        -- no source schema exist
        RETURN 0;
    END IF;

    if (select 1 from pg_namespace where nspname = target_schema) THEN
        -- we have defined target schema need to sync all table layout
        t_ex := 1;
    ELSE
        EXECUTE 'CREATE SCHEMA '||target_schema||' AUTHORIZATION user';
    END IF;

    FOR src_table IN 
        SELECT table_name 
        FROM information_schema.TABLES 
        WHERE table_schema = source_schema
    LOOP
        trg_table := target_schema||'.'||src_table;
        EXECUTE 'CREATE TABLE ' || trg_table || ' (LIKE ' || source_schema || '.' || src_table || ' INCLUDING ALL)';
        EXECUTE 'CREATE SEQUENCE ' || trg_table || '_id_seq OWNED BY '||trg_table || '.id';
        EXECUTE 'ALTER TABLE ' || trg_table || ' ALTER COLUMN id SET DEFAULT nextval('''|| trg_table || '_id_seq''::regclass)';
        IF (copy_data = true) THEN
            EXECUTE 'INSERT INTO ' || trg_table || '(SELECT * FROM ' || source_schema || '.' || src_table || ')';
        END IF;
    END LOOP;
    return t_ex;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

This is not quite universal solution for everybody, but as all my tables in schema have serial field named id, it fits me.

Version suggested by @erwin-brandstetter with dump / hack dump file / restore dump file back again is commonly seen on the forums as the way to go.

In case of dedicated server it could work, in case of shared hosting (or need of less dependencies on outside scripts) the way of internal function seems better.

Coprolalia answered 24/9, 2012 at 22:15 Comment(3)
You may want to add ALTER SEQUENCE seq OWNED BY tbl.col like I mention in my updated answer. That's what a serial column would do.Parathyroid
@erwin-brandstetter: CREATE SEQUENCE seq OWNED BY tbl.col does the same as I understand.Coprolalia
This looks like a good solution, but this creates a sequence that begins at 1, no? What if you want to specify a start with integer based on the nextval() from a sequence in source_schema instead?Catharine
P
7

The root of the problem

The connection to the old sequence comes from a plain default value for the involved column. I quote the manual here:

Default expressions for the copied column definitions will only be copied if INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, resulting in the copied columns in the new table having null default.

Since you create new tables with

INCLUDING ALL

And:

INCLUDING ALL is an abbreviated form of INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS.

.. you get the same default. You can either exclude defaults or explicitly change default values including nextval() for the new tables after you created them. I don't think there is any middle ground.


Simpler with dump / hack the dump / restore

or even another reliable way to duplicate entire schema

You could dump the schema of schema (same word, different meaning) with pg_dump:

pg_dump $DB -p $PORT -n $SCHEMA -sf /var/lib/postgresql/your_name.pgsql

Hack the dump (meaning: use a text editor on it, or script it): exchange the schema name at the top of the dump, and all other occurrences in SET search_path and as schema-qualification for sequences and possibly more. If you chose a unique name for the schema, a single run of global search & replace with your favorite tool (sed or vim or ...) should do the job.

Then run the SQL script with psql against the same or any other database:

psql $DB -p $PORT -f /var/lib/postgresql/your_name.pgsql > /dev/null

Contrary to what I first posted, serial columns are still split up in the dump (at least in PostgreSQL 9.1.5). The SQL script creates sequences separately, attaches them to the serial column with:

ALTER SEQUENCE seq OWNED BY tbl.col;

and sets the default value separately.

As an aside: Current versions of pgAdmin reverse engineer serial columns in DDL scripts when all the requirements are met.

Parathyroid answered 24/9, 2012 at 20:27 Comment(7)
Seems like the way to go and create sequences and alter primary key fields, as loosing default values is price not worth paying.Coprolalia
@icebreaker: Or you try the completely different route I outlined in my update.Parathyroid
Looked up my latest database dump (PostgreSQL version 9.1.5) - found this id integer DEFAULT nextval('company_1.branches_id_seq'::regclass) NOT NULL not id SERIAL...Coprolalia
@erwin-brandstetter: Currently it seems I will have one clean optimized dump to multiply schemas from.Coprolalia
@icebreaker: Did you upgrade this database from an older version?Parathyroid
@erwin-brandstetter: No. Currently all my machines share same OS and PostgreSQL version. But I think it could be problem on my production server, as it currently runs 8.4Coprolalia
@icebreaker: I ran tests and found my previous statements about serial columns in the dump were partly incorrect. I fooled myself with scripts inside plpgsql functions in the dump that have creation scripts with serial columns. I amended my answer accordingly.Parathyroid
C
6

And so after some thinking I went along with updating sql function mentioned in my first post so now it looks like this:

CREATE FUNCTION copy_schema(
    source_schema character varying, 
    target_schema character varying, 
    copy_data boolean)
RETURNS integer AS
$BODY$
DECLARE
    t_ex integer := 0;
    s_ex integer := 0;
    src_table character varying;
    trg_table character varying;
BEGIN
    if (select 1 from pg_namespace where nspname = source_schema) THEN
        -- we have defined target schema
        s_ex := 1;
    END IF;

    IF (s_ex = 0) THEN
        -- no source schema exist
        RETURN 0;
    END IF;

    if (select 1 from pg_namespace where nspname = target_schema) THEN
        -- we have defined target schema need to sync all table layout
        t_ex := 1;
    ELSE
        EXECUTE 'CREATE SCHEMA '||target_schema||' AUTHORIZATION user';
    END IF;

    FOR src_table IN 
        SELECT table_name 
        FROM information_schema.TABLES 
        WHERE table_schema = source_schema
    LOOP
        trg_table := target_schema||'.'||src_table;
        EXECUTE 'CREATE TABLE ' || trg_table || ' (LIKE ' || source_schema || '.' || src_table || ' INCLUDING ALL)';
        EXECUTE 'CREATE SEQUENCE ' || trg_table || '_id_seq OWNED BY '||trg_table || '.id';
        EXECUTE 'ALTER TABLE ' || trg_table || ' ALTER COLUMN id SET DEFAULT nextval('''|| trg_table || '_id_seq''::regclass)';
        IF (copy_data = true) THEN
            EXECUTE 'INSERT INTO ' || trg_table || '(SELECT * FROM ' || source_schema || '.' || src_table || ')';
        END IF;
    END LOOP;
    return t_ex;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

This is not quite universal solution for everybody, but as all my tables in schema have serial field named id, it fits me.

Version suggested by @erwin-brandstetter with dump / hack dump file / restore dump file back again is commonly seen on the forums as the way to go.

In case of dedicated server it could work, in case of shared hosting (or need of less dependencies on outside scripts) the way of internal function seems better.

Coprolalia answered 24/9, 2012 at 22:15 Comment(3)
You may want to add ALTER SEQUENCE seq OWNED BY tbl.col like I mention in my updated answer. That's what a serial column would do.Parathyroid
@erwin-brandstetter: CREATE SEQUENCE seq OWNED BY tbl.col does the same as I understand.Coprolalia
This looks like a good solution, but this creates a sequence that begins at 1, no? What if you want to specify a start with integer based on the nextval() from a sequence in source_schema instead?Catharine
D
2

You could simply backup up the schema, then rename it in the DB, then restore the file that was backed up.

Dilemma answered 7/12, 2020 at 17:49 Comment(3)
How to automate?Coprolalia
you could automate this by a combination of SQL scripts and bashDilemma
Didn't see your point first time I read it. First backup, rename with SQL and then restore source schema... Sounds usable in some cases.Coprolalia

© 2022 - 2024 — McMap. All rights reserved.