pg_dump --exclude-table pattern matching
Asked Answered
P

2

16

The Problem

I use pg_dump to create a schema-only archive file of our database. I'd like to reduce the size of this file (and decrease the restore time) by omitting all but one of a set of partitioned tables in the dump.

For example, there are the following partitioned tables (by date) in the database. I only want to keep the last one.

awp_partition.awp_text_search_history_201209
awp_partition.awp_text_search_history_201210
awp_partition.awp_text_search_history_201211
awp_partition.awp_text_search_history_201212
plus hundreds more...

I created a pg_dump command (called inside a bash script) designed to exclude all of these tables except for the latest one using a negative lookahead regular expression:

pg_dump -h 11.111.11.11 -p 5432 -U username -F c -s \
-T 'awp_partition.awp_text_search_history_(?!201212)\d{6}' \
dbname > /home/me/tmp/prod3.backup

However, when I run this command, all of the tables are excluded from the dump file.

What I've tried

I tried using a combination of table include and exclude parameters, but attempting to exclude all tables while including one - that matched the pattern for the excludes - caused the entire dump to fail.

I tested my regex using Postgres regexp_matches() function and it correctly matched the tables I expected. However, based on the documentation: pg_dump documentation

Dump only tables (or views or sequences or foreign tables) matching table. Multiple tables can be selected by writing multiple -t switches. Also, the table parameter is interpreted as a pattern according to the same rules used by psql's \d commands (see Patterns), so multiple tables can also be selected by writing wildcard characters in the pattern. When using wildcards, be careful to quote the pattern if needed to prevent the shell from expanding the wildcards;

And the related documentation psql patterns documentation

Advanced users can use regular-expression notations such as character classes, for example [0-9] to match any digit. All regular expression special characters work as specified in Section 9.7.3, except for . which is taken as a separator as mentioned above, * which is translated to the regular-expression notation ., ? which is translated to ., and $ which is matched literally. You can emulate these pattern characters at need by writing ? for ., (R+|) for R, or (R|) for R?. $ is not needed as a regular-expression character since the pattern must match the whole name, unlike the usual interpretation of regular expressions (in other words, $ is automatically appended to your pattern). Write * at the beginning and/or end if you don't wish the pattern to be anchored. Note that within double quotes, all regular expression special characters lose their special meanings and are matched literally. Also, the regular expression special characters are matched literally in operator name patterns (i.e., the argument of \do).

I realize that the syntax of the negative lookahead operator may not be supported in this operation.

What to do?

It looks like I need to change my pattern matching strategy, and I'm struggling to think of a way to exclude all but one of those tables using psql \d patterns. Any ideas?

Practise answered 6/6, 2014 at 18:24 Comment(3)
Testing your regex, I get the result that the first 3 lines match. I think this is your desired behaviour... so my guess is that the problem lies with pg_dumpGravure
Also, I find that psql patterns documentation extremely confusing! This could definitely be the source of the issue since they are messing around with what the different regex characters meanGravure
Agreed. The subset of functionality available in \d pattern matching is frustrating. I've also found it is difficult to test. Some \d patterns that work at the psql command line don't work with pg_dump. I'm still trying to figure out a reliable way to test this without having to go through the whole dump restore cycle.Practise
C
2

For pg_dump you need to pass schema and table in separated options, -n for schema and -t for a table. You need to run something like

pg_dump -h 11.111.11.11 -p 5432 -U username -F c -s \
-n awp_partition -T 'awp_text_search_history_(?!201212)\d{6}' \
dbname > /home/me/tmp/prod3.backup

It's a (relatively) old thread, but maybe someone in the future will have a similar problem.

Crinkumcrankum answered 26/12, 2021 at 16:41 Comment(0)
G
0

Have you tried turning the regex inside out like this?

awp_partition.awp_text_search_history_(201209|201210|201211)
Gravure answered 6/6, 2014 at 19:51 Comment(3)
I'm currently working on something similar: awp_partition.awp_text_search_history_201[23](0[1-9]|1[012]) The reason being, there are hundreds of these tables. I only included a few in the example for brevity.Practise
It would help others help you solve this problem if you make available the entire list (preferably with a link). :)Gravure
That's true. Sometimes brevity is more of a hindrance than help. I'll update the question to make it more clear. I used the solution I alluded to above to achieve a "good enough" result for my needs, but I suspect a postgres/regex master could do better than me. pg_dump -h ${DBHOST} -p 5432 -U ${USERNAME} -F c -s \ -T 'awp_partition.awp_text_search_history_201[23](0[1-9]|1[012])' \ -T 'awp_partition.awp_text_search_history_20140[1-5]' \ ...Practise

© 2022 - 2024 — McMap. All rights reserved.