Removing comments from pg_dump output
Asked Answered
P

4

14

When a PostgreSQL pg_dump is done it inserts some comments for each element, as follows.

--
-- Name: my_table; Type: TABLE; Schema: account; Owner: user; Tablespace:
--

CREATE TABLE my_table(
    id integer
);

--
-- Name: my_seq; Type: SEQUENCE; Schema: account; Owner: user
--

CREATE SEQUENCE my_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

Is it possible to force pg_dump to remove (exclude) them? I would like to receive just:

CREATE TABLE my_table(
    id integer
);

CREATE SEQUENCE my_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
Predesignate answered 30/1, 2017 at 17:7 Comment(7)
Pipe it through sed like this: sed -e '/^--/d'Lennie
@LaurenzAlbe, Could you please post an answer best describing your suggestion? I didn't understand how to use it.Predesignate
What operating system are you on?Lennie
@LaurenzAlbe, I am using CentOS 7.2 with PostgreSQL 9.2.5Predesignate
Why do comments represent a problem?Bertine
@vitaly-t, In my use case they became obvious and the file size increases a lotPredesignate
@MarcioMazzucato Thank you, I very much suspected that. I added my own answer below, for what it's worth, which by the way reduces the file size much better than in the earlier answers, because it can fully compress the SQL, removing all redundant spaces.Bertine
L
19

On a UNIX type operating system, I would do it like this:

pg_dump [options] mydatabase | sed -e '/^--/d' >mydatabase.dmp

This could accidentally swallow data lines that start with --. To work around that problem, use the --inserts option of pg_dump.

Lennie answered 1/2, 2017 at 5:59 Comment(8)
It works fine, many thanks! It replaced the comments to a blank line, is it possible to remove it? Not very important, but the file would be more clean.Predesignate
This will remove the line and not leave an empty line behind. The empty lines you see are empty in the original dump file. Use the sed command /^$/d to remove those.Lennie
Note that with the default pg_dump output format, if any of your tables contains a row whose contents start with --, you will delete that data! (For example: CREATE TABLE strs(s text); INSERT INTO strs(s) VALUES('-- hi');.)Ecstatics
@Ecstatics That is not true. Try it.Lennie
@LaurenzAlbe: I did try it, with PostgreSQL version 12.8 installed from the Ubuntu repositories, and it does, in fact, delete that data from the dump. Demo: gist.github.com/wchargin/6fd1ded5c683bc2420eb59ad53aeb5b7Ecstatics
@Ecstatics In your code, you explicitly delete the line with sed, so it is no surprise if it doesn't show up at the destination. psql will have no problem restoring a value that srarts with --.Lennie
@LaurenzAlbe: Yes, of course psql can restore a value that starts with --. The code that explicitly deletes the line with sed is from your answer to this question. My comment is pointing out that the sed -e '/^--/d' provided by your answer, which is intended to only delete SQL comments generated by pg_dump, can also delete actual data from the dump.Ecstatics
@Ecstatics Ah, now I see. You are right, that is a flaw in my answer. You could work around it with the --inserts option of pg_dump.Lennie
M
8

I've just submitted this patch for Postgres 11+ (still under consideration) that should allow one to dump without COMMENTS (until an ideal solution is in place), which should be a slightly better kludge than the ones we resort to using.

If there are enough voices, it may even get back-patched to Postgres 10!


[UPDATE]

This is now a feature in Postgres v11+

pg_dump --no-comments
Merc answered 30/5, 2017 at 19:36 Comment(6)
A long requested feature! :-DRemissible
did this make it to 11? I don't see it documentedLovesome
In fact it did, back in Jan 2018 - thatguyfromdelhi.com/2017/05/…Merc
Usage: pg_dump --no-commentsFreckly
With pg_dump (PostgreSQL) 11.11 (Debian 11.11-1.pgdg90+1) the --no-comments flag is present but doesn't actually seem to do anything. At least with the combination of flags I'm usingWarmblooded
The author wanted to remove lines beginning with -- from pg_dump output, NOT disable the output of SQL COMMENT statements.Capitalistic
B
2

The are only 2 good reasons for removing comments from SQL:

  1. The SQL file contains formatting variables (placeholders) that need to be replaced dynamically. In this case removing comments prevents false variable detection when those are referenced in comments.

  2. The SQL file is to be minimized, to reduce the size of what needs to go through IO and into the database server.

In either cases, it implies the SQL file is now meant only for execution, and not for reading.

And specifically for PostgreSQL, there is package pg-minify which does exactly that:

  • It removes all the comments and minimizes the resulting SQL
  • It can optionally compress the SQL to its bare minimum (option compress)

complete example

const minify = require('pg-minify');
const fs = require('fs');

fs.readFile('./sqlTest.sql', 'utf8', (err, data) => {
    if (err) {
        console.log(err);
    } else {
        console.log(minify(data));
    }
});
Bertine answered 23/6, 2017 at 12:13 Comment(3)
Good suggestion! But as it is NodeJS dependent, i can't use it today. @Laurenz Albe's solution fits well in my case because is more flexible, i can use native PostgreSQL and Linux commands.Predesignate
There's a third good reason for removing comments from the output of pg_dump specifically: they make it harder to read, by taking up a bunch of vertical space (six lines each, only one of which has text) and convey no information. The line comment "Name: mytable; Type: TABLE; Schema: public" is useless when it's immediately followed by CREATE TABLE public.mytable. I would like to remove these comments specifically to make it more human-readable, not to minify it.Ecstatics
Removing comments makes the schema dump consistent between different platforms, which is essential if it is checked into the version control.Anfractuous
M
1

Marcio, piping is the process of taking the output of one process and feeding it directly into another to achieve a specific purpose. Let's say you were using Julia to achieve the result you need. Create a test database and play with it to ensure that you get the desired result. This Julia command would produce a backup with comments:

run(pipeline(`pg_dump -d test`,"testdump.sql"))

Here Julia is asked to dump the backup into testdump.sql so that we can check the result. Note the backticks. Then comes another command which uses the filter suggested by @LaurenzAlbe:

run(pipeline(`cat testdump.sql`,`sed -e '/^--/d'`,"testdump2.sql"))

Here we have a three part pipeline which scans the backup with the comments, strips the comments out and dumps what remains into testdump2. You can now check that the first and second files are what is required.

Once you have confidence that the solution provided by @LaurenzAlbe is correct, you can make the required substitutions to run the entire thing in one pipeline command. Of course you can do the same thing directly in a bash terminal or Python or the scripting engine of your choice.

Millikan answered 31/1, 2017 at 6:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.