Removing comments from pg_dump output

Asked 30/1, 2017 at 17:7 Answered 23/6, 2017 at 12:13

When a PostgreSQL pg_dump is done it inserts some comments for each element, as follows.

--
-- Name: my_table; Type: TABLE; Schema: account; Owner: user; Tablespace:
--

CREATE TABLE my_table(
    id integer
);

--
-- Name: my_seq; Type: SEQUENCE; Schema: account; Owner: user
--

CREATE SEQUENCE my_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

Is it possible to force pg_dump to remove (exclude) them? I would like to receive just:

CREATE TABLE my_table(
    id integer
);

CREATE SEQUENCE my_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

Predesignate answered 30/1, 2017 at 17:7 Comment(7)

Pipe it through sed like this: sed -e '/^--/d' – Lennie 30/1, 2017 at 17:18

@LaurenzAlbe, Could you please post an answer best describing your suggestion? I didn't understand how to use it. – Predesignate 30/1, 2017 at 21:3

What operating system are you on? – Lennie 31/1, 2017 at 9:30

@LaurenzAlbe, I am using CentOS 7.2 with PostgreSQL 9.2.5 – Predesignate 31/1, 2017 at 18:18

Why do comments represent a problem? – Bertine 23/6, 2017 at 12:1

@vitaly-t, In my use case they became obvious and the file size increases a lot – Predesignate 23/6, 2017 at 13:19

@MarcioMazzucato Thank you, I very much suspected that. I added my own answer below, for what it's worth, which by the way reduces the file size much better than in the earlier answers, because it can fully compress the SQL, removing all redundant spaces. – Bertine 23/6, 2017 at 13:35

On a UNIX type operating system, I would do it like this:

pg_dump [options] mydatabase | sed -e '/^--/d' >mydatabase.dmp

This could accidentally swallow data lines that start with --. To work around that problem, use the --inserts option of pg_dump.

Lennie answered 1/2, 2017 at 5:59 Comment(8)

It works fine, many thanks! It replaced the comments to a blank line, is it possible to remove it? Not very important, but the file would be more clean. – Predesignate 1/2, 2017 at 13:41

This will remove the line and not leave an empty line behind. The empty lines you see are empty in the original dump file. Use the sed command /^$/d to remove those. – Lennie 1/2, 2017 at 13:52

Note that with the default pg_dump output format, if any of your tables contains a row whose contents start with --, you will delete that data! (For example: CREATE TABLE strs(s text); INSERT INTO strs(s) VALUES('-- hi');.) – Ecstatics 1/1, 2022 at 8:35

@Ecstatics That is not true. Try it. – Lennie 7/1, 2022 at 7:38

@LaurenzAlbe: I did try it, with PostgreSQL version 12.8 installed from the Ubuntu repositories, and it does, in fact, delete that data from the dump. Demo: gist.github.com/wchargin/6fd1ded5c683bc2420eb59ad53aeb5b7 – Ecstatics 8/1, 2022 at 7:8

@Ecstatics In your code, you explicitly delete the line with sed, so it is no surprise if it doesn't show up at the destination. psql will have no problem restoring a value that srarts with --. – Lennie 10/1, 2022 at 7:39

@LaurenzAlbe: Yes, of course psql can restore a value that starts with --. The code that explicitly deletes the line with sed is from your answer to this question. My comment is pointing out that the sed -e '/^--/d' provided by your answer, which is intended to only delete SQL comments generated by pg_dump, can also delete actual data from the dump. – Ecstatics 11/1, 2022 at 4:35

@Ecstatics Ah, now I see. You are right, that is a flaw in my answer. You could work around it with the --inserts option of pg_dump. – Lennie 11/1, 2022 at 6:15

I've just submitted this patch for Postgres 11+ (still under consideration) that should allow one to dump without COMMENTS (until an ideal solution is in place), which should be a slightly better kludge than the ones we resort to using.

If there are enough voices, it may even get back-patched to Postgres 10!

[UPDATE]

This is now a feature in Postgres v11+

pg_dump --no-comments

Merc answered 30/5, 2017 at 19:36 Comment(6)

A long requested feature! :-D – Remissible 5/11, 2018 at 23:35

did this make it to 11? I don't see it documented – Lovesome 24/9, 2019 at 14:30

In fact it did, back in Jan 2018 - thatguyfromdelhi.com/2017/05/… – Merc 26/9, 2019 at 10:33

Usage: pg_dump --no-comments – Freckly 28/8, 2020 at 14:9

With pg_dump (PostgreSQL) 11.11 (Debian 11.11-1.pgdg90+1) the --no-comments flag is present but doesn't actually seem to do anything. At least with the combination of flags I'm using – Warmblooded 2/3, 2021 at 20:43

The author wanted to remove lines beginning with -- from pg_dump output, NOT disable the output of SQL COMMENT statements. – Capitalistic 6/4, 2021 at 8:57

The are only 2 good reasons for removing comments from SQL:

The SQL file contains formatting variables (placeholders) that need to be replaced dynamically. In this case removing comments prevents false variable detection when those are referenced in comments.
The SQL file is to be minimized, to reduce the size of what needs to go through IO and into the database server.

In either cases, it implies the SQL file is now meant only for execution, and not for reading.

And specifically for PostgreSQL, there is package pg-minify which does exactly that:

It removes all the comments and minimizes the resulting SQL
It can optionally compress the SQL to its bare minimum (option compress)

complete example

const minify = require('pg-minify');
const fs = require('fs');

fs.readFile('./sqlTest.sql', 'utf8', (err, data) => {
    if (err) {
        console.log(err);
    } else {
        console.log(minify(data));
    }
});

Bertine answered 23/6, 2017 at 12:13 Comment(3)

Good suggestion! But as it is NodeJS dependent, i can't use it today. @Laurenz Albe's solution fits well in my case because is more flexible, i can use native PostgreSQL and Linux commands. – Predesignate 23/6, 2017 at 13:19

There's a third good reason for removing comments from the output of pg_dump specifically: they make it harder to read, by taking up a bunch of vertical space (six lines each, only one of which has text) and convey no information. The line comment "Name: mytable; Type: TABLE; Schema: public" is useless when it's immediately followed by CREATE TABLE public.mytable. I would like to remove these comments specifically to make it more human-readable, not to minify it. – Ecstatics 1/1, 2022 at 8:31

Removing comments makes the schema dump consistent between different platforms, which is essential if it is checked into the version control. – Anfractuous 7/8, 2023 at 8:4

Marcio, piping is the process of taking the output of one process and feeding it directly into another to achieve a specific purpose. Let's say you were using Julia to achieve the result you need. Create a test database and play with it to ensure that you get the desired result. This Julia command would produce a backup with comments:

run(pipeline(`pg_dump -d test`,"testdump.sql"))

Here Julia is asked to dump the backup into testdump.sql so that we can check the result. Note the backticks. Then comes another command which uses the filter suggested by @LaurenzAlbe:

run(pipeline(`cat testdump.sql`,`sed -e '/^--/d'`,"testdump2.sql"))

Here we have a three part pipeline which scans the backup with the comments, strips the comments out and dumps what remains into testdump2. You can now check that the first and second files are what is required.

Once you have confidence that the solution provided by @LaurenzAlbe is correct, you can make the required substitutions to run the entire thing in one pipeline command. Of course you can do the same thing directly in a bash terminal or Python or the scripting engine of your choice.

Millikan answered 31/1, 2017 at 6:51 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags