How to copy from CSV file to PostgreSQL table with headers in CSV file?

I want to copy a CSV file to a Postgres table. There are about 100 columns in this table, so I do not want to rewrite them if I don't have to.

I am using the \copy table from 'table.csv' delimiter ',' csv; command but without a table created I get ERROR: relation "table" does not exist. If I add a blank table I get no error, but nothing happens. I tried this command two or three times and there was no output or messages, but the table was not updated when I checked it through PGAdmin.

Is there a way to import a table with headers included like I am trying to do?

Libertylibia asked 15/7, 2013 at 19:50 Comment(5)
Your table is named table? Very confusing. Does the table exist, or do you want to create it based on the CSV? (You can't.)Lukash
Well, I named it something else, but for this example let's call it table. I tried with and without it existing. I also tried \copy table(column1, column2, ...) from 'table.csv' delimiter ',' csv; with no luck either. Ideally the table could be created from the CSV alone, using the headers in that file.Libertylibia
Related: #2987933Dispatcher
Just a heads up for anyone planning on turning a large csv into a postgres table -- postgres is capped at 1600 columns in a single table. You cannot chunk tables into 1600-column-sized ones and then join them after. You need to redesign the db.Unclothe
If python is available to you, you can use d6tstack. It takes care of schema changes too.Prostitution

This worked. The first row had column names in it.

COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER
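
COPY only loads rows into an existing table; it does not create the table or its columns (see the comments below). If you are not a database superuser, psql's client-side \copy accepts the same options. A minimal sketch, assuming the wheat table still needs to be created, with hypothetical column names standing in for whatever the CSV header contains:

CREATE TABLE wheat (variety text, yield text);  -- hypothetical columns; use names matching the CSV header
\copy wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER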
Dispatcher answered 16/7, 2013 at 1:46 Comment(7)
I think the problem with this command is that you have to be the DB superuser. \copy works as a normal user, tooOfficious
COPY does not create a table or add columns to it, it adds rows to an existing table with its existing columns. Presumably the asker wants to automate the creation of the ~100 columns, and COPY does not have this functionality, as of PG 9.3 at least.Rugen
@Officious good catch. Since I'm never an admin or superuser for DBs on the postgres systems I use (the pgadmin makes me owner of the databases I use and gives me limited privileges/roles), I must have used \copy. CheersDispatcher
@Daniel I understood the user's table already existed and had all the columns they needed and that they wanted to simply ADD data.Dispatcher
Got syntax error at or near "HEADER" LINE 2: delimiter ',' CSV HEADER on AWS Redshift.Voiture
This comes directly from postgresql.org/docs/9.2/sql-copy.html, the documentation. Just add the headers.Wistrup
You can also specify column names via COPY wheat (column1,column2) FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADERCoauthor

With the Python library pandas, you can easily create column names and infer data types from a csv file.

from sqlalchemy import create_engine
import pandas as pd

# connect to the target PostgreSQL database
engine = create_engine('postgresql://user:pass@localhost/db_name')

# column names and data types are inferred from the CSV header and contents
df = pd.read_csv('/path/to/csv_file')

# create the table (here named 'pandas_db') and insert the rows
df.to_sql('pandas_db', engine)

The if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This works for other input file types as well; see the pandas I/O and to_sql documentation.

Halford answered 30/4, 2015 at 0:45 Comment(4)
I find that pd.DataFrame.from_csv gives me less trouble, but this answer is by far the easiest way to do this, IMO.Deflation
True, I am not sure why I typed pd.read_excel, instead of pd.read_csv. I updated the answer.Halford
this is a fantastic solution for when you do not want to pre-create the table that will hold a large csv. Just a heads up though -- postgres can only take 1600 columns in a table. Apparently other DB engines will allow more. Having this many columns is apparently poor SQL form, although this consensus has yet to filter through to epidemiology.Unclothe
By default df.to_sql() is VERY SLOW, to speed this up you can use d6tstack. It takes care of schema changes too.Prostitution

Alternative: via the terminal, without server-side file permissions

The PostgreSQL documentation for COPY says, in the Notes section:

The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.

So, generally, when using psql or any other client, even against a local server, you run into path problems. And if you are writing a COPY command for other users, e.g. in a GitHub README, the reader will hit the same problem.

The only way to express a relative path, using only client permissions, is with STDIN,

When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

as shown here:

psql -h remotehost -d remote_mydb -U myuser -c \
   "copy mytable (column1, column2) from STDIN with delimiter as ','" \
   < ./relative_path/file.csv
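
Since the question's CSV has a header row, the same client-side pattern works with the CSV format and HEADER options (a sketch, reusing the hypothetical mytable and column names from the command above):

psql -h remotehost -d remote_mydb -U myuser -c \
   "copy mytable (column1, column2) from STDIN with (format csv, header, delimiter ',')" \
   < ./relative_path/file.csv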
Manciple answered 4/1, 2017 at 13:22 Comment(0)

I have been using this function for a while with no problems. You just need to provide the number of columns in the csv file, and it will take the header names from the first row and create the table for you:

create or replace function data.load_csv_file
    (
        target_table  text, -- name of the table that will be created
        csv_file_path text,
        col_count     integer
    )

    returns void

as $$

declare
    iter      integer; -- dummy integer to iterate columns with
    col       text; -- to keep column names in each iteration
    col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    set schema 'data';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format ('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format ('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);

    iter := 1;
    col_first := (select col_1
                  from temp_table
                  limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format ('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format ('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row // using quote_ident or %I does not work here!?
    execute format ('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length (target_table) > 0 then
        execute format ('alter table temp_table rename to %I', target_table);
    end if;
end;

$$ language plpgsql;
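
A hypothetical call, assuming the function has been created as above and the CSV has 100 columns (the file path is read by the server process, as noted in the previous answer):

select data.load_csv_file('my_table', '/path/to/table.csv', 100);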
Ponton answered 25/5, 2017 at 23:11 Comment(1)
don't forget to change set schema 'data'; to whatever is the case for youPonton
## csv with header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
  -c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV, header);"

## csv without header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
  -c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV);"

## csv without header, specify column
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
  -c "\COPY TB_NAME(COL1,COL2) FROM 'data_sample.csv' WITH (FORMAT CSV);"

All columns in the CSV should match the table's columns (or the specified columns), as in the example below.
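
For example, a hypothetical table definition matching a two-column data_sample.csv:

CREATE TABLE TB_NAME (COL1 text, COL2 text);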


About COPY: https://www.postgresql.org/docs/9.2/sql-copy.html

Inlaid answered 13/3, 2022 at 21:41 Comment(2)
I'm getting the error: ERROR: extra data after last expected columnBosanquet
@Bosanquet the CSV columns should match the table columns, or specify the columns in SQL, as in COPY TB_NAME(COL1,COL2).... If the CSV has extra columns, you'd better delete them before importing (I'm not sure of another solution)Inlaid

To copy from a CSV file to a PostgreSQL table, with headers in the CSV file, using a query:

  • First, put all the files in the C:\tmp folder

  • Then run the script below, which accepts both null values and empty strings

     copy PUBLIC."TABLE_NAME" FROM 
       'C:\tmp\TABLE_NAME.CSV' 
       (format csv, null "NULL", DELIMITER ',', HEADER);
    
Histo answered 4/12, 2023 at 17:54 Comment(0)

You can use d6tstack, which creates the table for you and is faster than pd.to_sql() because it uses native DB import commands. It supports Postgres as well as MySQL and MS SQL.

import d6tstack.utils
import pandas as pd

df = pd.read_csv('table.csv')
uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
d6tstack.utils.pd_to_psql(df, uri_psql, 'table')

It is also useful for importing multiple CSVs, handling data schema changes, and/or preprocessing with pandas (e.g. for dates, via a user-defined function such as apply_fun below) before writing to the db; see further down in the examples notebook:

import glob, d6tstack.combine_csv

d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'),
    apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')
Prostitution answered 17/12, 2018 at 4:13 Comment(0)
