Ideally, you should prepare the data (including its structure) outside the database, with ready-made tools or a language of your choice (Python, Ruby, etc.). However, when that is not an option, you can get quite far with plpgsql.
Creating a table with text columns
Files in CSV format do not contain any information about column types, primary or foreign keys, etc. You can relatively easily create a table with text columns and copy the data into it. After that you should alter the column types and add constraints manually.
create or replace function import_csv(csv_file text, table_name text)
returns void language plpgsql as $$
begin
    -- read the raw file into a one-column temp table to get at the header line
    create temp table import (line text) on commit drop;
    execute format('copy import from %L', csv_file);
    -- create the target table with one text column per field of the header line
    execute format('create table %I (%s);',
        table_name, concat(replace(line, ',', ' text, '), ' text'))
    from import limit 1;
    -- copy the file again, this time into the target table, skipping the header
    execute format('copy %I from %L (format csv, header)', table_name, csv_file);
end $$;
Example data in the file c:\data\test.csv:
id,a_text,a_date,a_timestamp,an_array
1,str 1,2016-08-01,2016-08-01 10:10:10,"{1,2}"
2,str 2,2016-08-02,2016-08-02 10:10:10,"{1,2,3}"
3,str 3,2016-08-03,2016-08-03 10:10:10,"{1,2,3,4}"
Import:
select import_csv('c:\data\test.csv', 'new_table');
select * from new_table;
 id | a_text |   a_date   |     a_timestamp     | an_array
----+--------+------------+---------------------+-----------
  1 | str 1  | 2016-08-01 | 2016-08-01 10:10:10 | {1,2}
  2 | str 2  | 2016-08-02 | 2016-08-02 10:10:10 | {1,2,3}
  3 | str 3  | 2016-08-03 | 2016-08-03 10:10:10 | {1,2,3,4}
(3 rows)
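At this point every column is of type text. For the example table above, the manual step could look roughly like this (a sketch, not executed in the examples that follow; it assumes id should become the primary key):
alter table new_table alter id type integer using id::integer;
alter table new_table alter a_date type date using a_date::date;
alter table new_table alter a_timestamp type timestamp using a_timestamp::timestamp;
alter table new_table add primary key (id);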
Large CSV files
The above function reads the data twice (into the temporary table and into the target table). For large files this can cost a lot of time and put unnecessary load on the server. One solution is to split the CSV file into two files, one with the header line and one with the data. The function then looks like this:
create or replace function import_csv(header_file text, data_file text, table_name text)
returns void language plpgsql as $$
begin
    -- read the header file (a single line) into a temp table
    create temp table import (line text) on commit drop;
    execute format('copy import from %L', header_file);
    -- create the target table with one text column per field of the header line
    execute format('create table %I (%s);',
        table_name, concat(replace(line, ',', ' text, '), ' text'))
    from import;
    -- import the data file directly into the target table
    execute format('copy %I from %L (format csv)', table_name, data_file);
end $$;
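Usage is analogous; the file names below (a header-only file and a data-only file split from test.csv) are only an illustration:
select import_csv('c:\data\test_header.csv', 'c:\data\test_data.csv', 'new_table');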
Altering column types
You can try to change the column types automatically, based on their content. This can work when you are dealing with simple types and the data in the file consistently follows a specific format; in general, however, it is a complex task, and the functions listed below should be treated only as an example.
Determine a column type from its content (edit the function to add the conversions you need):
create or replace function column_type(val text)
returns text language sql as $$
    select
        case
            when val ~ '^[\+-]{0,1}\d+$' then 'integer'
            when val ~ '^[\+-]{0,1}\d*\.\d+$' then 'numeric'
            when val ~ '^\d\d\d\d-\d\d-\d\d$' then 'date'
            when val ~ '^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d$' then 'timestamp'
            -- null for anything else, i.e. the column stays text
        end
$$;
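For example, applied to a few sample values:
select column_type('1'), column_type('3.14'), column_type('2016-08-01'), column_type('str 1');
-- integer, numeric, date and null; null means the column will be left as text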
Alter column types using the above function:
create or replace function alter_column_types(table_name text)
returns void language plpgsql as $$
declare
    rec record;
    qry text;
begin
    -- take the first row of the table as a json object
    -- and guess a type for each column from its value
    for rec in
        execute format(
            'select key, column_type(value) ctype
            from (
                select row_to_json(t) a_row
                from %I t
                limit 1
            ) s, json_each_text(a_row)',
            table_name)
    loop
        -- collect one alter table statement per recognized column
        if rec.ctype is not null then
            qry := format(
                '%s alter table %I alter %I type %s using %I::%s;',
                qry, table_name, rec.key, rec.ctype, rec.key, rec.ctype);
        end if;
    end loop;
    -- qry is null when no column type was recognized
    if qry is not null then
        execute qry;
    end if;
end $$;
Use:
select alter_column_types('new_table');
\d new_table
Table "public.new_table"
Column | Type | Modifiers
-------------+-----------------------------+-----------
id | integer |
a_text | text |
a_date | date |
a_timestamp | timestamp without time zone |
an_array | text |
(well, proper recognition of array types is quite complicated)
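For the example table, the remaining array column can be fixed by hand:
alter table new_table alter an_array type integer[] using an_array::integer[];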