Is it possible to load parquet file directly into a snowflake? If yes - how?
Thanks.
Yes, it is possible, and it is best done via an S3 stage. The following assumes you have a MY_PARQUET_LOADER table in a STAGE_SCHEMA schema, a stage named S3STAGE, and that your Parquet files sit in the bucket under the /path/ key/folder.
copy into STAGE_SCHEMA.MY_PARQUET_LOADER
from (
    select
        $1
        ,metadata$filename as metadata_filename
        ,metadata$file_row_number as metadata_file_row_number
        ,current_timestamp() as load_timestamp
    from @S3STAGE/path/
)
pattern = '.*\\.parquet'
file_format = (
    TYPE = 'PARQUET'
    SNAPPY_COMPRESSION = TRUE
)
ON_ERROR = 'SKIP_FILE_1%'
PURGE = TRUE;
assuming this table exists:
create or replace TABLE MY_PARQUET_LOADER (
RAW VARIANT,
METADATA_FILENAME VARCHAR(16777216),
METADATA_FILE_ROW_NUMBER NUMBER(38,0),
LOAD_TIMESTAMP TIMESTAMP_LTZ(9)
) cluster by (METADATA_FILENAME);
Worthwhile to read the fine manual:
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html
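For completeness, the S3STAGE referenced above must exist before the COPY can run. A minimal sketch of creating it (the bucket URL and credentials are placeholders you must replace; a storage integration is the recommended alternative to inline keys):

```sql
-- Hypothetical stage setup; replace the URL and credentials with your own.
create or replace stage STAGE_SCHEMA.S3STAGE
    url = 's3://my-bucket/'
    credentials = (aws_key_id = '...' aws_secret_key = '...')
    file_format = (type = 'PARQUET');
```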
I use these two SQL statements to create the table and load the data; they may make your life easier.
First, use this SQL to generate the CREATE TABLE statement:
with cols as (
    select COLUMN_NAME || ' ' || TYPE col
    from table(
        infer_schema(
            location=>'@LANDING/myFile.parquet'
            , file_format=>'LANDING.default_parquet'
        )
    )
),
temp as (
    -- seq keeps the fragments in order; plain UNION would deduplicate
    -- and gives no ordering guarantee for the final LISTAGG
    select 1 seq, 'create or replace table myTable (' col1
    union all
    select 2, listagg(col, ',') from cols
    union all
    select 3, ') '
)
select listagg(col1) within group (order by seq)
from temp;
Second, use this SQL to generate the COPY INTO statement that loads the data into the table:
with cols as (
    select expression
    from table(
        infer_schema(
            location=>'@LANDING/myFile.parquet'
            , file_format=>'LANDING.default_parquet'
        )
    )
),
temp as (
    -- seq keeps the fragments in order, as in the first query
    select 1 seq, 'copy into myTable from ( select ' col1
    union all
    select 2, listagg(expression, ',') from cols
    union all
    select 3, 'from @LANDING/myFile.parquet ) '
)
select listagg(col1) within group (order by seq)
from temp;
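To make the two generators above concrete: for a hypothetical Parquet file with columns ID and NAME (the column names and types here are illustrative, not taken from a real file), the statements they emit would look roughly like this:

```sql
-- Output of the first query (names/types come from INFER_SCHEMA):
create or replace table myTable (ID NUMBER(38, 0),NAME TEXT);

-- Output of the second query; expressions such as $1:ID::NUMBER(38, 0)
-- come from the EXPRESSION column returned by INFER_SCHEMA:
copy into myTable from ( select $1:ID::NUMBER(38, 0),$1:NAME::TEXT
from @LANDING/myFile.parquet );
```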
Create a file format for Parquet. USE_LOGICAL_TYPE = TRUE makes sure that timestamps stored as integers (epoch values) are translated to datetime.
CREATE OR REPLACE FILE FORMAT parquet_ff
TYPE = parquet
USE_LOGICAL_TYPE = TRUE;
Create the table in Snowflake by inferring the schema from the Parquet file on the stage.
CREATE OR REPLACE TABLE myTable USING TEMPLATE (
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
WITHIN GROUP (ORDER BY ORDER_ID)
FROM TABLE (
INFER_SCHEMA(
LOCATION=>'@stage/path/to/parquet'
, FILE_FORMAT=>'parquet_ff'
)));
Copy into the newly created table from the Parquet file with the option MATCH_BY_COLUMN_NAME = 'CASE_INSENSITIVE'
to match columns by name. Check the COPY INTO <table> documentation for other possible options.
copy into myTable
from @stage/path/to/parquet
FILE_FORMAT = parquet_ff
MATCH_BY_COLUMN_NAME = 'CASE_INSENSITIVE'
;
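After the load you can verify what was actually ingested. A sketch using Snowflake's COPY_HISTORY table function (the table name and one-hour window are examples, not part of the answer above):

```sql
-- Inspect recent load results for the table (hypothetical window)
select file_name, row_count, status, first_error_message
from table(information_schema.copy_history(
    table_name => 'MYTABLE',
    start_time => dateadd(hour, -1, current_timestamp())
));
```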