How to load parquet file into Snowflake database?
Asked Answered
Is it possible to load a Parquet file directly into Snowflake? If yes, how?

Thanks.

Darya answered 6/7, 2018 at 17:26 Comment(2)
According to the Snowflake documentation, yes. – Magnusson
snowflake.com/blog/… – Champion
Yes, it is possible, and it is best done via S3. Note, the following assumes you have a MY_PARQUET_LOADER table, a STAGE_SCHEMA schema, and an S3STAGE stage defined, and that your Parquet files sit in the bucket under the /path/ prefix.

copy into STAGE_SCHEMA.MY_PARQUET_LOADER
from (
    select
        $1
      , metadata$filename as metadata_filename
      , metadata$file_row_number as metadata_file_row_number
      , current_timestamp() as load_timestamp
    from @S3STAGE/path/
)
pattern = '.*\.parquet'
file_format = (
    TYPE = 'PARQUET'
    SNAPPY_COMPRESSION = TRUE
)
ON_ERROR = 'SKIP_FILE_1%'
PURGE = TRUE;

where this exists:

create or replace TABLE MY_PARQUET_LOADER (
    RAW VARIANT,
    METADATA_FILENAME VARCHAR(16777216),
    METADATA_FILE_ROW_NUMBER NUMBER(38,0),
    LOAD_TIMESTAMP TIMESTAMP_LTZ(9)
) cluster by (METADATA_FILENAME);
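The answer assumes the S3STAGE stage already exists. A minimal sketch of defining it (the bucket URL and credentials are placeholders, not values from the question) could look like:

create or replace stage STAGE_SCHEMA.S3STAGE
  -- placeholder bucket; point this at your own S3 location
  url = 's3://my-bucket/'
  credentials = (aws_key_id = '...' aws_secret_key = '...')
  file_format = (type = 'PARQUET');

In practice a storage integration is usually preferable to inline credentials.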

Worthwhile to read the fine manual:
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html

Wondering answered 9/7, 2018 at 13:33 Comment(0)
I use these two SQL statements to create the table and load the data; this may make your life easier.

First, use this SQL to generate the CREATE TABLE statement:

with cols as (
select COLUMN_NAME || '   ' || TYPE  col
  from table(
    infer_schema(
      location=>'@LANDING/myFile.parquet'
      , file_format=>'LANDING.default_parquet'
      )
  )
),
temp as (
    select 1 seq, 'create or replace table myTable (' col1
    union all
    select 2, listagg(col, ',') col1
    from cols
    union all
    select 3, ') ' col1
)
-- order the three fragments explicitly; a plain UNION does not
-- guarantee that LISTAGG sees them in the intended order
select listagg(col1, ' ') within group (order by seq)
from temp;

Second, use this SQL to generate the COPY INTO statement that loads the data into the table:

with cols as (
select expression
  from table(
    infer_schema(
      location=>'@LANDING/myFile.parquet'
      , file_format=>'LANDING.default_parquet'
      )
  )
),
temp as (
    select 1 seq, 'copy into myTable from ( select ' col1
    union all
    select 2, listagg(expression, ',') col1
    from cols
    union all
    select 3, 'from @LANDING/myFile.parquet ) ' col1
)
-- as above, order the fragments so the generated statement assembles correctly
select listagg(col1, ' ') within group (order by seq)
from temp;
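Each generator returns a single string of SQL. Rather than copy-pasting the result, a Snowflake Scripting block can capture and run it directly. This is a sketch only; the inner SELECT is a placeholder standing in for either generator query above:

declare
  generated_sql string;
begin
  -- placeholder: substitute one of the generator queries above here
  select 'create or replace table myTable (raw variant)'
    into :generated_sql;
  execute immediate :generated_sql;
end;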
Melvin answered 13/9, 2022 at 9:59 Comment(0)
Yes, this could be a solution:

create or replace TABLE MY_PARQUET_LOADER (
    RAW VARIANT,
    METADATA_FILENAME VARCHAR(16777216),
    METADATA_FILE_ROW_NUMBER NUMBER(38,0),
    LOAD_TIMESTAMP TIMESTAMP_LTZ(9)
) cluster by (METADATA_FILENAME);
Reitman answered 30/1, 2021 at 14:20 Comment(0)
Create a file format for Parquet. USE_LOGICAL_TYPE = TRUE makes sure that times stored as INT (epoch) are translated to datetime values.

CREATE OR REPLACE FILE FORMAT parquet_ff
  TYPE = parquet
  USE_LOGICAL_TYPE = TRUE;

Create the table in Snowflake, inferring the schema from the Parquet file on the stage.

CREATE OR REPLACE TABLE myTable USING TEMPLATE (
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) 
WITHIN GROUP (ORDER BY ORDER_ID)
FROM TABLE (
    INFER_SCHEMA(
      LOCATION=>'@stage/path/to/parquet'
      , FILE_FORMAT=>'parquet_ff'
      )));
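Before committing to a table definition, it can help to see what INFER_SCHEMA actually detected; the same table function can be queried directly (reusing the stage path and file format from above):

-- Inspect the inferred columns before creating the table.
select column_name, type, nullable
from table(
    infer_schema(
      location=>'@stage/path/to/parquet'
      , file_format=>'parquet_ff'
      ));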

Copy into the newly created table from the Parquet file with the option MATCH_BY_COLUMN_NAME = 'CASE_INSENSITIVE' to match columns by name. Other options are described in the COPY INTO documentation.

copy into myTable
from @stage/path/to/parquet 
FILE_FORMAT  = parquet_ff
MATCH_BY_COLUMN_NAME = 'CASE_INSENSITIVE'
;
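After the load, one way to verify what was ingested is the COPY_HISTORY table function in INFORMATION_SCHEMA (a sketch; the table name and time window are examples):

-- Rows loaded and any errors for loads into myTable over the last hour.
select file_name, row_count, error_count, status
from table(information_schema.copy_history(
    table_name => 'MYTABLE',
    start_time => dateadd(hours, -1, current_timestamp())));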
Bespeak answered 14/2, 2024 at 11:25 Comment(0)
