How to decompress a zip file in Azure Data Factory v2

I'm trying to decompress a zip file (with multiple files inside) using Azure Data Factory v2. The zip file is located in Azure File Storage. The ADF Copy task just copies the original zip file without decompressing it. Any suggestion on how to make this work?

This is the current configuration:

  1. The zip file source was set up as a Binary dataset with Compression Type = ZipDeflate.
  2. The target folder was also set up as a Binary dataset, but with Compression Type = None.
  3. A pipeline with a single Copy activity was created to move the files from the zip file to the target folder.
Overspill answered 29/7, 2019 at 20:34 Comment(1)
Make sure you check "Recursively" on the source. – Tautologize

This can be achieved by setting the compression type to "ZipDeflate" in the source dataset. In the sink dataset of the Copy activity you don't need to specify any compression configuration (the compression type is "None").
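
As a rough illustration of the two datasets described above, the sketch below builds their JSON definitions as Python dicts. It assumes both source and sink live in Azure File Storage (hence `AzureFileStorageLocation`); the dataset names, linked service name, folder paths, and file name are hypothetical placeholders, and `binary_dataset` is an illustrative helper, not part of any SDK.

```python
import json


def binary_dataset(name, linked_service, folder_path, file_name=None, compression=None):
    """Build a Binary dataset definition as a plain dict (illustrative helper)."""
    # Assumption: the store is Azure File Storage; other stores use a different location type.
    location = {"type": "AzureFileStorageLocation", "folderPath": folder_path}
    if file_name is not None:
        location["fileName"] = file_name

    type_properties = {"location": location}
    if compression is not None:
        # Only the source dataset declares a compression codec.
        type_properties["compression"] = {"type": compression}

    return {
        "name": name,
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": type_properties,
        },
    }


# Hypothetical names and paths, for illustration only.
source_dataset = binary_dataset(
    "ZipSource", "FileStorageLS", "incoming", "archive.zip", compression="ZipDeflate"
)
sink_dataset = binary_dataset("UnzippedSink", "FileStorageLS", "unzipped")

print(json.dumps(source_dataset, indent=2))
print(json.dumps(sink_dataset, indent=2))
```

Leaving the `compression` block out of the sink dataset corresponds to choosing "None" in the UI.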

In the Copy activity sink settings, please set the copy behavior to "Flatten Hierarchy" to unzip and write the individual files.

When the copy behavior is set to "Flatten Hierarchy", all the files in the zipped source file are extracted and written to the destination folder specified in the sink dataset as individual files, and they are renamed to autogenerated names such as data_SomeGUID.csv.
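
In the pipeline JSON, this setting appears in the sink part of the Copy activity. A minimal sketch, assuming the sink is Azure File Storage (other stores use their own write-settings type):

```python
import json

# Sink half of a Copy activity's "typeProperties".
# Assumption: the sink store is Azure File Storage.
flatten_sink = {
    "type": "BinarySink",
    "storeSettings": {
        "type": "AzureFileStorageWriteSettings",
        # "Flatten Hierarchy" in the UI: all extracted files land directly
        # in the sink folder under autogenerated names.
        "copyBehavior": "FlattenHierarchy",
    },
}

print(json.dumps(flatten_sink, indent=2))
```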

If you do not specify the copy behavior (it is left as "None") in the Copy activity, it still decompresses the ZipDeflate file(s) and writes them to the file-based sink data store. The files are extracted to the folder: <path specified in dataset>/<folder named as source zip file>/.

Please refer to this doc to learn about compression support in Azure Data Factory: https://learn.microsoft.com/azure/data-factory/supported-file-formats-and-compression-codecs-legacy#compression-support

Sande answered 28/2, 2020 at 23:29 Comment(0)

If you don't want to lose the names of the files within your zip, use the Copy activity but set the Copy Behavior to "Preserve hierarchy". This will create a folder with the name of your zip file, and the files will be inside with their original names.
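
A small sketch of the corresponding sink settings follows. `binary_sink` is a hypothetical helper written for this example, and the Azure File Storage write-settings type is an assumption about the sink store. The three allowed values are the copy behaviors that file-based sinks accept in the pipeline JSON.

```python
import json


def binary_sink(copy_behavior=None, store_type="AzureFileStorageWriteSettings"):
    """Build the sink section of a Copy activity (illustrative helper)."""
    allowed = {"PreserveHierarchy", "FlattenHierarchy", "MergeFiles"}
    if copy_behavior is not None and copy_behavior not in allowed:
        raise ValueError(f"unknown copy behavior: {copy_behavior!r}")

    store_settings = {"type": store_type}
    if copy_behavior is not None:
        # Omitting the key corresponds to leaving the behavior unset in the UI.
        store_settings["copyBehavior"] = copy_behavior
    return {"type": "BinarySink", "storeSettings": store_settings}


# "Preserve hierarchy" keeps the original file names from inside the zip.
preserve_sink = binary_sink("PreserveHierarchy")
print(json.dumps(preserve_sink, indent=2))
```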

Transvestite answered 15/1, 2021 at 1:26 Comment(0)

If your requirement is to unzip and move only the files (not the folders) to the target location, you can follow the steps below. These steps preserve the file names.

  1. In the source dataset, set the compression type to "ZipDeflate" and choose the compression level.

  2. In the Copy activity source settings, specify the wildcard path and the file name, and uncheck "Preserve zip file name as folder".

  3. In the Copy activity sink settings, there is no need to set the copy behavior.

You will see the files are successfully unzipped and loaded to the target location.
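
The steps above can be sketched as a Copy activity definition, built here as a Python dict and printed as JSON. This is only an illustration under stated assumptions: both stores are Azure File Storage, and the activity name, dataset names (`ZipSource`, `UnzippedSink`), folder, and wildcard pattern are hypothetical placeholders.

```python
import json

copy_activity = {
    "name": "UnzipFiles",  # hypothetical activity name
    "type": "Copy",
    # Hypothetical dataset names; the source dataset is the one with ZipDeflate compression.
    "inputs": [{"referenceName": "ZipSource", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "UnzippedSink", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {
                # Assumption: the zip files are in Azure File Storage.
                "type": "AzureFileStorageReadSettings",
                "recursive": True,
                "wildcardFolderPath": "incoming",  # placeholder folder
                "wildcardFileName": "*.zip",       # placeholder pattern
            },
            "formatSettings": {
                "type": "BinaryReadSettings",
                "compressionProperties": {
                    "type": "ZipDeflateReadSettings",
                    # Unchecked "Preserve zip file name as folder" in the UI.
                    "preserveZipFileNameAsFolder": False,
                },
            },
        },
        "sink": {
            "type": "BinarySink",
            # No copyBehavior is set, matching step 3.
            "storeSettings": {"type": "AzureFileStorageWriteSettings"},
        },
    },
}

print(json.dumps(copy_activity, indent=2))
```

With `preserveZipFileNameAsFolder` set to false, the extracted files are written straight into the sink folder rather than into a subfolder named after the zip file.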

Tamboura answered 29/8 at 19:37 Comment(0)
