What is the difference between ABFSS and WASBS in Azure storage?

3

40

Definitions of ABFS[S] and WASB[S] are easy to find, but there is no clear guidance on when to use which. What are the most appropriate use cases for each?

Prevocalic answered 18/2, 2020 at 9:25 Comment(0)
R
37

1) Blob Storage with HTTP

Azure introduced Blob Storage, an object store with a flat structure. It has no concept of folders or hierarchy, although a slash (/) in a blob name gives the illusion of one.

The blob endpoint (blob.core.windows.net) can be used over HTTP to read and write blobs:

https://storageaccount.blob.core.windows.net/container/path/to/blob
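
To illustrate the "illusion of hierarchy" described above, here is a minimal sketch in plain Python. It does not call any Azure API; the function name `list_virtual_dir` and the sample blob names are hypothetical. It only shows how a one-level "folder listing" can be derived from a flat list of names by treating the slash as a delimiter.

```python
def list_virtual_dir(blob_names, prefix="", delimiter="/"):
    """Emulate a one-level directory listing over a flat list of blob names.

    Returns (virtual_subdirectories, blobs_directly_under_prefix).
    """
    subdirs, files = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so "head/" acts like a folder.
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(name)
    return sorted(subdirs), sorted(files)


# Hypothetical blob names stored in one flat container.
blobs = ["path/to/blob", "path/to/other", "path/readme.txt", "top.txt"]
print(list_virtual_dir(blobs, prefix="path/"))
```

The "folders" here exist only as name prefixes; nothing is stored for `path/to/` itself.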

2) Blob Storage with WASBS

For Hadoop applications that needed to interact with Azure Blob Storage, HDFS compatibility was provided through the WASB[S] driver. This driver performed the complex task of mapping file system semantics (as required by the Hadoop FileSystem interface) onto the object-store style interface exposed by Azure Blob Storage.

wasbs://container@storageaccount.blob.core.windows.net/path/to/blob

With the WASB driver, tools such as HDInsight can connect to Blob Storage on the same blob endpoint (blob.core.windows.net).
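
As a rough sketch of how a Hadoop-style storage URI breaks down, the snippet below splits one into scheme, container, account, endpoint and path using only the standard library. The helper name `parse_storage_uri` and the account/container names are placeholders, and the layout `scheme://container@account.endpoint/path` is assumed.

```python
from urllib.parse import urlsplit


def parse_storage_uri(uri):
    """Split a wasb[s]:// or abfs[s]:// style URI into its parts."""
    parts = urlsplit(uri)
    # The host is "<account>.<endpoint>", e.g. "storageaccount.blob.core.windows.net".
    account, _, endpoint = parts.hostname.partition(".")
    return {
        "scheme": parts.scheme,        # wasbs -> TLS, wasb -> unencrypted
        "container": parts.username,   # the part before "@"
        "account": account,
        "endpoint": endpoint,
        "path": parts.path.lstrip("/"),
    }


info = parse_storage_uri(
    "wasbs://container@storageaccount.blob.core.windows.net/path/to/blob"
)
print(info)
```

The same helper works for `abfss://` URIs, where the endpoint part would be `dfs.core.windows.net` instead.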

3) ADLS with ABFSS

(Ignore ADLS Gen1, which is a separate service and is now deprecated.)

See this answer for the difference between Blob Storage and ADLS.

Then came ADLS Gen2 (Azure's HDFS-compatible offering), which supports hierarchical storage (real folders) with features such as ACLs on files and folders. A storage account with the hierarchical namespace feature enabled becomes an ADLS Gen2 account rather than a plain Blob Storage account. To talk to ADLS Gen2, the DFS endpoint (dfs.core.windows.net) is used.

abfss://container@storageaccount.dfs.core.windows.net/path/to/file

Hadoop applications can now use the ABFS driver to connect to ADLS. Because the DFS endpoint exposes file system semantics directly, the driver no longer needs a complex mapping layer and is much more efficient. Platforms such as Hortonworks, HDInsight and Azure Databricks can connect to ADLS far more efficiently using the ABFSS driver.

Also note that some tools, such as Power BI, support both WASBS and ABFSS.


What to use?

If ADLS is used,

  • Hadoop / data processing tools such as Databricks and HDInsight should use ABFSS on the DFS endpoint.
  • The ADLS REST endpoint (see the ADLS HTTP REST endpoint docs) can be called directly over HTTP if needed, e.g. a Python app listing paths.
  • ADLS is built on top of Blob Storage, so the blob endpoint can also be used to read and write the data.

If Blob Storage is used,

  • Hadoop / data processing tools can use WASBS on the blob endpoint. (WASB will be deprecated in the future.)
  • The ABFS driver is cross-compatible, so it can also be used.
  • Other use cases can simply use the HTTP endpoint without any special driver, e.g. a Python app reading and writing files in Blob Storage over HTTP.

  • ADLS - Azure Data Lake Storage
  • WASB - Windows Azure Storage Blob (provides unencrypted access)
  • WASBS - Windows Azure Storage Blob Secure (TLS encrypted access)
  • ABFS - Azure Blob File System
  • ABFSS - Azure Blob File System Secure
  • DFS - Distributed File System

Update 1:

Microsoft has deprecated the Windows Azure Storage Blob driver (WASB) in favor of the Azure Blob Filesystem driver (ABFS). ABFS has numerous benefits over WASB. Use ABFS for both Blob Storage and Data Lake for newer workloads.
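
For anyone moving existing paths from WASB[S] to ABFS[S], the following is a minimal sketch of rewriting the URI string: the scheme changes and the blob endpoint is swapped for the DFS endpoint. The function name `wasb_to_abfs` is hypothetical, the endpoint suffixes assume the public Azure cloud, and the sketch only rewrites the string; any driver configuration or authentication changes are out of scope here.

```python
from urllib.parse import urlsplit

# Assumed mapping: unencrypted stays unencrypted, TLS stays TLS.
SCHEME_MAP = {"wasb": "abfs", "wasbs": "abfss"}
BLOB_SUFFIX = ".blob.core.windows.net"  # public-cloud endpoint (assumption)
DFS_SUFFIX = ".dfs.core.windows.net"


def wasb_to_abfs(uri):
    """Rewrite a wasb[s]:// URI to the equivalent abfs[s]:// URI."""
    parts = urlsplit(uri)
    if parts.scheme not in SCHEME_MAP:
        raise ValueError(f"not a wasb/wasbs URI: {uri!r}")
    if not parts.netloc.endswith(BLOB_SUFFIX):
        raise ValueError(f"unexpected endpoint in: {uri!r}")
    # Keep "container@account" and replace only the endpoint suffix.
    netloc = parts.netloc[: -len(BLOB_SUFFIX)] + DFS_SUFFIX
    return f"{SCHEME_MAP[parts.scheme]}://{netloc}{parts.path}"


converted = wasb_to_abfs(
    "wasbs://container@storageaccount.blob.core.windows.net/path/to/blob"
)
print(converted)
```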

Recuperate answered 7/4, 2022 at 11:0 Comment(0)
P
30

The difference and use case are as below:

ABFS[S] is used for Azure Data Lake Storage Gen2, which is built on a normal Azure storage account (when creating the storage account, enable Hierarchical namespace and you get an Azure Data Lake Storage Gen2 account). An example is here.

WASB[S] is used for a normal Azure storage account. An example is here.

Pacify answered 26/2, 2020 at 9:27 Comment(2)
I have a question regarding using Azure Key Vault posted here. I was wondering if you will have time to share your thoughts there. - Glutinous
The question is about when to use what. Your answer doesn't convey anything of substance that couldn't already be found by googling abfs or wasb. - Morten
Z
13

ABFS stands for Azure Blob File System. Microsoft recommends it for big data workloads because it is optimized for them, as mentioned here.

WASBS stands for Windows Azure Storage Blob Secure. Microsoft recommends it over WASB because it provides TLS-encrypted access, as mentioned here.

Zina answered 6/1, 2021 at 12:41 Comment(0)
