Azure Data Lake Storage Gen2 permissions

I am currently building a data lake (Gen2) in Azure. I use Terraform to provision all the resources. However, I ran into some permission inconsistencies. According to the documentation, one can set permissions for the data lake with RBAC and ACLs.

I chose ACLs because they allow fine-grained permissions on directories within the data lake. In the data lake, I created a directory named raw, among other directories, on which a certain group has r-- (read-only) default permissions. As I understand it, the default ACL means that objects created under this directory are assigned the same permissions as the directory. When users in that group try to access the data lake with Storage Explorer, they do not see the storage account, and they do not see the filesystem/container in which the directory lives. So they cannot reach the directory for which they have read-only permissions.

So I was thinking of assigning the permissions needed to at least list storage accounts and filesystems (containers). Evaluating existing roles, I came to the following permissions:

  1. Microsoft.Storage/storageAccounts/listKeys/action
  2. Microsoft.Storage/storageAccounts/read

After applying permission 1, nothing changed. After applying permission 2 as well, users in the group could suddenly do everything in the data lake as if there was no ACL specified.

My question now is: how can I use ACLs (and RBAC) to create a data lake with directories with different permissions for different groups, so that groups are actually able to only read or write to those directories that are in the ACLs? In addition, they should be able to list storage accounts and filesystems (containers) for which they have access to certain directories.

Sweatt answered 16/12, 2020 at 9:35 Comment(1)
See my answer here, it should do exactly what you need: https://mcmap.net/q/1194925/-azure-storage-restrict-access-one-container-only - Sissy

I believe you also need to create access ACLs on the entire hierarchy of folders down to the file or folder you are trying to read, including the root container.

So if your folder "raw" was created in the top level then you'll need to create the following ACLs for that group...

"/"    --x (access)
"/raw" r-x (access)
"/raw" r-x (default)

... and the default ACL will then give the group the read and execute ACL on all sub folders and files created.

You also need to give the group at least the Reader RBAC role on the resource. This can be on the storage account, or on just the container if you want to restrict access to other containers.
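
As a minimal sketch of that Reader assignment, assuming the storage account resource is named data_lake and the group's object ID is held in var.aad_group_object_id (both names match the example further down in this answer), something like the following should work. The resource label group_reader is made up for illustration, and the container-level scope in the comment is an assumption based on the usual ARM ID layout for blob containers.

```hcl
# Sketch only: grant the AAD group the built-in Reader role on the storage account.
# "group_reader" is a hypothetical label; the referenced names come from the example below.
resource "azurerm_role_assignment" "group_reader" {
  scope                = azurerm_storage_account.data_lake.id
  role_definition_name = "Reader"
  principal_id         = var.aad_group_object_id

  # Assumed alternative: scope the role to a single container instead, e.g.
  # scope = "${azurerm_storage_account.data_lake.id}/blobServices/default/containers/acltest"
}
```

Reader is a management-plane role, so it should only make the account visible in tools such as Storage Explorer. Access to the data itself is still decided by the ACLs.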

You can set the ACLs on the container with the ace block of the azurerm_storage_data_lake_gen2_filesystem Terraform resource, and then set the ACLs on the folders using the azurerm_storage_data_lake_gen2_path Terraform resource.

Here's an example where I'm storing the object_id of the Azure Active Directory group in a variable named aad_group_object_id.

# create the data lake
resource "azurerm_storage_account" "data_lake" {
  ....
}

# create a container named "acltest" with execute ACL for the group
resource "azurerm_storage_data_lake_gen2_filesystem" "data_lake_acl_test" {
  name               = "acltest"
  storage_account_id = azurerm_storage_account.data_lake.id
  
  ace {
    type        = "group"
    scope       = "access"
    id          = var.aad_group_object_id
    permissions = "--x"
  }
}

# create the folder "raw" and give read and execute access and default permissions to group
resource "azurerm_storage_data_lake_gen2_path" "folder_raw" {
  path               = "raw"
  filesystem_name    = azurerm_storage_data_lake_gen2_filesystem.data_lake_acl_test.name
  storage_account_id = azurerm_storage_account.data_lake.id
  resource           = "directory"

  ace {
    type        = "group"
    scope       = "access"
    id          = var.aad_group_object_id
    permissions = "r-x"
  }
  ace {
    type        = "group"
    scope       = "default"
    id          = var.aad_group_object_id
    permissions = "r-x"
  }
}
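
The example references var.aad_group_object_id without showing its declaration. A possible declaration, assuming the value is passed in as a plain string, could look like this. The description text and the GUID validation are my own additions, not part of the original setup.

```hcl
# Sketch only: declares the variable used by the ace blocks above.
variable "aad_group_object_id" {
  type        = string
  description = "Object ID (GUID) of the Azure Active Directory group that gets read access."

  # Optional guard (an assumption, not required by the provider): reject values
  # that do not look like a GUID.
  validation {
    condition     = can(regex("^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$", var.aad_group_object_id))
    error_message = "The value must be a GUID such as 00000000-0000-0000-0000-000000000000."
  }
}
```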

I've not included it in the code example, but you'll also have to add the ACLs for the owner, owning group, mask and other identities that get added to the root container and subfolders. Otherwise your Terraform plan will keep trying to drop and recreate them on every run.

You can just add this. Unfortunately you need to add it to every folder you create, unless anyone knows a way around this.

  ace {
    permissions = "---" 
    scope       = "access"
    type        = "other"
  }
  ace {
    permissions = "r-x"
    scope       = "access"
    type        = "group"
  }
  ace {
    permissions = "r-x"
    scope       = "access"
    type        = "mask"
  }
  ace {
    permissions = "rwx"
    scope       = "access"
    type        = "user"
  }
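
One way to cut down the repetition might be Terraform's dynamic blocks: keep the baseline entries in a local value and generate the ace blocks from it in each resource. This is a sketch under assumptions rather than a tested fix. The local name base_aces, the resource label folder_curated and the folder name "curated" are made up for illustration, and I have not verified that it removes the plan diff in every provider version.

```hcl
locals {
  # Baseline entries, copied from the ace blocks above.
  base_aces = [
    { type = "user", scope = "access", permissions = "rwx" },
    { type = "group", scope = "access", permissions = "r-x" },
    { type = "mask", scope = "access", permissions = "r-x" },
    { type = "other", scope = "access", permissions = "---" },
  ]
}

# Hypothetical second folder that reuses the baseline entries.
resource "azurerm_storage_data_lake_gen2_path" "folder_curated" {
  path               = "curated"
  filesystem_name    = azurerm_storage_data_lake_gen2_filesystem.data_lake_acl_test.name
  storage_account_id = azurerm_storage_account.data_lake.id
  resource           = "directory"

  # Generates one ace block per baseline entry.
  dynamic "ace" {
    for_each = local.base_aces
    content {
      type        = ace.value.type
      scope       = ace.value.scope
      permissions = ace.value.permissions
    }
  }

  # Group-specific entry, written out as in the "raw" folder example.
  ace {
    type        = "group"
    scope       = "access"
    id          = var.aad_group_object_id
    permissions = "r-x"
  }
}
```

The dynamic block still has to appear in every resource, but the entries themselves live in one place, so a change to the baseline only needs to be made once.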
Mensurable answered 15/3, 2021 at 17:8 Comment(3)
Did you find any more info on why we need to add default permissions to each filesystem? I am trying to prevent my Terraform code from wanting to recreate them each time. - Mlawsky
@Mlawsky see my post below, I included links to a module I wrote that does prevent recreating the permissions. Although it is a bit out of scope here, it is very annoying. - Sweatt
@Simon, I think I gave the group Reader permissions on the subscription only, which I think is causing the group not to see the storage account. This makes sense, because a storage account can contain sensitive information that you do not want to be readable by inheritance automatically. Thanks though for the detailed explanation. - Sweatt
