Error: Invalid configuration value detected for fs.azure.account.key

I am using Azure Databricks to create a Delta table in Azure Blob Storage (ADLS Gen2), but I am getting the error "Failure to initialize configuration: Invalid configuration value detected for fs.azure.account.key" on the last line:

%scala
spark.conf.set(
    "fs.azure.account.oauth2.client.secret",
    "<storage-account-access-key>")
friends = spark.read.csv('myfile/fakefriends-header.csv',
   inferSchema = True, header = True)
friends.write.format("delta").mode('overwrite')\
   .save("abfss://[email protected]/myfile/friends_new")

Please help me out: how can I avoid this error?

Lith answered 3/11, 2021 at 13:13 Comment(0)

Short answer: you can't use a storage account access key to access data using the abfss protocol. You need to provide more configuration options if you want to use abfss; it's all described in the documentation.

spark.conf.set(
  "fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", 
  "OAuth")
spark.conf.set(
  "fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net", 
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(
  "fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", 
  "<application-id>")
spark.conf.set(
  "fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net", 
  dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"))
spark.conf.set(
  "fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net", 
  "https://login.microsoftonline.com/<directory-id>/oauth2/token")

A storage access key can be used only when you're using wasbs, but that's not recommended with ADLS Gen2.
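
For illustration, a minimal sketch of that wasbs + storage account key combination (placeholder account, container, and secret names; not the recommended path for ADLS Gen2):

# Account key auth works with the legacy wasbs driver (blob endpoint, not dfs)
spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
  dbutils.secrets.get(scope="<scope-name>", key="<storage-account-key-name>"))

df = spark.read.csv(
  "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/myfile/fakefriends-header.csv",
  header=True, inferSchema=True)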

P.S. You can also use a credential passthrough cluster if you have permissions to access that storage account.
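
For example, with passthrough enabled on the cluster there are no spark.conf.set calls at all; a read like the following sketch (placeholder names) runs under your own Azure AD identity:

# Credential passthrough: no account key or OAuth configuration in the notebook
df = spark.read.csv(
  "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/myfile/fakefriends-header.csv",
  header=True)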

Chloroplast answered 3/11, 2021 at 15:17 Comment(7)
It was not clear to me from the linked official documentation that a storage access key can be used only when you're using wasbs; how did you derive that? I know wasbs is not recommended any more. – Cosmonaut
I assume the "client.id" setting above requires registering an app in Azure? – Cosmonaut
And I found that this section does not really work: learn.microsoft.com/en-us/azure/databricks/data/data-sources/… – Cosmonaut
It works just fine... I use it really regularly. – Chloroplast
I meant the section: "If you have properly configured credentials to access your Azure storage container, you can interact with resources in the storage account using URIs. Databricks recommends using the abfss driver for greater security." How should that security be configured? The documentation does not say in that section. I know other sections talk about SAS and OAuth, like your answer about OAuth with app registration. – Cosmonaut
Security here means that the ABFSS driver uses TLS 1.2 by default, plus it relies on short-lived OAuth tokens compared to a storage key or SAS. – Chloroplast
I think the problem with the linked documentation is that it lacks an example for each type of auth; there are combinations of method and protocol that do not work, but this is not explicitly stated in the docs, e.g. "a storage access key can be used only when you're using wasbs" is not on that page. – Parallelepiped

A few months later, but try the following code in your notebook:

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account name>.dfs.core.windows.net", "<account key>")
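
If you prefer session-scoped configuration, here is a sketch of the equivalent account-key setting through the Spark session conf (placeholder names, with the key pulled from a secret scope rather than pasted in):

# Same account-key property, set on the Spark session conf instead of the
# underlying Hadoop configuration
spark.conf.set(
  "fs.azure.account.key.<account name>.dfs.core.windows.net",
  dbutils.secrets.get(scope="<scope-name>", key="<account-key-name>"))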
Yarn answered 21/3, 2023 at 14:40 Comment(0)

This error can also happen if the storage account name is mistyped (my case), i.e. check that the name set in

spark.conf.set(s"fs.azure.account.oauth.provider.type.$<<storageAccountName>>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")

is the same one you use in the SELECT * FROM parquet.`abfs://...@<<storageAccountName>>...` statement or other Spark action.
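
One way to avoid that kind of mismatch is to define the storage account name once and reuse it in both the configuration key and the path, e.g. a sketch with placeholder names:

# Single source of truth for the storage account name
storage_account_name = "<storageAccountName>"

spark.conf.set(
  f"fs.azure.account.oauth.provider.type.{storage_account_name}.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")

path = f"abfss://<container-name>@{storage_account_name}.dfs.core.windows.net/<folder>"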

Unbeknown answered 30/3, 2023 at 18:30 Comment(0)

I had the same issue when trying to use Active Directory authentication with a service principal. I found a couple of things that were causing it:

  1. The Databricks secrets "{{...}}" notation, i.e. spark.conf.set("secret", "{{secrets/scope/key}}"), was not correctly resolving the secret. At the time of this writing, this feature is in public preview. I had to use dbutils instead.
  2. I was using the sc.binaryFiles() RDD method to read some files. Switching to spark.read.format("binaryFile").load("/path/to/files/*") fixed it.

Here is a complete working example:

from pyspark.dbutils import *

# Fill in your values here
storage_account_name = "mystorage"
container_name = "mtcontainer"
active_directory_client_id = "..." 
active_directory_tenant_id = "..."
service_credential = dbutils.secrets.get("scope-name", "secret-name")

# Configure for active directory authentication
spark.conf.set(f"fs.azure.account.auth.type.{storage_account_name}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account_name}.dfs.core.windows.net", 
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account_name}.dfs.core.windows.net", active_directory_client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account_name}.dfs.core.windows.net", 
               service_credential)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account_name}.dfs.core.windows.net", 
               f"https://login.microsoftonline.com/{active_directory_tenant_id}/oauth2/token")

path_to_files = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/path/to/files"

# Use a file count as a connection test
spark.read.format("binaryFile").load(path_to_files).count()
Periodical answered 29/9, 2023 at 21:28 Comment(0)

Better to use a Service Principal.
More readable, imo:

import json
# dbutils is available by default in Databricks notebooks

SP_file = json.loads(dbutils.secrets.get("yourScope", "yourServicePrincipal"))

service_principal_id = SP_file["CLIENT_ID"]
service_principal_secret = SP_file["CLIENT_SECRET"]
tenant = SP_file["TENANT_ID"]

configs = {
    'fs.azure.account.auth.type': 'OAuth',
    'fs.azure.account.oauth.provider.type': 'org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider',
    'fs.azure.account.oauth2.client.id': service_principal_id,
    'fs.azure.account.oauth2.client.secret': service_principal_secret,
    'fs.azure.account.oauth2.client.endpoint': f'https://login.microsoftonline.com/{tenant}/oauth2/token'
  }

for key, value in configs.items():
    spark.conf.set(key, value)

your_df\
    .write.format("delta")\
    .mode("append")\
    .save("abfss://[email protected]/Your_Folder/")
Gibraltar answered 4/7 at 10:13 Comment(0)
