Create External table in Azure databricks

I am new to Azure Databricks and am trying to create an external table pointing to an Azure Data Lake Storage (ADLS) Gen2 location.

From a Databricks notebook I have tried to set the Spark configuration for ADLS access, but I am still unable to execute the DDL I created.

Note: One solution that works for me is to mount the ADLS account on the cluster and then use the mount location in the external table's DDL (a rough sketch of this is at the end of the post). But I wanted to check whether it is possible to create an external table DDL with the ADLS path directly, without a mount.

# Using Principal credentials
spark.conf.set("dfs.azure.account.auth.type", "OAuth")
spark.conf.set("dfs.azure.account.oauth.provider.type", "ClientCredential")
spark.conf.set("dfs.azure.account.oauth2.client.id", "client_id")
spark.conf.set("dfs.azure.account.oauth2.client.secret", "client_secret")
spark.conf.set("dfs.azure.account.oauth2.client.endpoint", 
"https://login.microsoftonline.com/tenant_id/oauth2/token")

DDL

create external table test(
id string,
name string
)
partitioned by (pt_batch_id bigint, pt_file_id integer)
STORED as parquet
location 'abfss://container@account_name.dfs.core.windows.net/dev/data/employee'

Error Received

Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.contracts.exceptions.ConfigurationPropertyNotFoundException Configuration property account_name.dfs.core.windows.net not found.);

Is it possible to refer to the ADLS location directly in the DDL?
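
For reference, the mount-based workaround mentioned in the note above looks roughly like this; the container, storage account, mount point and credential values are placeholders:

# Rough sketch of the mount workaround (placeholder values)
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "client_id",
  "fs.azure.account.oauth2.client.secret": "client_secret",
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/tenant_id/oauth2/token"
}

dbutils.fs.mount(
  source = "abfss://container@account_name.dfs.core.windows.net/",
  mount_point = "/mnt/employee_data",
  extra_configs = configs
)

# The external table DDL then points at the mount path instead,
# e.g. location '/mnt/employee_data/dev/data/employee'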

Thanks.

Batton answered 27/6, 2019 at 13:26 Comment(4)
Have you verified that your values for tenant_id, client_id and client_secret are correct, and that the service principal has the required permissions?Loraineloralee
Yes, because after setting up the Spark configuration I am able to read the file into a DataFrame and use it.Batton
It looks like the only way is to mount the ADLS Gen2 account.Batton
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.contracts.exceptions.ConfigurationPropertyNotFoundException Configuration property account_name.dfs.core.windows.net not found.);Batton

Sort of, if you can use Python (or Scala).

Start by making the connection:

TenantID = "blah"

def connectLake():
  spark.conf.set("fs.azure.account.auth.type", "OAuth")
  spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
  spark.conf.set("fs.azure.account.oauth2.client.id", dbutils.secrets.get(scope = "LIQUIX", key = "lake-sp"))
  spark.conf.set("fs.azure.account.oauth2.client.secret", dbutils.secrets.get(scope = "LIQUIX", key = "lake-key"))
  spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/"+TenantID+"/oauth2/token")

connectLake()
lakePath = "abfss://container@account_name.dfs.core.windows.net/"

Using Python you can register a table using:

spark.sql("CREATE TABLE DimDate USING PARQUET LOCATION '"+lakePath+"/PRESENTED/DIMDATE/V1'")

You can now query that table if you have executed the connectLake() function - which is fine in your current session/notebook.

The problem now is that if a new session comes in and tries to select * from that table, it will fail unless it runs the connectLake() function first. There is no way around that limitation, as you have to provide credentials to access the lake.
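
For example, a fresh session/notebook would need something like this before querying the table registered above:

connectLake()                                    # re-establish lake credentials for this session
df = spark.sql("SELECT * FROM DimDate LIMIT 10")
display(df)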

You may want to consider ADLS Gen2 credential pass through: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html

Note that this requires using a High Concurrency cluster.
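
With passthrough enabled on such a cluster, users read the lake with their own Azure AD identity and no service principal configs in the notebook; roughly (path is a placeholder):

# No spark.conf.set(...) calls needed; the cluster passes the user's AAD credential through
df = spark.read.parquet("abfss://container@account_name.dfs.core.windows.net/PRESENTED/DIMDATE/V1")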

Bogbean answered 2/7, 2019 at 13:34 Comment(5)
This does not work for ADLS Gen2. This method works for Gen1 storage accounts.Batton
I tested the above on gen2, have you? If so please describe the error.Bogbean
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.contracts.exceptions.ConfigurationPropertyNotFoundException Configuration property account_name.dfs.core.windows.net not found.);Batton
That session has not run the connectLake() function. As explained, every session needs to run it.Bogbean
Do we need to run the connectLake() function even if we run the DDL in the same Spark session?Batton
