Unable to access databricks cluster with databricks-connect "V2" V.13.2
When trying to execute Spark code locally with databricks-connect 13.2.0, it fails with the following error:

  • details = "INVALID_STATE: cluster xxxxx is not Shared or Single User Cluster. (requestId=05bc3105-4828-46d4-a381-7580f3b55416)"
  • debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"INVALID_STATE: cluster 0711-122239-bb999j6u is not Shared or Single User Cluster. (requestId=05bc3105-4828-46d4-a381-7580f3b55416)", grpc_status:9, created_time:"2023-07-11T15:26:08.9729+02:00"}"

The cluster access mode is Shared, and I have tried several cluster configurations, but it still does not work. The cluster runtime version is 13.2.

Also, I use:

  • Python 3.10
  • openjdk version "1.8.0_292"
  • Azure Databricks

Has anyone had a similar issue with the new Databricks Connect?

Thanks for your help!

I tried the following code:

from datetime import date

from databricks.connect import DatabricksSession
from pyspark.sql.types import (
    StructType, StructField, StringType, DateType, IntegerType
)


if __name__ == "__main__":
    spark = DatabricksSession.builder.getOrCreate()

    # Create a Spark DataFrame consisting of high and low temperatures
    # by airport code and date.
    schema = StructType([
        StructField('AirportCode', StringType(), False),
        StructField('Date', DateType(), False),
        StructField('TempHighF', IntegerType(), False),
        StructField('TempLowF', IntegerType(), False)
    ])

    data = [
        [ 'BLI', date(2021, 4, 3), 52, 43],
        [ 'BLI', date(2021, 4, 2), 50, 38],
        [ 'BLI', date(2021, 4, 1), 52, 41],
        [ 'PDX', date(2021, 4, 3), 64, 45],
        [ 'PDX', date(2021, 4, 2), 61, 41],
        [ 'PDX', date(2021, 4, 1), 66, 39],
        [ 'SEA', date(2021, 4, 3), 57, 43],
        [ 'SEA', date(2021, 4, 2), 54, 39],
        [ 'SEA', date(2021, 4, 1), 56, 41]
    ]

    temps = spark.createDataFrame(data, schema)

    print(temps)

I expect the DataFrame to be displayed in my local terminal, with the Spark execution happening remotely on the cluster.

Cell answered 11/7, 2023 at 13:46

Comment:

  • Did you ever figure out a solution for this? Currently experiencing the same issue. – Hunnish
Databricks Connect V2 requires a cluster that supports Unity Catalog; this is explicitly stated in the requirements. It looks like you are using the "No Isolation Shared" data access mode, or you do not have Unity Catalog at all. If you do have Unity Catalog, make sure that you have selected Single User or Shared in the "Access mode" dropdown.

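For context, Databricks Connect v2 is built on Spark Connect, and the session targets one specific cluster through a Spark Connect connection string. A minimal sketch of how that string is assembled; the host, token, and cluster id below are placeholders, not real values:

```python
# Assemble the Spark Connect connection string that Databricks Connect v2
# uses to reach a workspace cluster. All three values are placeholders.
host = "adb-1234567890123456.7.azuredatabricks.net"  # workspace host (placeholder)
token = "dapi-REDACTED"                              # personal access token (placeholder)
cluster_id = "0711-122239-bb999j6u"                  # must be a UC-enabled cluster

conn = f"sc://{host}:443/;token={token};x-databricks-cluster-id={cluster_id}"
print(conn)
```

If the referenced cluster is not in a Unity Catalog access mode (Single User or Shared), the server rejects the session with the `INVALID_STATE` error shown in the question, regardless of how the string is built.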

Aude answered 11/7, 2023 at 14:42

Comments:

  • So there is no way to use databricks-connect if the cluster (13.3 LTS) does not yet support Unity Catalog? – Intrude
  • DBConnect V2 supports 13.3, but it must be a Unity Catalog-enabled cluster (Single User or Shared), not "Shared No Isolation". – Aude
  • I understand. But in my company Unity Catalog support is not yet enabled. I still want to use 13.3 LTS and I get the above error. Is there a workaround until we have Unity Catalog? Thanks a lot. – Intrude
  • No, it is not possible to use it without Unity Catalog… no workaround. – Aude
  • To share: my cluster (runtime 13.3) uses the Hive metastore without Unity Catalog. databricks-connect does not work in Shared mode (it complains about a permission issue), but it works with Single User. – Intermix
  • @AlexOtt any information as to why this only works with a Unity Catalog cluster? This was not a requirement before, and to me there is no obvious technical reason for it. It makes using the VS Code extension + Databricks Connect much more annoying. – Miley
  • Is there any way to run Databricks Connect in PyCharm without Unity Catalog enabled? It's not really an option for me at the moment. – Suckow
Make sure that when you create the cluster, the summary shows the Unity Catalog tag.

And, as answered by Alex Ott, select either Single User or Shared in the "Access mode" dropdown.
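The error in the question corresponds to the cluster's `data_security_mode` field, as returned by the Clusters API (`GET /api/2.0/clusters/get`): Databricks Connect v2 accepts `SINGLE_USER` and `USER_ISOLATION` (Shared), while "No Isolation Shared" reports `NONE`. A hypothetical pre-check helper (the function name is my own, not part of any library):

```python
def is_dbconnect_v2_compatible(cluster_info: dict) -> bool:
    """Return True if the cluster's access mode works with Databricks Connect v2.

    `cluster_info` is the JSON object returned by the Clusters API.
    SINGLE_USER and USER_ISOLATION (Shared) are Unity Catalog access
    modes; NONE corresponds to "No Isolation Shared".
    """
    return cluster_info.get("data_security_mode") in {"SINGLE_USER", "USER_ISOLATION"}


# "No Isolation Shared" is rejected, matching the INVALID_STATE error above.
print(is_dbconnect_v2_compatible({"data_security_mode": "NONE"}))            # False
print(is_dbconnect_v2_compatible({"data_security_mode": "USER_ISOLATION"}))  # True
```

Checking this field before connecting gives a clearer diagnostic than the gRPC `INVALID_STATE` message.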

Ulrikeulster answered 17/11, 2023 at 10:29

© 2022 - 2024 — McMap. All rights reserved.