To run Delta locally with PySpark, you need to follow the official documentation.
This works for me, but only when running the script directly (python <script_file>), not when running it with pytest or unittest.
To solve this problem, you need to add this environment variable:
PYSPARK_SUBMIT_ARGS='--packages io.delta:delta-core_2.12:1.0.0 pyspark-shell'
Use the Scala and Delta versions that match your setup. With this environment variable set, I can run pytest or unittest from the CLI without any problems:
from unittest import TestCase
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession
class TestClass(TestCase):

    # Build a local Spark session with the Delta SQL extension and catalog enabled
    builder = SparkSession.builder.appName("MyApp") \
        .master("local[*]") \
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

    # configure_spark_with_delta_pip adds the Delta package coordinates to the builder
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    def test_create_delta_table(self):
        self.spark.sql("""CREATE TABLE IF NOT EXISTS <tableName> (
            <field1> <type1>)
            USING DELTA""")
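
If you would rather not export the variable in your shell every time, one option (a sketch of my own, not something from the Delta docs) is to set it in a conftest.py at the project root. pytest imports conftest.py before it collects the test modules, so the variable is already in place when the JVM gateway is launched:

# conftest.py (sketch): set PYSPARK_SUBMIT_ARGS before any SparkSession is created
import os

os.environ["PYSPARK_SUBMIT_ARGS"] = "--packages io.delta:delta-core_2.12:1.0.0 pyspark-shell"

Note that this only works if nothing else (another fixture, an import) creates a SparkSession earlier in the run.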
The configure_spark_with_delta_pip function appends a config option to the builder object, equivalent to:
.config("spark.jars.packages", "io.delta:delta-core_<scala_version>:<delta_version>")