Trigger.AvailableNow for Delta source streaming queries in PySpark (Databricks)

Asked 10/2, 2022 at 8:20 Answered 21/7, 2022 at 20:39

Solved pyspark databricks spark-structured-streaming delta-lake

All the examples in the Databricks documentation are in Scala, and I can't find how to use this trigger type from PySpark. Is there an equivalent API or a workaround?

Consignee answered 10/2, 2022 at 8:20 Comment(0)

The Python implementation missed the Spark 3.2 release, so in the open-source version it is only available starting with Spark 3.3. On Databricks it was released as part of DBR 10.3 (or 10.2?), and can be used as follows:

.trigger(availableNow=True)
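
If the same code has to run on runtimes both with and without the Python keyword, one possible workaround is to detect support at call time instead of comparing version strings. The sketch below is only an illustration: `with_available_now` is a hypothetical helper name, and the fallback to `once=True` is my assumption, not something the runtime does for you. Note that `once=True` processes everything in a single batch rather than in multiple batches, so it is not a full equivalent.

```python
def with_available_now(writer):
    """Apply the availableNow trigger to a DataStreamWriter-like object.

    Hypothetical helper: falls back to once=True when trigger() does not
    accept the availableNow keyword (assumed to be an older PySpark).
    """
    try:
        return writer.trigger(availableNow=True)
    except TypeError:
        # An unknown keyword argument raises TypeError on older PySpark.
        # once=True handles all available data in ONE batch, then stops.
        return writer.trigger(once=True)
```

Usage would look like `with_available_now(sdf.writeStream).start()`, where `sdf` is your streaming DataFrame.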

Thurston answered 10/2, 2022 at 12:26 Comment(0)

Here is the official documentation:

DataStreamWriter.trigger(*, processingTime: Optional[str] = None, 
                            once: Optional[bool] = None, 
                            continuous: Optional[str] = None, 
                            availableNow: Optional[bool] = None) -> pyspark.sql.streaming.DataStreamWriter

availableNow: bool, optional

if set to True, set a trigger that processes all available data in multiple batches then terminates the query. Only one trigger can be set.

# trigger the query for reading all available data with multiple batches
writer = sdf.writeStream.trigger(availableNow=True)
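
For the Delta-to-Delta case from the question, a complete query might look roughly like the sketch below. The function name `run_available_now` and the three path parameters are placeholders I made up for illustration; the checkpoint location and the choice to block with `awaitTermination()` are assumptions about a typical batch-style job, not part of the documentation quoted above.

```python
def run_available_now(spark, source_path, target_path, checkpoint_path):
    """Process all data currently available in a Delta table, then stop.

    Hypothetical helper: `spark` is an existing SparkSession and the
    three paths are placeholders supplied by the caller.
    """
    # Read the Delta table at source_path as a streaming source.
    stream = spark.readStream.format("delta").load(source_path)

    query = (
        stream.writeStream
        .format("delta")
        # The checkpoint records progress, so the next run only picks up new data.
        .option("checkpointLocation", checkpoint_path)
        # Process everything available now (possibly in several batches), then stop.
        .trigger(availableNow=True)
        .start(target_path)
    )

    # Block until the query has drained the available data and terminated.
    query.awaitTermination()
    return query
```

Calling it again later with the same checkpoint path should only process data that arrived since the previous run.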

Derzon answered 21/7, 2022 at 20:39 Comment(0)
