Is there something like the Glue "Bookmark" feature in Spark that keeps track at the job level?

Asked 14/9, 2021 at 6:59 Answered 16/11, 2022 at 13:2

apache-spark pyspark spark-streaming aws-glue incremental-load

I am looking for something like the AWS Glue "bookmark" in Spark. I know Spark has checkpointing, which works well for an individual data source. In Glue, a single bookmark can keep track of all the files across the different tables involved in a job.

Kaolin answered 14/9, 2021 at 6:59

You can use Spark Structured Streaming in combination with Trigger.Once() for that.

The stream runs exactly one micro-batch, which behaves like a single batch job, while the checkpoint keeps track of which files have already been processed. On the next run, only files that have arrived since then are picked up.
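As a rough illustration, here is a minimal PySpark sketch of that approach. Since a checkpoint belongs to one streaming query, this sketch approximates a job-level bookmark by placing one checkpoint per source under a shared job directory. The helper names (`checkpoint_path`, `run_once`), the layout of the `sources` dict, and the Parquet sink are assumptions made for the example, not anything defined by Spark or Glue.

```python
import posixpath


def checkpoint_path(root: str, job: str, source: str) -> str:
    """Build a per-source checkpoint location under one job-level root.

    Hypothetical helper: the <root>/<job>/<source> layout is only a convention.
    """
    return posixpath.join(root, job, source)


def run_once(spark, job: str, sources: dict, checkpoint_root: str) -> None:
    """Run one incremental pass over every source, then return.

    `spark` is an existing SparkSession. `sources` maps a source name to a dict
    with the keys "format", "schema" (a DDL string), "input" and "output";
    this shape is an assumption of the sketch.
    """
    for name, cfg in sources.items():
        query = (
            spark.readStream
            .format(cfg["format"])
            .schema(cfg["schema"])      # file sources need an explicit schema
            .load(cfg["input"])
            .writeStream
            .format("parquet")
            .option("checkpointLocation",
                    checkpoint_path(checkpoint_root, job, name))
            .trigger(once=True)         # process what is available, then stop
            .start(cfg["output"])
        )
        query.awaitTermination()        # block until this one-off run finishes
```

Calling `run_once(spark, "nightly", sources, "s3://bucket/checkpoints")` on a schedule would then process only the files not yet recorded in each source's checkpoint (the job name and bucket path here are placeholders). On Spark 3.3 and later, `trigger(availableNow=True)` is an alternative that may split the backlog into several micro-batches.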

Americana answered 16/11, 2022 at 13:2
