Is there something like the Glue "bookmark" feature in Spark that keeps track at the job level?

I am looking for something like the AWS Glue "bookmark" in Spark. I know Spark has checkpointing, which works well for an individual data source. In Glue, a single bookmark can keep track of all the files across the different tables involved in a job.

Kaolin answered 14/9, 2021 at 6:59

You can use Spark Structured Streaming in combination with Trigger.Once() for that.

The stream runs exactly one micro-batch and then stops, so it behaves like an ordinary batch job, while the checkpoint keeps track of which files have already been processed.
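As a rough illustration of the approach, here is a minimal PySpark sketch. The function name run_incremental, its parameters, and the JSON-in/Parquet-out formats are assumptions made for the example, not part of the original answer; trigger(once=True) is the PySpark spelling of Trigger.Once().

```python
def run_incremental(spark, input_dir, output_dir, checkpoint_dir, schema):
    """Process only the files that earlier runs have not seen, then stop.

    All names here are illustrative (hypothetical helper):
    spark          -- an existing SparkSession created by the caller
    schema         -- DDL string such as "id INT, value STRING"; file
                      streams need an explicit schema by default
    checkpoint_dir -- plays the role of the bookmark; reuse the same
                      path on every run
    """
    # File-based streaming source: Spark lists input_dir and records
    # which files it has already read in the checkpoint.
    source = spark.readStream.schema(schema).json(input_dir)

    query = (
        source.writeStream
        .format("parquet")
        .option("checkpointLocation", checkpoint_dir)
        .trigger(once=True)  # run a single micro-batch, then stop
        .start(output_dir)
    )

    # Block until that single micro-batch has finished.
    query.awaitTermination()
    return query
```

Calling this from a scheduled job with the same checkpoint_dir each time should pick up only files that arrived since the previous run. Note that a checkpoint belongs to one streaming query, so a job that reads several tables would need one checkpoint directory per query rather than a single job-wide bookmark. On recent Spark versions, trigger(availableNow=True) is, as far as I know, the suggested replacement for the once trigger.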

Americana answered 16/11, 2022 at 13:2