I am using the tf_Agents library for contextual bandits usecase.
In this usecase predictions (daily range between 20k and 30k predictions, 1 for each user) are made daily (multiple times a day) and training only happens on all the predicted data from 4 days ago (Since the labels for predictions takes 3 days to observe).
The driver seems to replay only the batch_size number of experience (Since max_step length is 1 for contextual bandits). Also the replay buffer has the same constraint only handling batch size number of experiences.
I wanted to use checkpointer and save all the predictions (experience from driver which are saved in replay buffer) from the past 4 days and train only on the first of the 4 days saved on each given day.
I am unsure how to do the following and any help is greatly appreciate.
- How to (run the driver) save replay buffer using checkpoints for the entire day (a day contains, say, 3 predictions runs and each prediction will be made on 30,000 observations [say batch size of 16]). So in this case I need multiple saves for each day
- How to save the replay buffers for past 4 days (12 prediction runs ) and only retrieve the first 3 prediction runs (replay buffer and the driver run) to train for each day.
- Unsure how to handle the driver, replay buffer and checkpointer configurations given the above #1, #2 above