Using sample_weights with fit_generator()

In an autoregressive continuous problem where zeros make up a large share of the targets, the situation can be treated as a zero-inflated problem (i.e. ZIB). In other words, instead of fitting f(x) directly, we fit g(x)*f(x), where f(x) is the function we want to approximate, i.e. y, and g(x) is a function that outputs a value between 0 and 1 depending on whether the value is zero or non-zero.

Currently, I have two models: one that gives me g(x) and another that fits g(x)*f(x).

The first model gives me a set of weights. This is where I need your help. With model.fit(), I can pass them through the sample_weight argument. However, because I work with a very large amount of data, I need to use model.fit_generator(), which has no sample_weight argument.

Is there a workaround for using sample weights with fit_generator()? Otherwise, how can I fit g(x)*f(x), given that I already have a trained model for g(x)?

Hyperplane answered 17/11, 2018 at 19:59 Comment(1)
Have you tried index slicing? I.e. if you have 23000 prices you can simply slice each 5th one of these using data[0:23000:5, :, :]. The returned array will have shape (4600, 45, 41)Pickerelweed

You can provide sample weights as the third element of the tuple returned by the generator. From the Keras documentation on fit_generator:

generator: A generator or an instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing. The output of the generator must be either

  • a tuple (inputs, targets)
  • a tuple (inputs, targets, sample_weights).

Update: Here is a rough sketch of a generator that returns the input samples and targets as well as the sample weights obtained from model g(x):

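def gen(args):
    while True:  # Keras expects the generator to loop indefinitely
        for i in range(num_batches):
            # get the i-th batch data
            inputs = ...
            targets = ...

            # get the sample weights from model g; Keras expects a 1D array
            # with one weight per sample, so flatten the output (this assumes
            # g predicts a single value per sample)
            weights = g.predict(inputs).ravel()

            yield inputs, targets, weights


model.fit_generator(gen(args), steps_per_epoch=num_batches, ...)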
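
For anyone who wants to check the tuple shapes before plugging in real models, here is a minimal, self-contained sketch of the same idea using only NumPy. The names weighted_batches, weight_fn and toy_g are made up for illustration: weight_fn stands in for g.predict, and the toy data replaces whatever batch loading you actually use. It is not tied to any particular Keras version.

```python
import numpy as np


def weighted_batches(x, y, weight_fn, batch_size):
    """Yield (inputs, targets, sample_weights) tuples forever.

    weight_fn is a hypothetical stand-in for g.predict: it maps a batch of
    inputs to one score per sample.
    """
    num_samples = len(x)
    while True:  # loop indefinitely, as Keras expects from a generator
        for start in range(0, num_samples, batch_size):
            inputs = x[start:start + batch_size]
            targets = y[start:start + batch_size]
            # Flatten to shape (batch,), i.e. one weight per sample.
            weights = np.asarray(weight_fn(inputs), dtype="float32").reshape(-1)
            yield inputs, targets, weights


# Toy data and a toy weight function, both invented for this example.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3)).astype("float32")
y = rng.normal(size=(10, 1)).astype("float32")


def toy_g(batch):
    # Sigmoid of the first feature, returned with shape (batch, 1),
    # similar to what a single-unit sigmoid output layer would produce.
    return 1.0 / (1.0 + np.exp(-batch[:, :1]))


batches = weighted_batches(x, y, toy_g, batch_size=4)
inputs, targets, weights = next(batches)

# With a compiled Keras model, the call would then look like:
# model.fit_generator(batches, steps_per_epoch=int(np.ceil(len(x) / 4)))
```

Passing the weight function in as an argument keeps the generator independent of how g is built, so in practice you would hand it g.predict (or a wrapper around it) instead of toy_g.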
    
    
Septuple answered 29/11, 2018 at 13:16 Comment(11)
Can you build a little snippet to show me how it works?Hyperplane
@Hyperplane Do you want to know how to define a data generator for a Keras model or how to return sample weights in a generator?Septuple
@Hyperplane You can find an example of Sequence based generator here.Septuple
Are you up to explain both?Hyperplane
@Hyperplane Sure, I added an update. Please take a look. This tutorial might also help (It is using Sequence based generators, not Python generators).Septuple
Ok, thanks for your help! If you can upvote my question, I will give you this bountyHyperplane
@Septuple what about with evaluate_generator? I can't seem to get the same logic you use here to work.Macle
@Macle I have not tried that, but according to the documentation it must be the same, i.e. you can return the sample weights as the third element of the tuple from the generator.Septuple
@Septuple Aha! The third - not the fourth? I'm sorry, I read the docs that you linked but I still don't understand because the docs say that it outputs a list of scalars, but doesn't say what each value at each position must represent....I think I'm making this too hard?Macle
@Macle That's a different question. They are loss value and the metric values returned for each output (if the model has multiple outputs). This answer might help you to understand it better.Septuple
I have a similar problem where I need the sample_weights from the generator in my custom loss function...this post helped me #57999725Cuirass
