Keras LSTM for Text Generation keeps repeating a line or a sequence

I roughly followed this tutorial:

https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/

A notable difference is that I use 2 LSTM layers with dropout, and my dataset is different (a music dataset in abc notation). Some songs do get generated, but after a certain number of steps in the generation process (anywhere from 30 steps to a couple of hundred), the LSTM keeps generating the exact same sequence over and over again. For example, it once got stuck generating URLs for songs:

F: http://www.youtube.com/watch?v=JPtqU6pipQI

and so on ...

It also once got stuck with generating the same two songs (the two songs are a sequence of about 300 characters). In the beginning it generated 3-4 good pieces but afterwards, it kept regenerating the two songs almost indefinitely.

Does anyone have some insight into what could be happening?

To clarify: any generated sequence, whether repeating or non-repeating, seems to be new (the model is not memorising). The validation loss and training loss decrease as expected. Andrej Karpathy is able to generate documents of thousands of characters, and I couldn't find this pattern of getting stuck indefinitely in his results.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Devil answered 5/11, 2017 at 19:32 Comment(7)
Try using stateful mode in order to connect consecutive generations. – Irrevocable
Hmm, I was trying to avoid that, but I'll try it. Thanks for the suggestion :) – Devil
I found that increasing the sample length (sub-batch) that makes up my long sequences made a big difference, without having to use stateful mode. A simple fix that might be worth trying. – Dulia
@Dulia could you say more about your work and this observation? What was the application domain, and how long were your input sequences before and afterwards? Was your model generating the same outputs before you increased the "look back" size of your inputs? I'm curious because I'm facing the same problem now. @MarcinMożejko could you say more about why setting stateful to True helps prevent the model from memorizing the inputs and cycling back through seen values? – Turd
@Turd I haven't touched the project in a while, but here is what I can tell you from memory. The domain was generation of up to 16 parameters at each step of the timeseries (to feed a vocoder for speech generation). That said, I think I did eventually manage to debug stateful use, but it didn't really help me. The project description is here if you are interested: babble-rnn.consected.com – Dulia
Thanks @Phil. I can say that I've since also found that increasing the sample length helped my model break out of repeating the same outputs considerably. – Turd
Hey, what does sample length mean? I have a sequence length of 100. Should I change anything? I am facing the same issue: I am getting the same repetitive content. Any idea what I can do? – Gleason

Instead of taking the argmax on the prediction output, try introducing some randomness with something like this:

np.argmax(prediction_output)

to

np.random.choice(len(prediction_output), p=prediction_output)

I had been struggling with this repeating-sequences issue for a while until I discovered this Colab notebook, where I figured out why their model was able to generate some really good samples: https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpu_and_keras.ipynb#scrollTo=tU7M-EGGxR3E

After I changed this single line, my model went from generating a few words over and over to something actually interesting!
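
For anyone who wants to see the difference in isolation, here is a minimal, self-contained sketch (the toy prediction_output array below is made up, standing in for one row of the model's softmax output):

import numpy as np

# Toy probability distribution over a 5-character vocabulary,
# standing in for one row of the model's softmax output.
prediction_output = np.array([0.5, 0.2, 0.15, 0.1, 0.05])

# Greedy decoding: always picks index 0, which is how the model
# can lock itself into repeating the same sequence.
greedy_index = np.argmax(prediction_output)

# Stochastic sampling: index 0 is still the most likely pick, but the
# other characters are chosen often enough to break repetitive cycles.
sampled_index = np.random.choice(len(prediction_output), p=prediction_output)

print(greedy_index, sampled_index)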

Angkor answered 12/8, 2019 at 14:1 Comment(0)

To train and use a text generation model, follow these steps:

  1. Draw from the model a probability distribution over the next character, given the text available so far (these are the prediction scores).
  2. Reweight the distribution to a certain "temperature" (see the code below).
  3. Sample the next character at random according to the reweighted distribution (see the code below).
  4. Add the new character to the end of the available text.

See the sample function:

import numpy as np

def sample(preds, temperature=1.0):
    # Reweight the predicted distribution by the temperature
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample from the reweighted distribution
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

You should use the sample function during training as follows:

import random
import sys

# `model`, `text`, `maxlen`, `chars` and `char_indices` are assumed to be
# defined by the earlier data-preparation and model-building steps.
for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

A low temperature results in extremely repetitive and predictable text, but where local structure is highly realistic: in particular, all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text becomes more interesting, surprising, even creative.
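
To illustrate the reweighting step on its own, here is a small, self-contained sketch of how temperature reshapes a toy distribution before sampling (the probabilities below are made up for the example):

import numpy as np

def reweight(preds, temperature):
    # Same reweighting as in sample(): sharpen (temperature < 1) or flatten
    # (temperature > 1) the distribution, then renormalise so it sums to 1.
    preds = np.asarray(preds).astype('float64')
    preds = np.exp(np.log(preds) / temperature)
    return preds / np.sum(preds)

p = [0.5, 0.3, 0.15, 0.05]   # toy prediction scores
print(reweight(p, 0.2))      # low temperature: nearly one-hot, repetitive text
print(reweight(p, 1.0))      # unchanged distribution
print(reweight(p, 1.2))      # higher temperature: flatter, more surprising text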

See this notebook

Polyanthus answered 12/8, 2019 at 14:31 Comment(0)
