I have trained a simple long short-term memory (lstm) model in lasagne following the recipie here:https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py
Here is the architecture:
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
# We now build the LSTM layer which takes l_in as the input layer
# We clip the gradients at GRAD_CLIP to prevent the problem of exploding gradients.
l_forward_1 = lasagne.layers.LSTMLayer(
l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
l_forward_2 = lasagne.layers.LSTMLayer(
l_forward_1, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
# The l_forward layer creates an output of dimension (batch_size, SEQ_LENGTH, N_HIDDEN)
# Since we are only interested in the final prediction, we isolate that quantity and feed it to the next layer.
# The output of the sliced layer will then be of size (batch_size, N_HIDDEN)
l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)
# The sliced output is then passed through the softmax nonlinearity to create probability distribution of the prediction
# The output of this stage is (batch_size, vocab_size)
l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=vocab_size, W = lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)
# Theano tensor for the targets
target_values = T.ivector('target_output')
# lasagne.layers.get_output produces a variable for the output of the net
network_output = lasagne.layers.get_output(l_out)
# The loss function is calculated as the mean of the (categorical) cross-entropy between the prediction and target.
cost = T.nnet.categorical_crossentropy(network_output,target_values).mean()
# Retrieve all parameters from the network
all_params = lasagne.layers.get_all_params(l_out)
# Compute AdaGrad updates for training
print("Computing updates ...")
updates = lasagne.updates.adagrad(cost, all_params, LEARNING_RATE)
# Theano functions for training and computing cost
print("Compiling functions ...")
train = theano.function([l_in.input_var, target_values], cost, updates=updates, allow_input_downcast=True)
compute_cost = theano.function([l_in.input_var, target_values], cost, allow_input_downcast=True)
# In order to generate text from the network, we need the probability distribution of the next character given
# the state of the network and the input (a seed).
# In order to produce the probability distribution of the prediction, we compile a function called probs.
probs = theano.function([l_in.input_var],network_output,allow_input_downcast=True)
and the model is trained via:
for it in xrange(data_size * num_epochs / BATCH_SIZE):
try_it_out() # Generate text using the p^th character as the start.
avg_cost = 0;
for _ in range(PRINT_FREQ):
x,y = gen_data(p)
#print(p)
p += SEQ_LENGTH + BATCH_SIZE - 1
if(p+BATCH_SIZE+SEQ_LENGTH >= data_size):
print('Carriage Return')
p = 0;
avg_cost += train(x, y)
print("Epoch {} average loss = {}".format(it*1.0*PRINT_FREQ/data_size*BATCH_SIZE, avg_cost / PRINT_FREQ))
How can I save the model so I do not need to train it again? With scikit I generally just pickle the model object. However I am unclear on the analogous process with Theano / lasagne.