TensorFlow 2.0: save the preprocessing tokenizer for NLP into TensorFlow Serving

I have trained a TensorFlow 2.0 Keras model to do some natural language processing.

Basically, I take the titles of different news items and predict which category they belong to. To do that, I have to tokenize the sentences and then pad the arrays with zeros so that they all have the length I defined:

 from tensorflow.keras.preprocessing.text import Tokenizer
 from tensorflow.keras.preprocessing.sequence import pad_sequences

 max_words = 1500
 tokenizer = Tokenizer(num_words=max_words )
 tokenizer.fit_on_texts(x.values)
 X = tokenizer.texts_to_sequences(x.values)
 X = pad_sequences(X, maxlen = 32)

  from tensorflow.keras import Sequential
  from tensorflow.keras.layers import Dense, Embedding, LSTM, GRU,InputLayer

  numero_clases = 5

  modelo_sentimiento = Sequential()
  modelo_sentimiento.add(InputLayer(input_tensor=tokenizer.texts_to_sequences, input_shape=(None, 32)))
  modelo_sentimiento.add(Embedding(max_words, 128, input_length=X.shape[1]))
  modelo_sentimiento.add(LSTM(256, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))
  modelo_sentimiento.add(LSTM(256, dropout=0.2, recurrent_dropout=0.2))

  modelo_sentimiento.add(Dense(numero_clases, activation='softmax'))
  modelo_sentimiento.compile(loss = 'categorical_crossentropy', optimizer='adam',
                            metrics=['acc',f1_m,precision_m, recall_m])
  print(modelo_sentimiento.summary())

Now that the model is trained, I want to deploy it, for example with TensorFlow Serving, but I don't know how to save this preprocessing step (the tokenizer) on the server, the way a scikit-learn pipeline would. Is it possible to do that here? Or do I have to save the tokenizer, do the preprocessing myself, and then call the trained model to predict?

Unfortunately, you won't easily be able to do something as elegant as a sklearn Pipeline with Keras models (at least not that I'm aware of). Of course, you could create your own Transformer that does the preprocessing you need. But given my experience trying to incorporate custom objects into sklearn pipelines, I don't think it's worth the effort.

What you can do is save the tokenizer along with its metadata using pickle:

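import pickle

# num_words and pad_len are the values used for fitting and padding (1500 and 32 in the question)
with open('tokenizer_data.pkl', 'wb') as handle:
    pickle.dump(
        {'tokenizer': tokenizer, 'num_words': num_words, 'maxlen': pad_len}, handle)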

And then load it when you want to use it:

import pickle

# Restore the fitted tokenizer and the settings it was trained with
with open("tokenizer_data.pkl", 'rb') as f:
    data = pickle.load(f)
    tokenizer = data['tokenizer']
    num_words = data['num_words']
    maxlen = data['maxlen']
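
Once the tokenizer is loaded, the preprocessing has to run on the client side before the request reaches the served model. Below is a minimal sketch of that step, not a definitive implementation. The names `pad_pre`, `build_predict_payload` and `StubTokenizer` are hypothetical: `StubTokenizer` only stands in for the unpickled Keras `Tokenizer`, and `pad_pre` imitates the default `pad_sequences` behaviour (`padding='pre'`, `truncating='pre'`) in plain Python so that the example runs without TensorFlow installed. In real code you would pass the loaded `tokenizer` and call `pad_sequences` directly.

```python
import json


def pad_pre(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) every sequence to exactly maxlen items.

    Imitates the defaults of Keras pad_sequences (padding='pre',
    truncating='pre'); plain Python is an assumption made for this sketch.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)
        kept = seq[max(len(seq) - maxlen, 0):]  # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


def build_predict_payload(titles, tokenizer, maxlen):
    """Turn raw titles into the JSON body of a TF Serving REST predict call."""
    sequences = tokenizer.texts_to_sequences(titles)
    return json.dumps({"instances": pad_pre(sequences, maxlen)})


class StubTokenizer:
    """Hypothetical stand-in for the Tokenizer restored from the pickle file."""

    def __init__(self, word_index):
        self.word_index = word_index

    def texts_to_sequences(self, texts):
        # Words missing from the vocabulary are simply skipped here.
        return [
            [self.word_index[w] for w in text.lower().split() if w in self.word_index]
            for text in texts
        ]


tokenizer = StubTokenizer({"stocks": 1, "fall": 2, "again": 3})
payload = build_predict_payload(["Stocks fall again today"], tokenizer, maxlen=5)
print(payload)
```

The resulting string can be sent as the body of a POST request to the TensorFlow Serving REST endpoint, which has the form `http://<host>:8501/v1/models/<model_name>:predict` (host and model name are placeholders here, and 8501 is only the default REST port). The important point is that `maxlen` must be the same value that was used for padding during training.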
