Is it possible to use n-grams in Keras?
E.g., the sentences are stored in an X_train DataFrame with a "sentences" column.
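For reference, a minimal example of what the data looks like (the sentences here are made up):

import pandas as pd

X_train = pd.DataFrame({'sentences': ['the cat sat on the mat',
                                      'the dog chased the cat']})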
I use the Keras tokenizer in the following manner:
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(lower=True, split=' ')
tokenizer.fit_on_texts(X_train.sentences)                            # build the word index
X_train_tokenized = tokenizer.texts_to_sequences(X_train.sentences)  # words -> integer ids
Later, I pad the sequences like this:
from keras.preprocessing import sequence

X_train_sequence = sequence.pad_sequences(X_train_tokenized)
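By default this pads every sequence to the length of the longest one; a quick sanity check (the actual values depend on the data):

print(X_train_sequence.shape)  # (num_sentences, longest_sentence_length)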
I also use a simple LSTM network:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(MAX_FEATURES, 128))  # MAX_FEATURES = vocabulary size
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2,
               activation='tanh', return_sequences=True))
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, activation='tanh'))
model.add(Dense(number_classes, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
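For context, this is roughly how I train it (y_train holds one-hot encoded labels; the batch size and epoch count are arbitrary choices):

model.fit(X_train_sequence, y_train, batch_size=32, epochs=3)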
In this case, tokenization is performed on single words. In the Keras docs (https://keras.io/preprocessing/text/) I see that character-level processing is possible, but that is not appropriate for my case.
My main question: can I use n-grams in Keras for NLP tasks (not only sentiment analysis, but any NLP task)?
For clarification: I'd like to consider not just single words but combinations of words, and see whether that helps to model my task.
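To illustrate what I have in mind, something like the following preprocessing step (a rough sketch, not code I already use): consecutive words are joined into bigram tokens, so the Tokenizer treats each bigram as an extra vocabulary entry. Note that the Tokenizer's default filters would strip the '_' joiner, so it has to be removed from the filters:

def add_bigrams(text):
    # split into words, then append bigram tokens such as 'the_cat'
    words = text.lower().split()
    bigrams = ['_'.join(pair) for pair in zip(words, words[1:])]
    return ' '.join(words + bigrams)

X_train['sentences'] = X_train.sentences.apply(add_bigrams)
tokenizer = Tokenizer(lower=True, split=' ',
                      filters='!"#$%&()*+,-./:;<=>?@[\\]^`{|}~\t\n')  # default filters minus '_'
tokenizer.fit_on_texts(X_train.sentences)  # vocabulary now contains bigrams too

Is a manual step like this the usual approach, or does Keras offer built-in support for n-grams?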