How to apply SMOTE technique (oversampling) before word embedding layer

How can I apply the SMOTE algorithm before the word embedding layer in an LSTM?

I have a text binary classification problem (Good (9,500) vs. Bad (500) reviews, 10,000 training samples in total, so the training set is imbalanced). I am using an LSTM with pre-trained word embeddings (100 dimensions per word), so each training input is a sequence of 50 word-dictionary ids (zero-padded when the review has fewer than 50 words, trimmed to 50 when it exceeds 50 words).
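
For context, a minimal sketch of how such sequences could be prepared, assuming TensorFlow/Keras and hypothetical `texts` (raw review strings) and `labels` (0/1) lists:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# `texts` and `labels` are hypothetical placeholders for the raw data
tokenizer = Tokenizer(num_words=200)            # 200-word vocabulary
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# Zero-pad reviews shorter than 50 words, truncate reviews longer than 50
X = pad_sequences(sequences, maxlen=50, padding='post', truncating='post')
y = np.asarray(labels)                          # X: (10000, 50), y: (10000,)
```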

Below is my general flow (a rough model sketch follows the list):

  • Input - 1000 (batch) × 50 (sequence length)
  • Word embedding - 200 (unique vocabulary words) × 100 (word representation)
  • After the word embedding layer (new input for the LSTM) - 1000 (batch) × 50 (sequence) × 100 (features)
  • Final state from the LSTM - 1000 (batch) × 100 (units)
  • Final layer - 1000 (batch) × 100 (units) multiplied by a [100 (units) × 2 (output classes)] weight matrix
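
A rough Keras sketch of this flow (the sizes mirror the shapes above; `embedding_matrix` is a hypothetical placeholder for the pre-trained vectors):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Hypothetical (200, 100) matrix of pre-trained word vectors
embedding_matrix = np.random.rand(200, 100)

model = Sequential([
    # 200-word vocabulary, 100-dim pre-trained vectors, 50-id input sequences
    Embedding(input_dim=200, output_dim=100, weights=[embedding_matrix],
              input_length=50, trainable=False),
    # Final state: (batch, 100 units)
    LSTM(100),
    # Final layer: (100 units) × (2 output classes)
    Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```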

All I want is to generate more data for the Bad reviews with the help of SMOTE.
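
One way to attempt this is to run imbalanced-learn's SMOTE directly on the padded id matrix before it reaches the embedding layer. A minimal sketch, assuming `X` (10000 × 50 ids), `y` (labels) and the `model` from above; note that SMOTE interpolates between samples, so the synthetic "ids" must be rounded back into the valid vocabulary range and will not correspond to real word order:

```python
import numpy as np
from imblearn.over_sampling import SMOTE

# X: (10000, 50) integer id sequences, y: (10000,) labels (0 = Good, 1 = Bad)
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)

# SMOTE produces non-integer interpolated values; round and clip them back
# into the 0..199 id range so the embedding lookup still works
X_res = np.clip(np.rint(X_res), 0, 199).astype('int32')

model.fit(X_res, y_res, batch_size=1000, epochs=5)
```

Whether such interpolated id sequences are meaningful is debatable, which is one reason the answer below suggests class weighting instead.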

Wulfila answered 19/11, 2018 at 23:41 Comment(0)

I faced the same issue. I found this post on Stack Exchange (Cross Validated), which proposes adjusting the class weights in the loss instead of oversampling. Apparently that is the standard way to deal with class imbalance in LSTMs/RNNs.

https://stats.stackexchange.com/questions/342170/how-to-train-an-lstm-when-the-sequence-has-imbalanced-classes
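
As a rough illustration of that approach, assuming the Keras `model`, padded `X` and integer labels `y` from the question (all placeholders here), the class weights can be computed and passed straight to `fit`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weight each class inversely to its frequency; with 9500 Good vs 500 Bad
# reviews, Bad samples get roughly 19x the weight of Good ones
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y), y=y)
class_weight = {int(cls): float(w) for cls, w in zip(np.unique(y), weights)}

# Keras scales each sample's loss contribution by its class weight
model.fit(X, y, batch_size=1000, epochs=5, class_weight=class_weight)
```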

Ambient answered 30/7, 2021 at 8:42 Comment(0)
