Long sequences in a seq2seq model with attention?

I am following along with this PyTorch tutorial and trying to apply the same approach to summarization, where the encoder input would be around 1,000 words and the decoder target around 200 words.

How do I apply seq2seq to this? I expect it would be very expensive, and perhaps infeasible, to run through the whole sequence of 1,000 words at once. Dividing the sequence into, say, 20 subsequences and running them in parallel could be an answer, but I'm not sure how to implement that, and I also want to incorporate attention.

Stemma answered 4/6, 2017 at 5:45
Recurrent networks are inherently sequential; they cannot be parallelized because each computation depends on the previous one. Moreover, most state-of-the-art summarization architectures use the standard seq2seq paradigm without any issues. I wonder why you call it infeasible. – Enugu

You cannot parallelize an RNN across time steps (1,000 here) because it is inherently sequential: each step consumes the hidden state produced by the previous step.
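
For intuition, here is the dependency that blocks parallelism, written out with a plain PyTorch GRUCell (all dimensions are made up for illustration):

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=128, hidden_size=256)

x = torch.randn(1000, 8, 128)   # (seq_len=1000, batch=8, input_size)
h = torch.zeros(8, 256)         # initial hidden state

# The 1000 steps must run one after another:
# step t cannot start until step t-1 has produced h.
for t in range(x.size(0)):
    h = cell(x[t], h)
```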

You can use a lightweight recurrent architecture such as QRNN or SRU as a faster alternative. These are still sequential in principle, but they pull the heavy matrix multiplications out of the recurrence so those can be batched across time, leaving only a cheap element-wise recurrence.
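
A rough sketch, assuming the `sru` package (`pip install sru`); the exact constructor arguments may differ between versions:

```python
import torch
from sru import SRU  # https://github.com/asappresearch/sru

# SRU keeps a recurrence, but the heavy matrix multiplications are
# batched across all time steps, so it runs much faster than a GRU/LSTM.
rnn = SRU(input_size=128, hidden_size=256, num_layers=2, bidirectional=True)

x = torch.randn(1000, 8, 128)   # (seq_len, batch, input_size)
output, state = rnn(x)          # output: (1000, 8, 2 * 256) for bidirectional
```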

Other common sequence-processing modules are TCNs (temporal convolutional networks) and Transformers, both of which are parallelizable across time.
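
For example, a plain `nn.TransformerEncoder` processes all 1,000 positions at once (a minimal sketch; in practice you would add token embeddings, positional encodings, and a padding mask):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, dim_feedforward=1024)
encoder = nn.TransformerEncoder(layer, num_layers=4)

# Default layout is (seq_len, batch, d_model); all 1000 positions
# are processed in parallel, with no step-by-step loop over time.
src = torch.randn(1000, 8, 256)
memory = encoder(src)           # (1000, 8, 256)
```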

Also note that all of them can be combined with attention and work perfectly fine with text.
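
For instance, decoder-side attention over the encoder states is just a query/key/value call; a sketch with `nn.MultiheadAttention` (assumes a PyTorch version that supports `batch_first`):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

enc_out = torch.randn(8, 1000, 256)    # encoder states for 1000 source words
dec_states = torch.randn(8, 200, 256)  # decoder states for 200 target words

# Each target position attends over all 1000 source positions.
context, weights = attn(query=dec_states, key=enc_out, value=enc_out)
# context: (8, 200, 256), weights: (8, 200, 1000)
```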

Horsefaced answered 12/3, 2019 at 6:48
