I am following along with this PyTorch tutorial and trying to apply the same approach to summarization, where the encoder input would be around 1000 words and the decoder target around 200 words.
How do I apply seq2seq to this? I know running through the whole 1000-word sequence at once would be very expensive, almost infeasible. Dividing the sequence into, say, 20 subsequences and encoding them in parallel could be an answer, but I'm not sure how to implement that; I also want to incorporate attention into it.