attention-model Questions

3

Solved

This is a screenshot of the original paper. I understand the paper to mean that when the dot-product values are large, the gradient of the softmax becomes very small....
Ningpo asked 27/2, 2019 at 12:42
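
A minimal NumPy sketch (not from the question) of scaled dot-product attention, illustrating the scaling the excerpt refers to: without the 1/sqrt(d_k) factor, large dot products push the softmax into a saturated, near one-hot regime where its gradient is almost zero.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                # scaling keeps the logits moderate
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V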

3

I am learning to apply the Transformer model proposed in Attention Is All You Need, following the official TensorFlow tutorial Transformer model for language understanding. As the section Positional encoding says: ...
Wind asked 8/7, 2019 at 8:12
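
A short NumPy sketch (assumed shapes) of the sinusoidal positional encoding that section describes: even embedding indices use sine, odd indices use cosine.

    import numpy as np

    def positional_encoding(max_len, d_model):
        # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
        # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
        pos = np.arange(max_len)[:, None]                    # (max_len, 1)
        i = np.arange(d_model)[None, :]                      # (1, d_model)
        angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angles[:, 0::2])
        pe[:, 1::2] = np.cos(angles[:, 1::2])
        return pe                                            # (max_len, d_model)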

4

I'm currently studying the code of the Transformer, but I cannot understand the masked multi-head attention in the decoder. The paper says it is there to prevent you from seeing the word being generated, but I cannot unders...
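
A minimal sketch (illustrative, not the asker's code) of the look-ahead mask the decoder uses: position i may only attend to positions up to i, so future tokens end up with (near) zero weight after the softmax.

    import numpy as np

    def look_ahead_mask(seq_len):
        # 1 marks positions that must be blocked (strictly upper triangle).
        return np.triu(np.ones((seq_len, seq_len)), k=1)

    # Applied before the softmax, e.g. scores = scores - 1e9 * mask,
    # so each position cannot see the words that are still to be generated.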

1

I have already implemented the CNN part and everything seems to be working just fine. Afterwards I started to implement the LSTM part and, if I understood it correctly, the output shape should b...

1

I created a Seq2Seq model for text summarization. I have two models, one with attention and one without. The one without attention was able to generate predictions, but I can't get predictions from the one wit...
Transfix asked 19/7, 2021 at 17:35

2

https://github.com/taoshen58/BiBloSA/blob/ec67cbdc411278dd29e8888e9fd6451695efc26c/context_fusion/self_attn.py#L29 I need to use multi_dimensional_attention from the above link, which is implemente...
Outshout asked 17/3, 2020 at 12:1

2

Solved

I am learning the Transformer. Here is the PyTorch documentation for MultiheadAttention. In its implementation, I saw there is a constraint: assert self.head_dim * num_heads == self.embed_dim, "...
Yardage asked 26/2, 2021 at 16:45
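
A small shape illustration (assumed sizes) of why the assertion exists: the embedding dimension is split evenly across the heads, so embed_dim must be divisible by num_heads.

    import torch

    embed_dim, num_heads = 512, 8
    head_dim = embed_dim // num_heads            # 64; the split must be exact
    assert head_dim * num_heads == embed_dim

    x = torch.randn(10, 2, embed_dim)            # (seq_len, batch, embed_dim)
    # Each head attends over its own head_dim-sized slice of the embedding:
    x_heads = x.view(10, 2, num_heads, head_dim)
    print(x_heads.shape)                         # torch.Size([10, 2, 8, 64])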

2

I have been following this post in order to implement an attention layer over my LSTM model. Code for the attention layer: INPUT_DIM = 2 TIME_STEPS = 20 SINGLE_ATTENTION_VECTOR = False APPLY_ATTENTION...
Maltose asked 15/8, 2017 at 11:9
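
A minimal Keras sketch (illustrative layer sizes, not the linked post's code) of soft attention over LSTM outputs, reusing the TIME_STEPS and INPUT_DIM constants from the excerpt: one weight per time step, followed by a weighted sum.

    import tensorflow as tf
    from tensorflow.keras import layers

    TIME_STEPS, INPUT_DIM = 20, 2
    inputs = layers.Input(shape=(TIME_STEPS, INPUT_DIM))
    lstm_out = layers.LSTM(32, return_sequences=True)(inputs)    # (batch, T, 32)
    scores = layers.Dense(1)(lstm_out)                           # (batch, T, 1)
    weights = layers.Softmax(axis=1)(scores)                     # attention over time steps
    context = layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([lstm_out, weights])
    output = layers.Dense(1)(context)
    model = tf.keras.Model(inputs, output)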

2

I am working on a machine translation problem. The model I am using is: Model = Sequential([ Embedding(english_vocab_size, 256, input_length=english_max_len, mask_zero=True), LSTM(256, ...
Acceptant asked 11/10, 2020 at 7:30

1

I am struggling to mask my input for the MultiHeadAttention layer. I am using the Transformer block from the Keras documentation with self-attention. I could not find any example code online so far and...
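
A minimal sketch (assumed shapes) of padding masking with the attention_mask argument of tf.keras.layers.MultiHeadAttention, which must broadcast to (batch, query_len, key_len).

    import tensorflow as tf

    mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=32)

    x = tf.random.uniform((4, 10, 32))                    # (batch, seq_len, features)
    pad = tf.constant([[1] * 7 + [0] * 3] * 4)            # 1 = real token, 0 = padding
    mask = tf.cast(pad[:, tf.newaxis, :] * pad[:, :, tf.newaxis], tf.bool)  # (4, 10, 10)
    out = mha(query=x, value=x, attention_mask=mask)
    print(out.shape)                                      # (4, 10, 32)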

2

Solved

What is the difference between attn_mask and key_padding_mask in PyTorch's MultiheadAttention? key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. ...
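
A short usage sketch (assumed sizes) contrasting the two: key_padding_mask is given per batch element and hides padded key positions, while attn_mask is shared across the batch and typically encodes a causal (look-ahead) pattern.

    import torch
    import torch.nn as nn

    mha = nn.MultiheadAttention(embed_dim=16, num_heads=2)   # expects (seq, batch, embed)
    x = torch.randn(5, 3, 16)                                # seq_len=5, batch=3

    # key_padding_mask: (batch, seq_len), True where a key position is padding.
    key_padding_mask = torch.tensor([[False] * 5,
                                     [False] * 4 + [True],
                                     [False] * 3 + [True] * 2])

    # attn_mask: (seq_len, seq_len), True where attention is not allowed.
    attn_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)

    out, weights = mha(x, x, x,
                       key_padding_mask=key_padding_mask,
                       attn_mask=attn_mask)
    print(out.shape)                                         # torch.Size([5, 3, 16])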

2

Solved

Using this implementation I have added attention to my RNN (which classifies the input sequences into two classes) as follows: visible = Input(shape=(250,)) embed=Embedding(vocab_size,100)(visib...

0

Where the values are: rnn_size: 512, batch_size: 128, rnn_inputs: Tensor("embedding_lookup/Identity_1:0", shape=(?, ?, 128), dtype=float32), sequence_length: Tensor("inputs_length:0"...
Cataplasm asked 27/6, 2021 at 16:1

1

I am working on an attention model, and before running the final model, I was going through the tensor shapes which flow through the code. I have an operation where I need to reshape the tensor. Th...
Iorgos asked 24/4, 2019 at 17:28

1

Solved

Trying to use the AdditiveAttention layer in Keras. For a manual implementation of the layer from the TensorFlow tutorial https://www.tensorflow.org/tutorials/text/nmt_with_attention: import tensorflow as ...
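
For reference, this is roughly the Bahdanau-style layer that the linked tutorial builds by hand (condensed here, names as in the tutorial); Keras's AdditiveAttention implements the same additive scoring.

    import tensorflow as tf

    class BahdanauAttention(tf.keras.layers.Layer):
        def __init__(self, units):
            super().__init__()
            self.W1 = tf.keras.layers.Dense(units)
            self.W2 = tf.keras.layers.Dense(units)
            self.V = tf.keras.layers.Dense(1)

        def call(self, query, values):
            # query: (batch, hidden), values: (batch, seq_len, hidden)
            query_with_time_axis = tf.expand_dims(query, 1)
            score = self.V(tf.nn.tanh(
                self.W1(query_with_time_axis) + self.W2(values)))
            attention_weights = tf.nn.softmax(score, axis=1)          # (batch, seq_len, 1)
            context_vector = tf.reduce_sum(attention_weights * values, axis=1)
            return context_vector, attention_weights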

2

Solved

I am trying to understand why transformers use multiple attention heads. I found the following quote: Instead of using a single attention function where the attention can be dominated by the actua...
Bethel asked 17/2, 2021 at 14:38

5

Solved

I am following this tutorial: http://nlp.seas.harvard.edu/2018/04/03/attention.html to implement the Transformer model from the "Attention Is All You Need" paper. However I am getting the followi...
Jeminah asked 22/10, 2018 at 4:32

2

I am struggling with the concept of attention in the context of autoencoders. I believe I understand the usage of attention with regard to seq2seq translation - after training the combined enc...

1

Solved

I have n vectors that need to influence each other and produce n output vectors with the same dimensionality d. I believe this is what torch.nn.MultiheadAttention does. But the forward function expects...
Regalado asked 9/1, 2021 at 12:51
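
A minimal sketch (assumed sizes) of the self-attention case: pass the same tensor as query, key and value, and the n input vectors attend to each other while keeping dimensionality d.

    import torch
    import torch.nn as nn

    n, d = 10, 64
    vectors = torch.randn(n, 1, d)            # (seq_len=n, batch=1, embed_dim=d)

    mha = nn.MultiheadAttention(embed_dim=d, num_heads=4)
    out, _ = mha(vectors, vectors, vectors)   # query = key = value => self-attention
    print(out.shape)                          # torch.Size([10, 1, 64]): n vectors of size d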

2

Solved

I was following a paper on BERT-based lexical substitution (specifically trying to implement equation (2) - if someone has already implemented the whole paper that would also be great). Thus, I wan...

0

In this tutorial on the TensorFlow site we can see code implementing an autoencoder whose Decoder is as follows: class Decoder(tf.keras.Model): def __init__(self, vocab_size, embed...
Hellhole asked 5/12, 2020 at 11:55

1

Solved

I have a simple network of one LSTM and two Dense layers as such: model = tf.keras.Sequential() model.add(layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2]))) model.add(layers.Dense(...
Chondrite asked 21/11, 2019 at 3:32

3

Solved

I am trying to understand attention models and also build one myself. After many searches I came across this website, which has an attention model coded in Keras and also looks simple. But when I tri...
Gogetter asked 9/7, 2019 at 7:3

2

With the following code: model = Sequential() num_features = data.shape[2] num_samples = data.shape[1] model.add( LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequ...
Lisette asked 5/11, 2018 at 9:3

5

Solved

These two attention mechanisms are used in seq2seq modules. They are introduced as multiplicative and additive attention in this TensorFlow documentation. What is the difference?
Hardtop asked 29/5, 2017 at 8:43
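
A compact sketch (illustrative shapes) of the two scoring functions: multiplicative (Luong) attention uses a bilinear/dot-product form, while additive (Bahdanau) attention runs the states through a small feed-forward network.

    import numpy as np

    d = 8
    h_dec, h_enc = np.random.randn(d), np.random.randn(d)

    # Multiplicative (Luong): score = h_dec^T W h_enc
    W = np.random.randn(d, d)
    luong_score = h_dec @ W @ h_enc

    # Additive (Bahdanau): score = v^T tanh(W1 h_dec + W2 h_enc)
    W1, W2, v = np.random.randn(d, d), np.random.randn(d, d), np.random.randn(d)
    bahdanau_score = v @ np.tanh(W1 @ h_dec + W2 @ h_enc)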
