attention-model Questions
3
Solved
This is a screenshot of the original paper. I understand the paper to mean that when the value of the dot product is large, the gradient of softmax will get very small....
Ningpo asked 27/2, 2019 at 12:42
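For the question above, a minimal NumPy sketch of the claim (shapes and values here are made up for illustration): when the raw dot products are large, softmax saturates and its Jacobian entries shrink toward zero, which is what the 1/sqrt(d_k) scaling in the paper counteracts.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_jacobian(p):
    # d softmax_i / d x_j = p_i * (delta_ij - p_j), i.e. diag(p) - p p^T
    return np.diag(p) - np.outer(p, p)

d_k = 256
q = np.random.randn(d_k)
keys = np.random.randn(8, d_k)
raw = keys @ q                              # dot products have variance ~ d_k, so they get large
for scores in (raw, raw / np.sqrt(d_k)):    # without vs. with the paper's 1/sqrt(d_k) scaling
    p = softmax(scores)
    print(np.abs(softmax_jacobian(p)).max())  # near zero unscaled, noticeably larger when scaled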
3
I am learning to apply the Transformer model proposed in Attention Is All You Need, following the official TensorFlow tutorial Transformer model for language understanding.
As the Positional encoding section says:
...
Wind asked 8/7, 2019 at 8:12
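For reference, a small NumPy sketch of the sinusoidal encoding that the tutorial's Positional encoding section defines (sine on even embedding indices, cosine on odd ones); this is not the tutorial's exact code, only the same formula.
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]                     # (max_len, 1)
    i = np.arange(d_model)[None, :]                       # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates                            # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # sine on even indices
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # cosine on odd indices
    return pe

print(positional_encoding(50, 128).shape)                 # (50, 128)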
4
I'm currently studying the code of the Transformer, but I cannot understand the masked multi-head attention of the decoder. The paper says that it is there to prevent you from seeing the word being generated, but I cannot unders...
Bonbon asked 27/9, 2019 at 2:40
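A hedged NumPy sketch of what the decoder's look-ahead mask does (the helper name causal_mask is mine, not from the paper): scores above the diagonal are set to a large negative number before the softmax, so position i gets zero weight on positions j > i and cannot see words that have not been generated yet.
import numpy as np

def causal_mask(seq_len):
    # Upper-triangular mask: position i may not attend to positions j > i
    return np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_k = 5, 8
q = np.random.randn(seq_len, d_k)
k = np.random.randn(seq_len, d_k)
scores = q @ k.T / np.sqrt(d_k)
scores[causal_mask(seq_len)] = -1e9      # masked positions get ~zero attention weight
weights = softmax(scores)
print(np.round(weights, 2))              # row i has zeros after column i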
1
I already did the implementation of the CNN part and everything seems to be working just fine. Afterwards I started to implement the LSTM part and, if I understood it right, the output shape should b...
Fixate asked 9/7, 2023 at 17:7
1
I created a Seq2Seq model for text summarization. I have two models, one with attention and one without. The one without attention was able to generate predictions but I can't do it for the one wit...
Transfix asked 19/7, 2021 at 17:35
2
https://github.com/taoshen58/BiBloSA/blob/ec67cbdc411278dd29e8888e9fd6451695efc26c/context_fusion/self_attn.py#L29
I need to use multi_dimensional_attention from the above link, which is implemente...
Outshout asked 17/3, 2020 at 12:1
2
Solved
I am learning the Transformer. Here is the PyTorch documentation for MultiheadAttention. In their implementation, I saw there is a constraint:
assert self.head_dim * num_heads == self.embed_dim, "...
Yardage asked 26/2, 2021 at 16:45
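A small PyTorch sketch of why that assert exists (the sizes here are arbitrary): embed_dim is split evenly across the heads, so it must be divisible by num_heads.
import torch
import torch.nn as nn

embed_dim, num_heads = 512, 8                # 512 / 8 = 64, so the assert passes
mha = nn.MultiheadAttention(embed_dim, num_heads)

x = torch.randn(10, 2, embed_dim)            # (seq_len, batch, embed_dim), the default layout
out, attn_weights = mha(x, x, x)
print(out.shape)                             # torch.Size([10, 2, 512])

# nn.MultiheadAttention(512, 7) would fail that assert, because each head
# gets embed_dim // num_heads dimensions and 512 is not divisible by 7.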
2
I have been following this post in order to implement an attention layer over my LSTM model.
Code for the attention layer:
INPUT_DIM = 2
TIME_STEPS = 20
SINGLE_ATTENTION_VECTOR = False
APPLY_ATTENTION...
Maltose asked 15/8, 2017 at 11:9
2
I am working on a machine translation problem. The model I am using is:
Model = Sequential([
Embedding(english_vocab_size, 256, input_length=english_max_len, mask_zero=True),
LSTM(256, ...
Acceptant asked 11/10, 2020 at 7:30
1
I am struggling to mask my input for the MultiHeadAttention layer. I am using the Transformer Block from the Keras documentation with self-attention. I could not find any example code online so far and...
Mcardle asked 2/6, 2021 at 12:29
2
Solved
What is the difference between attn_mask and key_padding_mask in MultiheadAttention of PyTorch:
key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. ...
Fumble asked 29/6, 2020 at 0:31
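A minimal PyTorch sketch of how the two masks differ in shape and role (tensor sizes are made up): key_padding_mask is per batch element and blanks out whole key positions (padding), while attn_mask is per query-key pair, for example a causal mask.
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len, batch = 16, 4, 5, 2
mha = nn.MultiheadAttention(embed_dim, num_heads)
x = torch.randn(seq_len, batch, embed_dim)          # (L, N, E)

# key_padding_mask: (N, L), True marks padded key positions ignored by every query
key_padding_mask = torch.tensor([[False, False, False, True, True],
                                 [False, False, False, False, True]])

# attn_mask: (L, L), position-to-position mask, here a causal (look-ahead) mask
attn_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

out, weights = mha(x, x, x, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
print(out.shape, weights.shape)   # torch.Size([5, 2, 16]) torch.Size([2, 5, 5])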
2
Solved
Using this implementation
I have included attention in my RNN (which classifies the input sequences into two classes) as follows.
visible = Input(shape=(250,))
embed=Embedding(vocab_size,100)(visib...
Romanic asked 20/12, 2018 at 11:0
0
Where the values are
rnn_size: 512
batch_size: 128
rnn_inputs: Tensor("embedding_lookup/Identity_1:0", shape=(?, ?, 128), dtype=float32)
sequence_length: Tensor("inputs_length:0&qu...
Cataplasm asked 27/6, 2021 at 16:1
1
I am working on an attention model, and before running the final model, I was going through the tensor shapes which flow through the code. I have an operation where I need to reshape the tensor. Th...
Iorgos asked 24/4, 2019 at 17:28
1
Solved
Trying to use the AdditiveAttention layer in Keras. On manually implementing the layer from the TensorFlow tutorial https://www.tensorflow.org/tutorials/text/nmt_with_attention
import tensorflow as ...
Shyamal asked 2/5, 2021 at 6:42
2
Solved
I am trying to understand why transformers use multiple attention heads. I found the following quote:
Instead of using a single attention function where the attention can
be dominated by the actua...
Bethel asked 17/2, 2021 at 14:38
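A rough NumPy sketch of the idea behind the quote above (the helper split_heads is my own name, not from the paper): the model dimension is split into several heads, each head computes its own attention distribution over the same tokens, so one strong alignment cannot dominate every head.
import numpy as np

def split_heads(x, num_heads):
    # (seq_len, d_model) -> (num_heads, seq_len, head_dim)
    seq_len, d_model = x.shape
    head_dim = d_model // num_heads
    return x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.random.randn(6, 16)                 # 6 tokens, d_model = 16
heads = split_heads(x, num_heads=4)        # 4 heads, head_dim = 4
scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(heads.shape[-1])
weights = softmax(scores)                  # (4, 6, 6): one attention distribution per head
print(weights.argmax(-1))                  # different heads can peak at different positions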
5
Solved
I am following this tutorial: http://nlp.seas.harvard.edu/2018/04/03/attention.html
to implement the Transformer model from the "Attention Is All You Need" paper.
However I am getting the followi...
Jeminah asked 22/10, 2018 at 4:32
2
I am struggling with the concept of attention in the context of autoencoders. I believe I understand the usage of attention with regard to seq2seq translation - after training the combined enc...
Skaggs asked 28/9, 2019 at 10:49
1
Solved
I have n vectors which need to be influenced by each other and output n vectors with the same dimensionality d. I believe this is what torch.nn.MultiheadAttention does. But the forward function expects...
Regalado asked 9/1, 2021 at 12:51
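Given the description above, a hedged PyTorch sketch (sizes arbitrary) of how the n vectors would be fed in: for self-attention the same tensor is passed as query, key and value, with a batch dimension added since nn.MultiheadAttention expects batched input by default.
import torch
import torch.nn as nn

n, d, num_heads = 7, 32, 4
vectors = torch.randn(n, d)                    # n vectors of dimensionality d

mha = nn.MultiheadAttention(embed_dim=d, num_heads=num_heads)
x = vectors.unsqueeze(1)                       # (n, 1, d): add a batch dimension of 1
out, weights = mha(x, x, x)                    # self-attention: query = key = value
out = out.squeeze(1)                           # back to (n, d)
print(out.shape, weights.shape)                # torch.Size([7, 32]) torch.Size([1, 7, 7])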
2
Solved
I was following a paper on BERT-based lexical substitution (specifically trying to implement equation (2) - if someone has already implemented the whole paper that would also be great). Thus, I wan...
Carmencarmena asked 7/2, 2020 at 20:46
0
In this tutorial on the TensorFlow site we can see code for the implementation of an autoencoder whose Decoder is as follows:
class Decoder(tf.keras.Model):
def __init__(self, vocab_size, embed...
Hellhole asked 5/12, 2020 at 11:55
1
Solved
I have a simple network of one LSTM and two Dense layers as such:
model = tf.keras.Sequential()
model.add(layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(layers.Dense(...
Chondrite asked 21/11, 2019 at 3:32
3
Solved
I am trying to understand attention models and also build one myself. After many searches I came across this website, which has an attention model coded in Keras and also looks simple. But when I tri...
Gogetter asked 9/7, 2019 at 7:3
2
With the following code:
model = Sequential()
num_features = data.shape[2]
num_samples = data.shape[1]
model.add(
LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequ...
Lisette asked 5/11, 2018 at 9:3
5
Solved
These two attentions are used in seq2seq modules. The two different attentions are introduced as multiplicative and additive attentions in this TensorFlow documentation. What is the difference?
Hardtop asked 29/5, 2017 at 8:43
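A compact NumPy sketch of the two score functions usually meant by those names (the weight matrices here are random placeholders): multiplicative (Luong-style) attention scores each key by a dot product with the query, additive (Bahdanau-style) attention scores it with a small feed-forward network, and both sets of scores are then normalised with a softmax.
import numpy as np

d = 8
query = np.random.randn(d)        # e.g. a decoder state
keys = np.random.randn(5, d)      # e.g. encoder states

# Multiplicative (Luong) attention: score = q . k
mult_scores = keys @ query

# Additive (Bahdanau) attention: score = v^T tanh(W1 q + W2 k)
W1, W2, v = np.random.randn(d, d), np.random.randn(d, d), np.random.randn(d)
add_scores = np.tanh(query @ W1 + keys @ W2) @ v

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

print(softmax(mult_scores), softmax(add_scores))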