Config change for a pre-trained transformer model

I am trying to implement a classification head for the Reformer transformer. The classification head works fine, but when I try to change one of the config parameters, config.axial_pos_shape (i.e. the sequence length parameter of the model), it throws an error:

size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([512, 1, 64]) from checkpoint, the shape in current model is torch.Size([64, 1, 64]).
size mismatch for reformer.embeddings.position_embeddings.weights.1: copying a param with shape torch.Size([1, 1024, 192]) from checkpoint, the shape in current model is torch.Size([1, 128, 192]).

The config:

{
  "architectures": [
    "ReformerForSequenceClassification"
  ],
  "attention_head_size": 64,
  "attention_probs_dropout_prob": 0.1,
  "attn_layers": [
    "local",
    "lsh",
    "local",
    "lsh",
    "local",
    "lsh"
  ],
  "axial_norm_std": 1.0,
  "axial_pos_embds": true,
  "axial_pos_embds_dim": [
    64,
    192
  ],
  "axial_pos_shape": [
    64,
    256
  ],
  "chunk_size_feed_forward": 0,
  "chunk_size_lm_head": 0,
  "eos_token_id": 2,
  "feed_forward_size": 512,
  "hash_seed": null,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.05,
  "hidden_size": 256,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": true,
  "layer_norm_eps": 1e-12,
  "local_attention_probs_dropout_prob": 0.05,
  "local_attn_chunk_length": 64,
  "local_num_chunks_after": 0,
  "local_num_chunks_before": 1,
  "lsh_attention_probs_dropout_prob": 0.0,
  "lsh_attn_chunk_length": 64,
  "lsh_num_chunks_after": 0,
  "lsh_num_chunks_before": 1,
  "max_position_embeddings": 8192,
  "model_type": "reformer",
  "num_attention_heads": 2,
  "num_buckets": [
    64,
    128
  ],
  "num_chunks_after": 0,
  "num_chunks_before": 1,
  "num_hashes": 1,
  "num_hidden_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 100
    }
  },
  "vocab_size": 320
}

Python Code:

import torch
from transformers import ReformerConfig, ReformerForSequenceClassification

config = ReformerConfig()
config.max_position_embeddings = 8192
config.axial_pos_shape = [64, 128]

# config = ReformerConfig.from_pretrained('./cnp/config.json', output_attention=True)

model = ReformerForSequenceClassification(config)
model.load_state_dict(torch.load("./cnp/pytorch_model.bin"))
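For reference, here is a small sketch (mine, not part of the original question) for inspecting the checkpoint and seeing where the mismatch comes from: in the Hugging Face Reformer, the two axial position-embedding factors have shapes (axial_pos_shape[0], 1, axial_pos_embds_dim[0]) and (1, axial_pos_shape[1], axial_pos_embds_dim[1]), so a checkpoint saved with one axial_pos_shape cannot be loaded strictly into a model built with a different one:

import torch

# Inspect the axial position-embedding factors stored in the checkpoint and
# compare them with the shapes the freshly configured model expects.
# (The state_dict key names are taken from the error message above.)
state_dict = torch.load("./cnp/pytorch_model.bin", map_location="cpu")

w0 = state_dict["reformer.embeddings.position_embeddings.weights.0"]
w1 = state_dict["reformer.embeddings.position_embeddings.weights.1"]
print(w0.shape, w1.shape)   # checkpoint: torch.Size([512, 1, 64]) torch.Size([1, 1024, 192])

# A model built with axial_pos_shape=[64, 128] and axial_pos_embds_dim=[64, 192]
# instead expects (64, 1, 64) and (1, 128, 192), hence the size mismatch.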
Mohn answered 26/6, 2020 at 21:31 Comment(2)
You are trying to load a model with a different layer size than the model you want to initialize. That will not work, and this is what the error message is telling you. I haven't worked with Reformer, but maybe you can load it and resize it afterwards; I'm not sure whether this would ruin the pretraining. – Winglet
@Winglet I agree with your comment, and this is what was happening. Unfortunately, you provided it as a comment; otherwise, I would have accepted it as an answer. – Mohn

I ran into the same issue when trying to halve the default max sequence length of 65536 (128*512) used in Reformer pre-training.

As @Winglet mentioned, you must:

  1. load pretrained Reformer
  2. resize it to your need by dropping unnecessary weights
  3. save this new model
  4. load this new model to perform your desired tasks

Those unnecessary weights are the ones from the Position Embeddings layer. In the Reformer model, the Axial Position Encodings strategy is used to learn the position embeddings (rather than using fixed ones as BERT does). Axial Position Encodings store the position embeddings in a memory-efficient way, using two small tensors rather than one big one.

However, the idea of position embeddings remains exactly the same: obtaining a different embedding for each position.

That said, in theory (correct me if I am misunderstanding something), removing the last position embeddings to match your custom max sequence length should not hurt performance. You can refer to this post from HuggingFace for a more detailed description of Axial Position Encodings and to understand where to truncate your position embeddings tensor.
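To make the truncation point concrete, here is a minimal sketch (my own illustration, not from the HuggingFace post) of how the two small axial factors are broadcast and concatenated into one big position-embedding table; the feature dims below are borrowed from the question's config purely as an example, only the structure matters:

import torch

# Illustrative axial factors: axial_pos_shape = (128, 512) as in the enwik8 checkpoint,
# feature dims d1/d2 chosen arbitrarily for the example.
n1, n2, d1, d2 = 128, 512, 64, 192
w0 = torch.randn(n1, 1, d1)   # weights[0]: varies along the first axis
w1 = torch.randn(1, n2, d2)   # weights[1]: varies along the second axis

# Broadcasting each factor over the other axis and concatenating on the feature
# dimension yields one embedding per position, without ever storing the big table.
full = torch.cat([w0.expand(n1, n2, d1), w1.expand(n1, n2, d2)], dim=-1)
print(full.reshape(n1 * n2, d1 + d2).shape)   # torch.Size([65536, 256])

# Truncating the second factor to w1[:, :256] shrinks the grid to 128 * 256 = 32768
# positions, which is what the resizing code below does.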

I have managed to resize and use Reformer with a custom max length of 32768 (128*256) using the following code:

import torch
from transformers import ReformerForSequenceClassification

# Load the initial pretrained model
model = ReformerForSequenceClassification.from_pretrained('google/reformer-enwik8', num_labels=2)

# Truncate the second axial position-embedding factor to the desired max seq length,
# keeping its leading broadcast dimension (shape goes from (1, 512, d) to (1, 256, d))
model.reformer.embeddings.position_embeddings.weights[1] = torch.nn.Parameter(
    model.reformer.embeddings.position_embeddings.weights[1][:, :256]
)

# Update the config to match the custom max seq length
model.config.axial_pos_shape = [128, 256]
model.config.max_position_embeddings = 128 * 256  # 32768

# Save the model with the custom max length
output_model_path = "path/to/model"
model.save_pretrained(output_model_path)
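To complete step 4 from the list above, the resized model can then be reloaded from that directory like any other pretrained checkpoint (the path is the same placeholder as above):

from transformers import ReformerForSequenceClassification

# Reload the resized checkpoint saved by model.save_pretrained(output_model_path)
model = ReformerForSequenceClassification.from_pretrained("path/to/model")
model.eval()   # ready for fine-tuning or inference with the new 32768 max length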
Interlanguage answered 16/12, 2020 at 16:8 Comment(0)
