Target modules for applying PEFT / LoRA on different models

I am looking at a few different examples of using PEFT on different models. The LoraConfig object contains a target_modules array. In some examples, the target modules are ["query_key_value"], sometimes it is ["q", "v"], sometimes something else.

I don't quite understand where the values of the target modules come from. Where in the model page should I look to know what the LoRA adaptable modules are?

One example (for the model Falcon 7B):

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ],
)

Another example (for the model Opt-6.7B):

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

Yet another (for the model Flan-T5-xxl):

lora_config = LoraConfig(
 r=16,
 lora_alpha=32,
 target_modules=["q", "v"],
 lora_dropout=0.05,
 bias="none",
 task_type=TaskType.SEQ_2_SEQ_LM
)
Flit answered 26/7, 2023 at 5:23 Comment(0)

Let's say that you load some model of your choice:

model = AutoModelForCausalLM.from_pretrained("some-model-checkpoint")

Then you can see available modules by printing out this model:

print(model)

You will get something like this (SalesForce/CodeGen25):

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(51200, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=51200, bias=False)
)

In this case, you can find the LlamaAttention module, which contains q_proj, k_proj, v_proj, and o_proj. These are some of the modules available as LoRA targets (the linear layers in LlamaMLP, such as gate_proj, up_proj, and down_proj, can be targeted as well).
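For example, a minimal LoraConfig sketch targeting the attention projections of this Llama-style model (the rank, alpha, and dropout values below are illustrative placeholders, not recommendations):

from peft import LoraConfig, get_peft_model

# Sketch only: hyperparameter values are placeholders.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

When given as a list, target_modules is matched against the ends of the full module names, so "q_proj" covers the q_proj layer in every decoder block.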

I suggest reading more about which modules to target in the LoRA paper.

Guzzle answered 27/7, 2023 at 12:54 Comment(0)

I will add another answer, since none of the present ones feel complete/general to me.

To answer the original question of getting a list of LoRA-compatible modules programmatically, I first tried using

target_modules="all-linear",

which is available in recent PEFT versions. However, that raised an error when applied to the google/gemma-2b model (dropout layers were for some reason added to the target_modules; see below for the layer types supported by LoRA).

From the documentation of the PEFT library, LoRA supports

only the following modules: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.

I ended up creating this function to get all LoRA-compatible modules from an arbitrary model:

import torch
from transformers.pytorch_utils import Conv1D

def get_specific_layer_names(model):
    # Layer types that LoRA can be applied to (per the PEFT documentation)
    supported_types = (torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, Conv1D)

    layer_names = []
    # Recursively visit all modules and submodules
    for name, module in model.named_modules():
        if isinstance(module, supported_types):
            # Names look like "model.layers.0.self_attn.q_proj"; keep the
            # layer attribute name (the 5th path component). Shallow modules
            # such as "model.embed_tokens" or "lm_head" are skipped.
            parts = name.split('.')
            if len(parts) > 4:
                layer_names.append(parts[4])

    return layer_names

list(set(get_specific_layer_names(model)))

This yields on gemma-2b:

[
 'down_proj',
 'o_proj',
 'k_proj',
 'q_proj',
 'gate_proj',
 'up_proj',
 'v_proj']

This list was valid for a target_modules selection

peft.__version__
'0.10.1.dev0'

transformers.__version__
'4.39.1'
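For completeness, a usage sketch that plugs this list into a config (assuming model is the loaded gemma-2b model and get_specific_layer_names is the helper above; the hyperparameters are placeholders):

from peft import LoraConfig, get_peft_model

target_modules = list(set(get_specific_layer_names(model)))

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=target_modules,
)

peft_model = get_peft_model(model, config)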
Dehumanize answered 31/3 at 13:21 Comment(1)
Brilliant. Nice way to code this selection into your scripts without having to look every model up. – Progress

Here is a method to get all linear layers when the model is loaded in 4-bit with bitsandbytes:

import bitsandbytes as bnb

def find_all_linear_names(model):
    # Collect the attribute names of all 4-bit quantized linear layers
    # (assumes the model was loaded with load_in_4bit=True via bitsandbytes)
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            # keep the last path component, e.g. "q_proj"
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
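Usage would look something like this (a sketch, assuming model was loaded in 4-bit with bitsandbytes; the hyperparameters are placeholders):

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=find_all_linear_names(model),
)

peft_model = get_peft_model(model, config)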

In newer releases you can directly use target_modules="all-linear" in your LoraConfig.
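A sketch, assuming a PEFT version that supports this option (hyperparameters again illustrative):

from peft import LoraConfig

# "all-linear" targets all linear layers except the output head (e.g. lm_head)
config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)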

Cathryncathy answered 24/1 at 10:46 Comment(1)
This is not a valid answer and is also wrong. What is "bnb"? This is not for plain PyTorch. You are not assigning the return of the method to a variable, and the return of this method is also not the name of the linear layers. – Contrariety
