Adapters
AlphaGenome supports several transfer learning strategies, configured through `TransferConfig`. See Yuan et al., 2025 for more details on using these adapters with sequence-to-function models, and calico/baskerville for how such adapters can be applied to other models such as Borzoi.
Available Modes
| Mode | Trainable Params | When to Use |
|---|---|---|
| `'linear'` | Heads only | Fast baseline |
| `'lora'` | Heads + LoRA adapters | Extra expressiveness on top of the linear baseline |
| `'locon'` | Heads + Locon adapters | Alternative to LoRA, applied to conv layers |
| `'ia3'` | Heads + IA3 scaling | Minimal added parameters |
| `'houlsby'` | Heads + Houlsby bottleneck adapters (+ norm layers by default) | Classic bottleneck adapters with a residual connection |
| `'full'` | All weights | Maximum expressiveness |
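To make the "Trainable Params" column concrete, the sketch below counts what each mode trains for a single square Linear layer. The width of 1536, LoRA rank 8, Houlsby latent dim 8, and the presence of biases in the Houlsby projections are all assumptions chosen for illustration, not values read from AlphaGenome.

```python
def full_linear_params(d_in: int, d_out: int) -> int:
    """Weight plus bias of the original Linear layer (what 'full' trains)."""
    return d_in * d_out + d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def ia3_params(d_out: int) -> int:
    """One learned scale per output feature."""
    return d_out


def houlsby_params(d: int, latent_dim: int) -> int:
    """Down (d -> latent) and up (latent -> d) projections, each with a bias (assumed)."""
    return (d * latent_dim + latent_dim) + (latent_dim * d + d)


d = 1536  # hypothetical layer width, for illustration only
counts = {
    "full": full_linear_params(d, d),
    "lora": lora_params(d, d, rank=8),
    "houlsby": houlsby_params(d, latent_dim=8),
    "ia3": ia3_params(d),
}
for mode, n in counts.items():
    print(f"{mode:>8}: {n:>9,} parameters")
```

Under these assumptions, rank-8 LoRA trains roughly 1% as many parameters as the layer itself, and IA3 well under 0.1%.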
Linear Probing
The simplest approach: freeze the entire pretrained trunk and train only the newly added heads. This is the fastest mode and a strong baseline.
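```python
# Assumes TransferConfig, prepare_for_transfer and a pretrained `model` are already in scope.
config = TransferConfig(
    mode='linear',
    remove_heads=['atac', 'dnase'],
    new_heads={'my_atac': {'modality': 'atac', 'num_tracks': 4}},
)
model = prepare_for_transfer(model, config)
```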
No adapter parameters are injected — only head weights are trainable.
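The idea behind linear probing can be shown in a few lines of NumPy: the trunk only runs forward, and only the new head is fitted on its features. This is a toy sketch, not the library's training loop; the random "trunk", the synthetic targets, and the use of ordinary least squares in place of the real training loss are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pretrained trunk: a fixed projection followed by ReLU.
W_trunk = rng.normal(size=(64, 16))
W_trunk_initial = W_trunk.copy()


def trunk(x: np.ndarray) -> np.ndarray:
    return np.maximum(x @ W_trunk, 0.0)


# Toy task: 4 output tracks that depend linearly on the trunk features.
x = rng.normal(size=(200, 64))
features = trunk(x)  # the trunk only runs forward; its weights never change
y = features @ rng.normal(size=(16, 4))

# Train only the new head. Ordinary least squares stands in for the real loss.
head, *_ = np.linalg.lstsq(features, y, rcond=None)

pred = trunk(x) @ head
print("trunk unchanged:", np.array_equal(W_trunk, W_trunk_initial))
print("max abs error:", float(np.abs(pred - y).max()))
```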
LoRA
Low-Rank Adaptation adds small trainable low-rank matrices to Linear layers (typically attention projections) while keeping the trunk frozen. This is the recommended mode for most use cases.
Reference: LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
```python
config = TransferConfig(
    mode='lora',
    lora_rank=8,                        # Rank of the low-rank matrices
    lora_alpha=16,                      # Scaling factor (alpha / rank)
    lora_targets=['q_proj', 'v_proj'],  # Target modules by name
    remove_heads=['atac', 'dnase'],
    new_heads={'my_atac': {'modality': 'atac', 'num_tracks': 4}},
)
model = prepare_for_transfer(model, config)
```
Parameters:
- `lora_rank`: rank of the decomposition (higher = more expressive, more parameters)
- `lora_alpha`: scaling factor; the effective scale is `alpha / rank`
- `lora_targets`: list of substrings to match in module names (e.g. `['q_proj', 'v_proj']`)
After training, LoRA weights can be merged into the base layers for zero-overhead inference — see Merging Adapters for Inference below.
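The low-rank update can be sketched in NumPy as follows. `LoRALinear` is a hypothetical name, and zero-initializing `B` (so the wrapped layer starts out identical to the pretrained one) follows the LoRA paper; whether `prepare_for_transfer` initializes its adapters the same way is an assumption.

```python
import numpy as np


class LoRALinear:
    """Hypothetical LoRA wrapper: y = x W^T + b + (alpha / rank) * x A^T B^T."""

    def __init__(self, weight, bias, rank, alpha, rng):
        d_out, d_in = weight.shape
        self.weight, self.bias = weight, bias  # frozen pretrained parameters
        self.A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))  # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def num_trainable(self):
        return self.A.size + self.B.size

    def __call__(self, x):
        base = x @ self.weight.T + self.bias
        return base + self.scale * (x @ self.A.T) @ self.B.T


rng = np.random.default_rng(0)
W, b = rng.normal(size=(32, 16)), rng.normal(size=32)
layer = LoRALinear(W, b, rank=8, alpha=16, rng=rng)
x = rng.normal(size=(4, 16))
print(np.allclose(layer(x), x @ W.T + b))  # True: B = 0, so the layer starts unchanged
print(layer.scale, layer.num_trainable())
```

With `lora_rank=8` and `lora_alpha=16`, as in the config above, the update is scaled by 2.0.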
Locon
Locon (LoRA for convolutional layers) applies the same low-rank adaptation to Conv1D layers, which makes it useful for adapting the convolutional tower. The example below pairs it with LoRA on the attention projections:
```python
config = TransferConfig(
    mode=['lora', 'locon'],
    lora_targets=['q_proj', 'v_proj'],
    locon_rank=4,   # Rank for conv decomposition
    locon_alpha=1,  # Scaling factor
    locon_targets=['down_blocks.4', 'down_blocks.5'],  # 4 Locon adapters on encoder
    remove_heads=['atac'],
    new_heads={'my_atac': {'modality': 'atac', 'num_tracks': 4}},
)
model = prepare_for_transfer(model, config)
```
Parameters:
- `locon_rank`: rank of the decomposition (default: 4)
- `locon_alpha`: scaling factor (default: 1)
- `locon_targets`: list of substrings to match Conv1D module names. Required when Locon is enabled.
Use block-level targets, for example:
- Locon2: `['down_blocks.5']` (2 Locon adapters)
- Locon4: `['down_blocks.4', 'down_blocks.5']` (4 Locon adapters)
- Locon6: `['down_blocks.3', 'down_blocks.4', 'down_blocks.5']` (6 Locon adapters)
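Because `locon_targets` entries are substrings, the number of adapters follows from how many Conv1D module names contain them. The sketch below assumes two Conv1D layers per down block (consistent with the counts listed above) and uses invented module names; AlphaGenome's real names may differ, and `match_targets` is a hypothetical helper.

```python
def match_targets(module_names, targets):
    """Return the module names that contain at least one target substring."""
    return [name for name in module_names if any(t in name for t in targets)]


# Hypothetical encoder layout: six down blocks, each with two Conv1D layers.
conv_names = [f"down_blocks.{i}.conv{j}" for i in range(6) for j in (1, 2)]

presets = {
    "Locon2": ["down_blocks.5"],
    "Locon4": ["down_blocks.4", "down_blocks.5"],
    "Locon6": ["down_blocks.3", "down_blocks.4", "down_blocks.5"],
}
counts = {label: len(match_targets(conv_names, t)) for label, t in presets.items()}
print(counts)  # {'Locon2': 2, 'Locon4': 4, 'Locon6': 6}
```

One consequence of substring matching: a target such as `'down_blocks.1'` would also match a module named `'down_blocks.10...'`, so targets should be unambiguous for the model's actual naming.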
IA3
IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) learns a multiplicative scaling vector for layer outputs (or, for feed-forward targets, layer inputs). It is extremely parameter-efficient: only `output_dim` parameters per adapted layer (`input_dim` for input scaling).
```python
config = TransferConfig(
    mode='ia3',
    ia3_targets=['k_proj', 'v_proj'],  # Output-scaling targets
    ia3_ff_targets=['fc2'],            # Input-scaling targets (feed-forward)
    remove_heads=['atac'],
    new_heads={'my_atac': {'modality': 'atac', 'num_tracks': 4}},
)
model = prepare_for_transfer(model, config)
```
Parameters:
- `ia3_targets`: modules whose outputs are scaled (IA3)
- `ia3_ff_targets`: modules whose inputs are scaled (IA3_FF, used in feed-forward layers)
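The two IA3 variants differ only in which side of the frozen weight the learned vector multiplies. A NumPy sketch with hypothetical shapes and biases omitted; initializing the vectors to ones (so the adapted model starts unchanged) follows the IA3 paper and is assumed, not confirmed, for this implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64

# Frozen pretrained weights (hypothetical shapes; biases omitted for brevity).
W_v = rng.normal(size=(d_model, d_model))  # an output-scaling target, e.g. v_proj
W_fc2 = rng.normal(size=(d_model, d_ff))   # an input-scaling target, e.g. fc2

# Trainable IA3 vectors, initialized to ones so the adapted model starts unchanged.
l_v = np.ones(d_model)  # IA3: one scale per output feature of v_proj
l_ff = np.ones(d_ff)    # IA3_FF: one scale per input feature of fc2


def v_proj_ia3(x):
    return l_v * (x @ W_v.T)  # scale the layer's outputs


def fc2_ia3(h):
    return (l_ff * h) @ W_fc2.T  # scale the layer's inputs


x = rng.normal(size=(4, d_model))
h = rng.normal(size=(4, d_ff))
print(np.allclose(v_proj_ia3(x), x @ W_v.T), np.allclose(fc2_ia3(h), h @ W_fc2.T))
print("trainable:", l_v.size + l_ff.size)
```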
Houlsby Adapters
Classic bottleneck adapters insert a down-projection → activation → up-projection block with a residual connection. This implementation follows the Baskerville TensorFlow reference, placing adapters at transformer block boundaries.
Reference: Parameter-Efficient Transfer Learning for NLP (Houlsby et al., 2019)
Block-Level Placement
The default placement inserts adapters after each transformer sub-layer (MHA and MLP), before the residual add:
```python
config = TransferConfig(
    mode='houlsby',
    houlsby_latent_dim=8,           # Bottleneck dimension
    houlsby_placement='block',      # Baskerville-style (default)
    houlsby_targets=['mha', 'mlp'], # Adapt both MHA and MLP blocks
    unfreeze_norm=True,             # Unfreeze LayerNorm/RMSBatchNorm (default)
    remove_heads=['atac'],
    new_heads={'my_atac': {'modality': 'atac', 'num_tracks': 4}},
)
model = prepare_for_transfer(model, config)
```
The computation for each transformer block becomes:
```python
# adapter(h) = h + bottleneck(h), i.e. the adapter has an internal residual
x = x + adapter(mha(x))  # = x + mha(x) + bottleneck(mha(x))
x = x + adapter(mlp(x))  # = x + mlp(x) + bottleneck(mlp(x))
```
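The block-level computation above can be sketched in NumPy. The sub-layers are simple stand-ins for AlphaGenome's frozen MHA and MLP, `HoulsbyAdapter` is a hypothetical name, and zero-initializing the up-projection (so each adapter starts as the identity) is one common near-identity choice, assumed here rather than taken from the implementation.

```python
import numpy as np


class HoulsbyAdapter:
    """Hypothetical sketch: adapter(h) = h + up(relu(down(h)))."""

    def __init__(self, d_model, latent_dim, rng):
        self.W_down = rng.normal(scale=0.01, size=(latent_dim, d_model))
        self.b_down = np.zeros(latent_dim)
        self.W_up = np.zeros((d_model, latent_dim))  # zero-init: adapter starts as identity
        self.b_up = np.zeros(d_model)

    def bottleneck(self, h):
        z = np.maximum(h @ self.W_down.T + self.b_down, 0.0)  # down-projection + ReLU
        return z @ self.W_up.T + self.b_up  # up-projection back to d_model

    def __call__(self, h):
        return h + self.bottleneck(h)  # internal residual


def transformer_block(x, mha, mlp, mha_adapter, mlp_adapter):
    x = x + mha_adapter(mha(x))
    x = x + mlp_adapter(mlp(x))
    return x


rng = np.random.default_rng(0)
d = 16
W_attn = rng.normal(size=(d, d)) / np.sqrt(d)
W_mlp = rng.normal(size=(d, d)) / np.sqrt(d)


def mha(x):  # stand-in for the frozen attention sub-layer
    return x @ W_attn.T


def mlp(x):  # stand-in for the frozen MLP sub-layer
    return np.maximum(x @ W_mlp.T, 0.0)


adapters = HoulsbyAdapter(d, 8, rng), HoulsbyAdapter(d, 8, rng)
x = rng.normal(size=(4, d))
h1 = x + mha(x)
print(np.allclose(transformer_block(x, mha, mlp, *adapters), h1 + mlp(h1)))
```

At initialization the adapted block reproduces the frozen block exactly; training the bottleneck then adds a small learned correction to each sub-layer output.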
Linear-Level Placement
You can also wrap individual Linear layers (similar to LoRA targeting):
```python
config = TransferConfig(
    mode='houlsby',
    houlsby_latent_dim=8,
    houlsby_placement='linear',
    houlsby_targets=['q_proj', 'v_proj'],  # Target specific projections
    remove_heads=['atac'],
    new_heads={'my_atac': {'modality': 'atac', 'num_tracks': 4}},
)
model = prepare_for_transfer(model, config)
```
Parameters:
- `houlsby_latent_dim`: bottleneck dimension (default: 8)
- `houlsby_placement`: where to insert adapters:
  - `'block'` (default): Baskerville-style, at transformer block boundaries
  - `'linear'`: wrap individual Linear layers
- `houlsby_targets`: which components to adapt:
  - for `'block'`: `['mha', 'mlp']` (default), `['mha']`, or `['mlp']`
  - for `'linear'`: module name substrings such as `['q_proj', 'v_proj']`
- `unfreeze_norm`: whether to unfreeze normalization layers (default: `True`). This matches Baskerville’s behavior, where LayerNorm parameters are trained alongside the adapters.
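Conceptually, `unfreeze_norm=True` widens the trainable set from "heads and adapters" to "heads, adapters and normalization layers". A name-based sketch of that rule; the function `is_trainable` and every parameter name below are hypothetical and not taken from alphagenome_pytorch.

```python
def is_trainable(name: str, unfreeze_norm: bool = True) -> bool:
    """Hypothetical rule for Houlsby mode: heads and adapters always train,
    normalization layers train only when unfreeze_norm is True."""
    if name.startswith("heads.") or ".adapter." in name:
        return True
    return unfreeze_norm and ".norm." in name


# Invented parameter names, for illustration only.
param_names = [
    "trunk.transformer.0.mha.q_proj.weight",
    "trunk.transformer.0.mha.adapter.down.weight",
    "trunk.transformer.0.norm.weight",
    "trunk.transformer.0.mlp.fc1.weight",
    "heads.my_atac.weight",
]

for flag in (True, False):
    trainable = [n for n in param_names if is_trainable(n, unfreeze_norm=flag)]
    print(f"unfreeze_norm={flag}: {trainable}")
```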
Combining Adapter Modes
Adapter modes (`lora`, `locon`, `ia3`, `houlsby`) can be combined by passing a list to `mode`. Each adapter type is then applied simultaneously, for example LoRA on attention layers and Locon on convolutional layers:
```python
config = TransferConfig(
    mode=['lora', 'locon'],
    # LoRA settings (applied to attention)
    lora_rank=8,
    lora_alpha=16,
    lora_targets=['q_proj', 'v_proj'],
    # Locon settings (applied to convolutions)
    locon_rank=4,
    locon_alpha=1,
    locon_targets=['down_blocks.5'],
    remove_heads=['atac', 'dnase'],
    new_heads={'my_atac': {'modality': 'atac', 'num_tracks': 4}},
)
model = prepare_for_transfer(model, config)
```
Rules:
- `'full'` cannot be combined with other modes.
- `'linear'` can appear alongside adapter modes: the trunk is frozen and adapter layers are injected on top.
- Any subset of `['lora', 'locon', 'ia3', 'houlsby']` can be combined.
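The combination rules can be expressed as a small validation function. `normalize_modes` is hypothetical: it mirrors the rules as written here and is not the library's own validation code.

```python
ADAPTER_MODES = {"lora", "locon", "ia3", "houlsby"}
ALL_MODES = ADAPTER_MODES | {"linear", "full"}


def normalize_modes(mode):
    """Hypothetical check of a `mode` value against the combination rules."""
    modes = {mode} if isinstance(mode, str) else set(mode)
    if not modes:
        raise ValueError("at least one mode is required")
    unknown = modes - ALL_MODES
    if unknown:
        raise ValueError(f"unknown mode(s): {sorted(unknown)}")
    if "full" in modes and len(modes) > 1:
        raise ValueError("'full' cannot be combined with other modes")
    return modes


print(sorted(normalize_modes(["lora", "locon"])))  # ['locon', 'lora']
print(sorted(normalize_modes(["linear", "ia3", "houlsby"])))
try:
    normalize_modes(["full", "lora"])
except ValueError as err:
    print("rejected:", err)
```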
Merging Adapters for Inference
Some adapter types (LoRA and IA3, per the table below) can be folded back into the base layer weights, eliminating all adapter overhead at inference time. After merging, the adapted layers become plain `nn.Linear` modules and the model’s state dict is compatible with vanilla AlphaGenome.
```python
from alphagenome_pytorch.extensions.finetuning import merge_adapters

# Fold mergeable adapters into their base layers after training.
model = merge_adapters(model)
```
| Adapter | Mergeable? | Reason |
|---|---|---|
| LoRA | Yes | The low-rank update is itself a linear map, so it can be added directly to the base weight matrix. |
| IA3 / IA3_FF | Yes | Multiplicative scaling folds into weight rows (IA3) or columns (IA3_FF). |
| Locon | No | AlphaGenome’s convolutional layers use |
| Houlsby | No | The bottleneck contains a nonlinear activation (ReLU) between the down- and up-projections, so it cannot be represented as a single linear transform. |