Command-Line Interface (agt)
AlphaGenome PyTorch ships a CLI called agt — short for
AlphaGenome Torch (and three of the four nucleotides).
After installing the package, the command is available globally:
pip install alphagenome-pytorch
pip install alphagenome-pytorch[inference] # + predict
pip install alphagenome-pytorch[finetuning] # + finetune
pip install alphagenome-pytorch[scoring] # + score
pip install alphagenome-pytorch[serving] # + serve
Global options
agt [--json] <command> [options]
--json: Machine-readable JSON output on stdout for commands that support structured output. Suppresses progress bars and human formatting where implemented.
Errors produce a JSON object on stderr with a nonzero exit code:
{"error": "FileNotFoundError", "message": "No such file: model.pth"}
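A wrapper script can pick this error object out of stderr. The sketch below is illustrative only: the helper name parse_agt_error is hypothetical, and it assumes the JSON object is printed on a single line, which the example above suggests but does not guarantee.

```python
import json


def parse_agt_error(stderr_text):
    """Return (error_type, message) from an agt-style JSON error, or None.

    Assumption: the error object occupies one line of stderr.
    """
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip warnings or other non-JSON lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "error" in obj and "message" in obj:
            return obj["error"], obj["message"]
    return None


sample = '{"error": "FileNotFoundError", "message": "No such file: model.pth"}'
print(parse_agt_error(sample))
```

In practice the stderr text would come from something like subprocess.run([...], capture_output=True, text=True), combined with a check of the nonzero return code.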
agt info
Inspect the model architecture, available heads, track metadata, and the contents of a weights file.
Static information (no weights file needed)
# Overview — heads, track counts per organism, resolutions
agt info
# List all heads with track counts
agt info --heads
Example output:
Head               Tracks (human)  Tracks (mouse)  Dimension  Resolutions
atac               167             155             256        1bp, 128bp
dnase              305             280             384        1bp, 128bp
procap             12              8               128        1bp, 128bp
cage               546             490             640        1bp, 128bp
rna_seq            667             600             768        1bp, 128bp
chip_tf            1617            1500            1664       128bp
chip_histone       1116            1000            1152       128bp
contact_maps       28              28              28         64x64
splice_sites       5               5               5          1bp
splice_junctions   734             734             734        pairwise
splice_site_usage  734             734             734        1bp
Tracks = real (non-padding) tracks per organism. Dimension = tensor channel size (includes padding).
# List individual tracks for a head
agt info --tracks atac
agt info --tracks atac --organism mouse
# Search tracks by name or metadata
agt info --tracks atac --search K562
agt info --tracks atac --filter "biosample_name=liver"
Example output:
Head: atac | 167 tracks / 256 dimension (89 padding) | human
# Track Name Biosample Ontology
0 ENCSR637XSC ATAC-seq K562 EFO:0002067
1 ENCSR868FGM ATAC-seq HepG2 EFO:0001187
...
166 UBERON:0015143 ATAC-seq thymus UBERON:0015143
--- 89 padding tracks ---
Weights file inspection
# Inspect a weights file — adds: file size, param count, dtype, format
agt info model.pth
# Inspect track_means for a specific head
agt info model.pth --track-means atac
agt info model.pth --track-means atac --organism human --top 10
# Validate a checkpoint — checks all keys present, shapes match
agt info model.pth --validate
# Compare two checkpoints
agt info model.pth --diff other.pth
# Inspect a delta/finetuned checkpoint
agt info delta.safetensors
JSON output
agt --json info --heads
{
"heads": [
{
"name": "atac",
"dimension": 256,
"tracks": {"human": 167, "mouse": 155},
"padding": {"human": 89, "mouse": 101},
"resolutions": ["1bp", "128bp"]
}
]
}
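The fields in this payload are related: padding should equal dimension minus the real track count for each organism. A small consistency check, written as a sketch against the example payload above (the function name padding_mismatches is hypothetical), could look like this:

```python
import json

# Copied from the example output above.
heads_json = """
{"heads": [{"name": "atac", "dimension": 256,
            "tracks": {"human": 167, "mouse": 155},
            "padding": {"human": 89, "mouse": 101},
            "resolutions": ["1bp", "128bp"]}]}
"""


def padding_mismatches(payload):
    """List (head, organism) pairs where padding != dimension - tracks."""
    bad = []
    for head in payload["heads"]:
        for organism, n_tracks in head["tracks"].items():
            expected = head["dimension"] - n_tracks
            if head["padding"][organism] != expected:
                bad.append((head["name"], organism))
    return bad


print(padding_mismatches(json.loads(heads_json)))
```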
agt --json info model.pth
{
"file": "model.pth",
"format": "pth",
"file_size_mb": 1247.3,
"total_parameters": 298542080,
"dtype": "float32",
"has_track_means": true,
"heads": ["atac", "dnase", "procap", "cage", "rna_seq", "chip_tf", "chip_histone"]
}
agt predict
Run the model and write predictions to disk. Four input modes:
| Input mode | What it does | Output |
|---|---|---|
| --chromosomes | Full-chromosome tiling | BigWig per track |
| --locus | One genomic interval | BigWig per track |
| --bed | Many genomic regions from a BED file | BigWig per track (merged) |
| --sequences | Raw FASTA sequences (no genomic coordinates) | NPZ per sequence |
--locus, --bed, and --sequences are mutually exclusive. When
--bed is given, --chromosomes can additionally be passed as a
chromosome filter over the BED rows (see below).
Requires: pip install alphagenome-pytorch[inference]
How size mismatches are handled
The model expects a fixed input window of W bp, where W is set by
--window-size (default: 131 072). When an input region or sequence does
not match W, the CLI dispatches by mode and the --tile flag:
| Mode | Input < W | Input == W | Input > W |
|---|---|---|---|
| --locus / --bed (default) | padded with real reference flanks (with warning) | single window | cut to center (with warning) |
| --locus / --bed with --tile | padded with real reference flanks (with warning) | single window | stitched tiles |
| --sequences (default) | error (cannot fake reference context) | single window | error (pass --tile) |
| --sequences with --tile | error | single window | stitched tiles |
| --chromosomes | n/a | n/a | stitched tiles (always) |
Input validation
Chromosome coordinates must be non-negative and must fit inside the chromosome. The CLI rejects invalid input up front rather than clamping silently:
Error: Invalid locus 'chr1:-100-500': start (-100) must be ≥ 0
Error: chr1:248000000-250000000: end (250000000) exceeds chromosome length (248956422)
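The two checks above can be sketched in a few lines. This is not the CLI's own code: parse_locus and the chrom_lengths dictionary are hypothetical, and the error wording only imitates the messages shown.

```python
import re

# chrom:start-end, allowing a leading minus so negative starts can be reported.
LOCUS_RE = re.compile(r"^([^:]+):(-?\d+)-(-?\d+)$")


def parse_locus(locus, chrom_lengths):
    """Parse and validate a locus string against known chromosome lengths."""
    m = LOCUS_RE.match(locus)
    if m is None:
        raise ValueError(f"Invalid locus {locus!r}: expected chrom:start-end")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if chrom not in chrom_lengths:
        raise ValueError(f"Invalid locus {locus!r}: unknown chromosome {chrom}")
    if start < 0:
        raise ValueError(f"Invalid locus {locus!r}: start ({start}) must be >= 0")
    if end <= start:
        raise ValueError(f"Invalid locus {locus!r}: end must be greater than start")
    if end > chrom_lengths[chrom]:
        raise ValueError(
            f"{locus}: end ({end}) exceeds chromosome length ({chrom_lengths[chrom]})"
        )
    return chrom, start, end


lengths = {"chr1": 248956422}  # chr1 length as quoted in the error above
print(parse_locus("chr1:10000000-10131072", lengths))
```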
When a short region is padded, the fitted W-bp window must also lie within
the chromosome. If the region sits near a chromosome edge, the window is
shifted inward (never extended into negative coordinates), and a warning
describes the shift.
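One way to picture the window fitting is the sketch below. It assumes the window is centered on the region midpoint and then shifted inward if it would cross a chromosome end; that assumption reproduces the windows in the log lines of the next section, but the function fit_window itself is hypothetical and not taken from the package.

```python
def fit_window(start, end, chrom_length, window=131072):
    """Return a window of `window` bp around [start, end), kept in bounds."""
    if chrom_length < window:
        raise ValueError("chromosome is shorter than the model window")
    center = (start + end) // 2
    win_start = center - window // 2
    # Shift inward instead of letting the window leave the chromosome.
    win_start = max(0, min(win_start, chrom_length - window))
    return win_start, win_start + window


# Region near the chromosome start: the window is shifted to begin at 0.
print(fit_window(5000, 7000, chrom_length=242193529))
# Region far from either edge: the window stays centered on the region.
print(fit_window(10000000, 10002000, chrom_length=198295559))
```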
Per-region logging
Each processed region prints a one-line status to stdout (suppressed under
--quiet / --json), plus any warning lines to stderr:
chr2:5000-7000 (2000bp) → padded
WARNING: chr2:5000-7000 (2000bp) padded with reference flanks; window shifted to [0, 131072) because region sits near chromosome start.
chr3:10000000-10002000 (2000bp) → padded
WARNING: chr3:10000000-10002000 (2000bp) padded with reference flanks to a 131072bp window [9935464, 10066536); output covers only the region.
chr4:1000000-2000000 (1000000bp) → tiled (12 tiles)
chr4:100-131172 (131072bp) → single
chr5:50000-1050000 (1000000bp) → cut
WARNING: chr5:50000-1050000 (1000000bp) center-cut to chr5:484464-615536 (131072bp); pass --tile to predict the full region.
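The handling labels in these log lines follow from the region length, the window size, and whether --tile is set. A minimal sketch of that dispatch for --locus and --bed regions, with the center-cut arithmetic checked against the chr5 example (both function names are hypothetical):

```python
def classify(length, window=131072, tile=False):
    """Return the handling label for a genomic region of `length` bp."""
    if length < window:
        return "padded"
    if length == window:
        return "single"
    return "tiled" if tile else "cut"


def center_cut(start, end, window=131072):
    """Return the window-sized interval centered on [start, end)."""
    center = (start + end) // 2
    cut_start = center - window // 2
    return cut_start, cut_start + window


print(classify(2000), classify(131072), classify(1000000), classify(1000000, tile=True))
print(center_cut(50000, 1050000))
```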
Full chromosomes
# Predict ATAC for chr1 and chr2
agt predict \
--model model.pth --fasta hg38.fa --output predictions/ \
--head atac --chromosomes chr1,chr2
# Chromosomes 1-10, 1bp resolution (slower), torch.compile for speed
agt predict \
--model model.pth --fasta hg38.fa --output predictions/ \
--head atac --chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10 \
--resolution 1 --compile
# Reduce edge artifacts with overlapping tiles
agt predict \
--model model.pth --fasta hg38.fa --output predictions/ \
--head atac --chromosomes chr1 --crop-bp 32768
Locus (single interval)
# Exactly one window — single forward pass
agt predict \
--model model.pth --fasta hg38.fa --output out/ \
--head atac --locus chr1:10000000-10131072
# Short locus — padded with real reference flanks
agt predict \
--model model.pth --fasta hg38.fa --output out/ \
--head atac --locus chr1:10000000-10005000
# Long locus (default) — center-cut to 131 072 bp, warning printed
agt predict \
--model model.pth --fasta hg38.fa --output out/ \
--head atac --locus chr1:10000000-11000000
# Long locus with --tile — full region predicted via stitched tiles
agt predict \
--model model.pth --fasta hg38.fa --output out/ \
--head atac --locus chr1:10000000-11000000 \
--tile --crop-bp 16384
Output file name:
{output}/{head}_{chrom}_{start}_{end}.bw (or per-track when multiple tracks).
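To build intuition for what --tile with --crop-bp does, here is a generic overlap-and-crop tiling plan. It is an assumption-laden sketch, not the CLI's algorithm: it supposes each tile discards crop_bp on both edges and that tiles are spaced so the kept centers abut, and it ignores chromosome bounds. The tile counts agt reports may differ.

```python
import math


def plan_tiles(start, end, window=131072, crop_bp=0):
    """Return (tile_start, tile_end, kept_start, kept_end) tuples.

    Each tile is `window` bp wide; only its central window - 2*crop_bp bp
    is kept, and the kept parts cover [start, end) without gaps.
    """
    keep = window - 2 * crop_bp
    if keep <= 0:
        raise ValueError("crop_bp must be smaller than half the window")
    n_tiles = math.ceil((end - start) / keep)
    tiles = []
    for i in range(n_tiles):
        kept_start = start + i * keep
        kept_end = min(kept_start + keep, end)  # last tile may keep less
        tile_start = kept_start - crop_bp       # extend left for context
        tiles.append((tile_start, tile_start + window, kept_start, kept_end))
    return tiles


tiles = plan_tiles(10_000_000, 11_000_000, crop_bp=16384)
print(len(tiles), tiles[0], tiles[-1])
```

The trade-off is visible in the stride: a larger crop discards more of each tile's edges, so more forward passes are needed to cover the same region.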
BED file (many regions)
BED columns: chrom, start, end, optional name. Lines
starting with #, track, or browser are skipped. Each region is
processed independently with the same size-handling rules as --locus;
all region predictions are merged into a single BigWig per track (gaps
between regions are left as no-data).
# Default — short regions padded, exact = single, long = cut (warns)
agt predict \
--model model.pth --fasta hg38.fa --output out/ \
--head atac --bed regions.bed
# --tile — long regions are stitched instead of cut
agt predict \
--model model.pth --fasta hg38.fa --output out/ \
--head atac --bed regions.bed --tile --crop-bp 16384
# Filter: predict only the rows on chr1 / chr2
agt predict \
--model model.pth --fasta hg38.fa --output out/ \
--head atac --bed regions.bed --chromosomes chr1,chr2
When --chromosomes is passed alongside --bed it acts as a whitelist
filter on the BED rows — useful for re-running a per-chromosome subset
without having to edit the BED. If the filter removes every row the CLI
errors out.
Output file name: {output}/{head}.bw (or per-track).
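The BED rules described in this section (skipped header lines, optional name column, chromosome whitelist, error on an empty result) can be sketched as follows. parse_bed is a hypothetical helper, and splitting on any whitespace is a simplification that would break names containing spaces.

```python
def parse_bed(lines, chromosomes=None):
    """Parse BED lines into (chrom, start, end, name) tuples.

    `chromosomes`, if given, is a whitelist applied to the rows.
    """
    regions = []
    for raw in lines:
        line = raw.strip()
        # Comment, track, and browser lines carry no regions.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        if chromosomes is not None and chrom not in chromosomes:
            continue
        regions.append((chrom, start, end, name))
    if chromosomes is not None and not regions:
        raise ValueError("chromosome filter removed every BED row")
    return regions


bed_lines = [
    "# comment",
    "track name=example",
    "browser position chr1:1-100",
    "chr1\t100\t200\tpeak1",
    "chr2\t5\t50",
    "chr3\t1\t2",
]
print(parse_bed(bed_lines, chromosomes={"chr1", "chr2"}))
```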
Raw FASTA sequences
Predict on arbitrary DNA that isn’t tied to a reference genome. No
--fasta (reference) is needed — the sequences themselves are the input.
Because there is no genome to fetch flanks from, short sequences are
rejected outright rather than N-padded; pre-pad them yourself if you need
that.
# Sequences exactly the window size — one forward pass each
agt predict \
--model model.pth --output out/ \
--head atac --sequences window_sized.fa
# Longer sequences — tiling must be explicit
agt predict \
--model model.pth --output out/ \
--head atac --sequences long_seqs.fa --tile --crop-bp 16384
Output: one {head}_{seq_name}.npz file per sequence (with metadata).
In --json mode, stdout includes an output_files array listing the
written NPZ files.
Errors you’ll see:
Error: Sequence 'seq2' (5000bp) is shorter than the model window
(131072bp); not supported for --sequences.
Error: Sequence 'seq1' (500000bp) is longer than the model window
(131072bp); pass --tile to enable tiling.
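If you do decide to pre-pad a short sequence yourself, a symmetric N-pad is one option. The helper below is only a sketch (pad_to_window is a hypothetical name); whether predictions on N-padded input are meaningful is not something this page claims, which is why the CLI declines to do it for you.

```python
def pad_to_window(seq, window=131072, pad_char="N"):
    """Center `seq` in a string of exactly `window` characters."""
    if len(seq) > window:
        raise ValueError("sequence is longer than the window; use --tile instead")
    missing = window - len(seq)
    left = missing // 2
    right = missing - left  # any odd leftover base goes to the right side
    return pad_char * left + seq + pad_char * right


padded = pad_to_window("ACGT" * 1250)  # a 5000 bp toy sequence
print(len(padded), padded.count("N"))
```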
Common options
Flag |
Meaning |
|---|---|
|
Prediction head ( |
|
Comma-separated track indices (default: all tracks) |
|
Comma-separated output track names |
|
Output resolution in bp (128 is faster) |
|
Per-tile edge crop; use with |
|
Inference batch size (default 4) |
|
Override model input window (default 131 072) |
|
0 = human, 1 = mouse |
|
PyTorch device ( |
|
|
|
Wrap model with |
|
Finetuned checkpoint (LoRA, full, etc.) |
|
TransferConfig JSON for adapter models |
|
Keep LoRA/adapter modules separate from base weights |
|
Suppress progress bars and per-region status lines |
JSON output
For --locus / --bed:
{
"output_files": [
{
"path": "out/atac_chr1_10000000_10131072.bw",
"head": "atac",
"chromosome": "chr1",
"start": 10000000,
"end": 10131072,
"length_bp": 131072,
"handling": "single",
"tile_count": 1,
"resolution_bp": 128
}
],
"warnings": []
}
For --bed an additional regions array lists per-region metadata
(handling, tile_count, warnings).
For --sequences:
{
"output_files": [
{
"path": "out/atac_seq1.npz",
"head": "atac",
"sequence": "seq1",
"length_bp": 500000,
"handling": "tiled",
"tile_count": 4,
"resolution_bp": 128
}
],
"warnings": []
}
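A downstream script would typically read this payload to find the files that were written and how each input was handled. A minimal sketch using the --sequences example above (summarize is a hypothetical helper; the field names are those shown in the example):

```python
import json

# Copied from the --sequences example above.
payload = json.loads("""
{
  "output_files": [
    {"path": "out/atac_seq1.npz", "head": "atac", "sequence": "seq1",
     "length_bp": 500000, "handling": "tiled", "tile_count": 4,
     "resolution_bp": 128}
  ],
  "warnings": []
}
""")


def summarize(payload):
    """Return (path, handling, tile_count) for every written file."""
    return [
        (f["path"], f["handling"], f["tile_count"])
        for f in payload["output_files"]
    ]


for path, handling, tile_count in summarize(payload):
    print(f"{path}: {handling} ({tile_count} tiles)")
if payload["warnings"]:
    print("warnings:", payload["warnings"])
```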
agt finetune
Training and finetuning — supports linear probing, LoRA, full finetuning, and encoder-only modes.
Requires: pip install alphagenome-pytorch[finetuning]
# Linear probing (frozen backbone)
agt finetune --mode linear-probe \
--genome hg38.fa \
--modality atac --bigwig *.bw \
--train-bed train.bed --val-bed val.bed \
--pretrained-weights model.pth \
--resolutions 1
# LoRA finetuning
agt finetune --mode lora \
--lora-rank 8 --lora-alpha 16 \
--genome hg38.fa \
--modality atac --bigwig *.bw \
--train-bed train.bed --val-bed val.bed \
--pretrained-weights model.pth \
--resolutions 1
# Encoder-only (CNN encoder, no transformer)
agt finetune --mode encoder-only \
--genome hg38.fa \
--modality atac --bigwig *.bw \
--train-bed train.bed --val-bed val.bed \
--pretrained-weights model.pth \
--sequence-length 500 --resolutions 128
# Multi-modality
agt finetune --mode lora \
--genome hg38.fa \
--modality atac --bigwig atac1.bw atac2.bw \
--modality rna_seq --bigwig rna1.bw rna2.bw \
--modality-weights atac:1.0,rna_seq:0.5 \
--train-bed train.bed --val-bed val.bed \
--pretrained-weights model.pth
agt finetune forwards arguments to the training script and currently emits
the script’s normal console logs.
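The --modality-weights value in the multi-modality example is a comma-separated list of name:weight pairs. How the training script parses it internally is not documented here; the sketch below (parse_modality_weights is a hypothetical name) just shows the mapping the string encodes, which can be handy when generating commands programmatically.

```python
def parse_modality_weights(text):
    """Turn 'atac:1.0,rna_seq:0.5' into {'atac': 1.0, 'rna_seq': 0.5}."""
    weights = {}
    for item in text.split(","):
        name, sep, value = item.strip().partition(":")
        if not sep or not name:
            raise ValueError(f"expected name:weight, got {item!r}")
        weights[name] = float(value)
    return weights


print(parse_modality_weights("atac:1.0,rna_seq:0.5"))
```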
agt score
Variant effect prediction — score the impact of genetic variants on genomic tracks.
Requires: pip install alphagenome-pytorch[scoring]
# Score a single variant (format: chr:pos:ref>alt)
agt score \
--model model.pth \
--fasta hg38.fa \
--variant "chr22:36201698:A>C"
# Score variants from a VCF
agt score \
--model model.pth \
--fasta hg38.fa \
--vcf variants.vcf \
--scorer atac \
--output scores.tsv
# Score with the recommended variant scorers (default)
agt score \
--model model.pth \
--fasta hg38.fa \
--vcf variants.vcf \
--scorer recommended \
--output scores.tsv
# Score gene-centric/polyA scorers with annotations
agt score \
--model model.pth \
--fasta hg38.fa \
--gtf gencode.v49.parquet \
--polya gencode.polyas.parquet \
--variant "chr22:36201698:A>C" \
--scorer rna_seq,polyadenylation
--scorer accepts comma-separated scorer names: atac, dnase,
chip_tf, chip_histone, cage, procap, contact_maps,
rna_seq, rna_seq_active, splice_sites, splice_site_usage,
splice_junctions, and polyadenylation. The default is
recommended.
JSON output:
{
"variants": [
{
"variant": "chr22:36201698:A>C",
"interval": "chr22:36136162-36267234",
"scorer": "CenterMaskScorer(output=atac, width=501, agg=diff_log2_sum)",
"output_type": "atac",
"is_signed": true,
"gene_id": null,
"gene_name": null,
"gene_type": null,
"gene_strand": null,
"junction_start": null,
"junction_end": null,
"scores": [0.42, 0.11]
}
]
}
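The variant string and the interval in this example are related by simple arithmetic: the interval is a 131 072 bp window with the variant position at its center. The sketch below parses the chr:pos:ref>alt format and reproduces that interval; both helper names are hypothetical, and the coordinate convention is inferred from this one example rather than from the scoring code.

```python
import re

VARIANT_RE = re.compile(r"^([^:]+):(\d+):([ACGTN]+)>([ACGTN]+)$")


def parse_variant(text):
    """Split 'chr22:36201698:A>C' into (chrom, pos, ref, alt)."""
    m = VARIANT_RE.match(text)
    if m is None:
        raise ValueError(f"expected chr:pos:ref>alt, got {text!r}")
    chrom, pos, ref, alt = m.groups()
    return chrom, int(pos), ref, alt


def centered_interval(pos, window=131072):
    """Window of `window` bp with `pos` at its midpoint."""
    half = window // 2
    return pos - half, pos + half


chrom, pos, ref, alt = parse_variant("chr22:36201698:A>C")
start, end = centered_interval(pos)
print(f"{chrom}:{start}-{end}")
```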
TSV output contains one row per scored track with columns:
variant, interval, scorer, output_type, gene_id,
gene_name, track_index, and raw_score.
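Because the TSV has one row per scored track, ranking tracks by effect size is a short post-processing step. The sketch below assumes the file starts with a header row naming the columns listed above; the sample rows are made up for illustration (the two scores are borrowed from the JSON example, the scorer value is a placeholder), and top_tracks is a hypothetical helper.

```python
import csv
import io

# Made-up sample with the documented columns; not real agt output.
tsv_text = (
    "variant\tinterval\tscorer\toutput_type\tgene_id\tgene_name\ttrack_index\traw_score\n"
    "chr22:36201698:A>C\tchr22:36136162-36267234\tatac\tatac\t\t\t0\t0.42\n"
    "chr22:36201698:A>C\tchr22:36136162-36267234\tatac\tatac\t\t\t1\t0.11\n"
)


def top_tracks(handle, n=10):
    """Return the n rows with the largest absolute raw_score."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda r: abs(float(r["raw_score"])), reverse=True)
    return [
        (r["variant"], int(r["track_index"]), float(r["raw_score"]))
        for r in rows[:n]
    ]


print(top_tracks(io.StringIO(tsv_text), n=1))
```

For a real file, replace the StringIO object with open("scores.tsv", newline="").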
agt convert
Convert a JAX AlphaGenome checkpoint to PyTorch format.
Requires: pip install alphagenome-pytorch[jax]
# Basic conversion
agt convert --input /path/to/jax/checkpoint --output model.pth
# Convert to safetensors format
agt convert --input /path/to/jax/checkpoint --output model.safetensors
JSON output:
{
"output": "model.pth",
"format": "pth",
"params_mapped": 1847,
"params_total": 1847,
"heads": ["atac", "dnase", "procap", "cage", "rna_seq", "chip_tf", "chip_histone"],
"track_means_included": true
}
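In an automated pipeline it may be worth failing early if not every parameter was mapped. A minimal check against the payload above, offered as a sketch (conversion_complete is a hypothetical helper, and treating params_mapped == params_total as the success criterion is an assumption):

```python
import json

# Copied from the example output above (heads list shortened).
report = json.loads("""
{"output": "model.pth", "format": "pth",
 "params_mapped": 1847, "params_total": 1847,
 "heads": ["atac", "dnase"], "track_means_included": true}
""")


def conversion_complete(report):
    """True when every parameter was mapped and track means are present."""
    return (
        report["params_mapped"] == report["params_total"]
        and report["track_means_included"]
    )


print(conversion_complete(report))
```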
agt preprocess
Data preprocessing utilities. Each operation is a subcommand.
bigwig-to-mmap
Convert BigWig files to memory-mapped format for fast training.
agt preprocess bigwig-to-mmap \
--input "*.bw" \
--output training_data/ \
--genome hg38.fa \
--resolution 128
JSON output:
{
"output_files": [
{"path": "training_data/sample1.mmap", "tracks": 1, "size_mb": 234.5}
],
"records_processed": 12345
}
scale-bigwig
Normalize BigWig signal to a target total (e.g. 100M reads). Useful for making tracks comparable before training or visualization.
The --target flag accepts human-readable suffixes: 100M, 50M,
100k, etc.
# Scale a single file to 100M total signal
agt preprocess scale-bigwig \
--input sample.bw \
--output sample_scaled.bw \
--target 100M
# Scale multiple files
agt preprocess scale-bigwig \
--input "*.bw" \
--output scaled/ \
--target 100M
# Just compute the scale factor without writing output
agt preprocess scale-bigwig \
--input sample.bw \
--target 100M \
--dry-run
JSON output:
{
"files": [
{
"input": "sample.bw",
"output": "scaled/sample.bw",
"original_total": 287453120.0,
"target_total": 100000000.0,
"scale_factor": 0.3479
}
]
}
--dry-run returns the same JSON but skips writing output files.
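The numbers in the JSON example follow from scale_factor = target_total / original_total. The sketch below reproduces that arithmetic and shows one way to read the human-readable --target suffixes. It handles only the k and M suffixes that appear on this page, and both function names are hypothetical rather than part of the package.

```python
SUFFIXES = {"k": 1_000, "m": 1_000_000}  # only the suffixes documented above


def parse_target(text):
    """Convert '100M' or '100k' (or a plain number) to a float."""
    text = text.strip()
    if not text:
        raise ValueError("empty target")
    suffix = text[-1].lower()
    if suffix in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[suffix]
    return float(text)


def scale_factor(original_total, target_total):
    """Multiplier that brings the summed signal to the target."""
    if original_total <= 0:
        raise ValueError("original total must be positive")
    return target_total / original_total


factor = scale_factor(287453120.0, parse_target("100M"))
print(round(factor, 4))
```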
agt serve
Serve AlphaGenome predictions/scoring locally over gRPC and/or REST.
Requires: pip install alphagenome-pytorch[serving]
# gRPC on 127.0.0.1:50051
agt serve \
--weights model.pth \
--fasta hg38.fa
# gRPC + REST
agt serve \
--weights model.pth \
--fasta hg38.fa \
--rest-port 8080
Common options:
Flag |
Meaning |
|---|---|
|
PyTorch weights file. |
|
Reference FASTA file. |
|
Optional gene annotations for gene-centric scoring. |
|
Optional polyA annotations. |
|
Optional parquet metadata for output metadata endpoints. |
|
Bind host (default: |
|
gRPC port (default: |
|
Disable gRPC serving. |
|
Enable REST serving on this port. |
|
Torch device (default: |
Dependency Gating
Each subcommand checks for its required optional dependencies at runtime and prints an actionable error message if they are missing:
$ agt predict --model model.pth --fasta hg38.fa
Error: 'agt predict' requires additional dependencies.
Install them with: pip install alphagenome-pytorch[inference]
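A runtime check of this kind is commonly built on importlib.util.find_spec, which reports whether a module can be imported without importing it. The sketch below illustrates the pattern only; it is not the package's implementation, and the module name passed in the example is a stand-in, since the actual dependencies of each extra are not listed on this page.

```python
import importlib.util


def missing_modules(modules):
    """Return the top-level module names that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def dependency_error(command, extra, modules):
    """Return an actionable message if any module is missing, else None."""
    if not missing_modules(modules):
        return None
    return (
        f"Error: 'agt {command}' requires additional dependencies.\n"
        f"Install them with: pip install alphagenome-pytorch[{extra}]"
    )


# "surely_not_installed_xyz" is a placeholder for a real optional dependency.
print(dependency_error("predict", "inference", ["json", "surely_not_installed_xyz"]))
```

Checking with find_spec keeps startup fast, because heavy optional packages are only imported once the subcommand actually runs.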