Command-Line Interface (agt)#

AlphaGenome PyTorch ships a CLI called agt — short for AlphaGenome Torch (and three of the four nucleotides).

After installing the package the command is available globally:

pip install alphagenome-pytorch
pip install alphagenome-pytorch[inference]  # + predict
pip install alphagenome-pytorch[finetuning] # + finetune
pip install alphagenome-pytorch[scoring]    # + score
pip install alphagenome-pytorch[serving]    # + serve

Global options#

agt [--json] <command> [options]
--json

Machine-readable JSON output on stdout for commands that support structured output. Suppresses progress bars and human formatting where implemented.

Errors produce a JSON object on stderr with a nonzero exit code:

{"error": "FileNotFoundError", "message": "No such file: model.pth"}

agt info#

Inspect the model architecture, available heads, track metadata, and the contents of a weights file.

Static information (no weights file needed)#

# Overview — heads, track counts per organism, resolutions
agt info

# List all heads with track counts
agt info --heads

Example output:

Head              Tracks (human)  Tracks (mouse)  Dimension  Resolutions
atac                         167             155        256  1bp, 128bp
dnase                        305             280        384  1bp, 128bp
procap                        12               8        128  1bp, 128bp
cage                         546             490        640  1bp, 128bp
rna_seq                      667             600        768  1bp, 128bp
chip_tf                     1617            1500       1664  128bp
chip_histone                1116            1000       1152  128bp
contact_maps                  28              28         28  64x64
splice_sites                   5               5          5  1bp
splice_junctions             734             734        734  pairwise
splice_site_usage            734             734        734  1bp

Tracks = real (non-padding) tracks per organism. Dimension = tensor channel size (includes padding).

# List individual tracks for a head
agt info --tracks atac
agt info --tracks atac --organism mouse

# Search tracks by name or metadata
agt info --tracks atac --search K562
agt info --tracks atac --filter "biosample_name=liver"

Example output:

Head: atac | 167 tracks / 256 dimension (89 padding) | human

  #   Track Name                     Biosample       Ontology
  0   ENCSR637XSC ATAC-seq           K562            EFO:0002067
  1   ENCSR868FGM ATAC-seq           HepG2           EFO:0001187
  ...
166   UBERON:0015143 ATAC-seq        thymus           UBERON:0015143
      --- 89 padding tracks ---

Weights file inspection#

# Inspect a weights file — adds: file size, param count, dtype, format
agt info model.pth

# Inspect track_means for a specific head
agt info model.pth --track-means atac
agt info model.pth --track-means atac --organism human --top 10

# Validate a checkpoint — checks all keys present, shapes match
agt info model.pth --validate

# Compare two checkpoints
agt info model.pth --diff other.pth

# Inspect a delta/finetuned checkpoint
agt info delta.safetensors

JSON output#

agt --json info --heads
{
  "heads": [
    {
      "name": "atac",
      "dimension": 256,
      "tracks": {"human": 167, "mouse": 155},
      "padding": {"human": 89, "mouse": 101},
      "resolutions": ["1bp", "128bp"]
    }
  ]
}
agt --json info model.pth
{
  "file": "model.pth",
  "format": "pth",
  "file_size_mb": 1247.3,
  "total_parameters": 298542080,
  "dtype": "float32",
  "has_track_means": true,
  "heads": ["atac", "dnase", "procap", "cage", "rna_seq", "chip_tf", "chip_histone"]
}

agt predict#

Run the model and write predictions to disk. Four input modes:

Input mode

What it does

Output

--chromosomes

Full-chromosome tiling

BigWig per track

--locus

One genomic interval

BigWig per track

--bed

Many genomic regions from a BED file

BigWig per track (merged)

--sequences

Raw FASTA sequences (no genomic coordinates)

NPZ per sequence

--locus, --bed, and --sequences are mutually exclusive. When --bed is given, --chromosomes can additionally be passed as a chromosome filter over the BED rows (see below).

Requires: pip install alphagenome-pytorch[inference]

How size mismatches are handled#

The model expects a fixed input window of W bp, where W is set by --window-size (default: 131 072). When an input region or sequence does not match W, the CLI dispatches by mode and the --tile flag:

Mode

Input < W

Input == W

Input > W

--locus / --bed (default)

padded with real reference flanks (with warning)

single window

cut to center (with warning)

--locus / --bed --tile

padded with real reference flanks (with warning)

single window

stitched tiles

--sequences (default)

error (cannot fake reference context)

single window

error (pass --tile)

--sequences --tile

error

single window

stitched tiles

--chromosomes

n/a

n/a

stitched tiles (always)

Input validation#

Chromosome coordinates must be non-negative and must fit inside the chromosome. The CLI rejects invalid input up front rather than clamping silently:

Error: Invalid locus 'chr1:-100-500': start (-100) must be ≥ 0
Error: chr1:248000000-250000000: end (250000000) exceeds chromosome length (248956422)

When a short region is padded, the fitted W-bp window is also required to be in-bounds. If the region sits near a chromosome edge the window is shifted inward (never clamped to negative coordinates), and a warning describes the shift.

Per-region logging#

Each processed region prints a one-line status to stdout (suppressed under --quiet / --json), plus any warning lines to stderr:

chr2:5000-7000        (2000bp)    → padded
  WARNING: chr2:5000-7000 (2000bp) padded with reference flanks; window shifted to [0, 131072) because region sits near chromosome start.
chr3:10000000-10002000 (2000bp)   → padded
  WARNING: chr3:10000000-10002000 (2000bp) padded with reference flanks to a 131072bp window [9935464, 10066536); output covers only the region.
chr4:1000000-2000000  (1000000bp) → tiled (12 tiles)
chr4:100-131172       (131072bp)  → single
chr5:50000-1050000    (1000000bp) → cut
  WARNING: chr5:50000-1050000 (1000000bp) center-cut to chr5:484464-615536 (131072bp); pass --tile to predict the full region.

Full chromosomes#

# Predict ATAC for chr1 and chr2
agt predict \
    --model model.pth --fasta hg38.fa --output predictions/ \
    --head atac --chromosomes chr1,chr2

# Whole genome, 1bp resolution (slower), torch.compile for speed
agt predict \
    --model model.pth --fasta hg38.fa --output predictions/ \
    --head atac --chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10 \
    --resolution 1 --compile

# Reduce edge artifacts with overlapping tiles
agt predict \
    --model model.pth --fasta hg38.fa --output predictions/ \
    --head atac --chromosomes chr1 --crop-bp 32768

Locus (single interval)#

# Exactly one window — single forward pass
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-10131072

# Short locus — padded with real reference flanks
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-10005000

# Long locus (default) — center-cut to 131 072 bp, warning printed
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-11000000

# Long locus with --tile — full region predicted via stitched tiles
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-11000000 \
    --tile --crop-bp 16384

Output file name: {output}/{head}_{chrom}_{start}_{end}.bw (or per-track when multiple tracks).

BED file (many regions)#

BED columns: chrom, start, end, optional name. Lines starting with #, track, or browser are skipped. Each region is processed independently with the same size-handling rules as --locus; all region predictions are merged into a single BigWig per track (gaps between regions are left as no-data).

# Default — short regions padded, exact = single, long = cut (warns)
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --bed regions.bed

# --tile — long regions are stitched instead of cut
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --bed regions.bed --tile --crop-bp 16384

# Filter: predict only the rows on chr1 / chr2
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --bed regions.bed --chromosomes chr1,chr2

When --chromosomes is passed alongside --bed it acts as a whitelist filter on the BED rows — useful for re-running a per-chromosome subset without having to edit the BED. If the filter removes every row the CLI errors out.

Output file name: {output}/{head}.bw (or per-track).

Raw FASTA sequences#

Predict on arbitrary DNA that isn’t tied to a reference genome. No --fasta (reference) is needed — the sequences themselves are the input. Because there is no genome to fetch flanks from, short sequences are rejected outright rather than N-padded; pre-pad them yourself if you need that.

# Sequences exactly the window size — one forward pass each
agt predict \
    --model model.pth --output out/ \
    --head atac --sequences window_sized.fa

# Longer sequences — tiling must be explicit
agt predict \
    --model model.pth --output out/ \
    --head atac --sequences long_seqs.fa --tile --crop-bp 16384

Output: one {head}_{seq_name}.npz file per sequence (with metadata). In --json mode, stdout includes an output_files array listing the written NPZ files.

Errors you’ll see:

Error: Sequence 'seq2' (5000bp) is shorter than the model window
(131072bp); not supported for --sequences.

Error: Sequence 'seq1' (500000bp) is longer than the model window
(131072bp); pass --tile to enable tiling.

Common options#

Flag

Meaning

--head NAME

Prediction head (atac, dnase, cage, …)

--tracks 0,1,2

Comma-separated track indices (default: all tracks)

--track-names

Comma-separated output track names

--resolution {1,128}

Output resolution in bp (128 is faster)

--crop-bp N

Per-tile edge crop; use with --tile to reduce edge artifacts

--batch-size N

Inference batch size (default 4)

--window-size N

Override model input window (default 131 072)

--organism {0,1}

0 = human, 1 = mouse

--device STR

PyTorch device (cuda, cpu, mps)

--dtype-policy

full_float32 (default) or mixed_precision

--compile

Wrap model with torch.compile

--checkpoint PATH

Finetuned checkpoint (LoRA, full, etc.)

--transfer-config

TransferConfig JSON for adapter models

--no-merge-adapters

Keep LoRA/adapter modules separate from base weights

--quiet

Suppress progress bars and per-region status lines

JSON output#

For --locus / --bed:

{
  "output_files": [
    {
      "path": "out/atac_chr1_10000000_10131072.bw",
      "head": "atac",
      "chromosome": "chr1",
      "start": 10000000,
      "end": 10131072,
      "length_bp": 131072,
      "handling": "single",
      "tile_count": 1,
      "resolution_bp": 128
    }
  ],
  "warnings": []
}

For --bed an additional regions array lists per-region metadata (handling, tile_count, warnings).

For --sequences:

{
  "output_files": [
    {
      "path": "out/atac_seq1.npz",
      "head": "atac",
      "sequence": "seq1",
      "length_bp": 500000,
      "handling": "tiled",
      "tile_count": 4,
      "resolution_bp": 128
    }
  ],
  "warnings": []
}

agt finetune#

Training and finetuning — supports linear probing, LoRA, full finetuning, and encoder-only modes.

Requires: pip install alphagenome-pytorch[finetuning]

# Linear probing (frozen backbone)
agt finetune --mode linear-probe \
    --genome hg38.fa \
    --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth \
    --resolutions 1

# LoRA finetuning
agt finetune --mode lora \
    --lora-rank 8 --lora-alpha 16 \
    --genome hg38.fa \
    --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth \
    --resolutions 1

# Encoder-only (CNN encoder, no transformer)
agt finetune --mode encoder-only \
    --genome hg38.fa \
    --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth \
    --sequence-length 500 --resolutions 128

# Multi-modality
agt finetune --mode lora \
    --genome hg38.fa \
    --modality atac --bigwig atac1.bw atac2.bw \
    --modality rna_seq --bigwig rna1.bw rna2.bw \
    --modality-weights atac:1.0,rna_seq:0.5 \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth

agt finetune forwards arguments to the training script and currently emits the script’s normal console logs.

agt score#

Variant effect prediction — score the impact of genetic variants on genomic tracks.

Requires: pip install alphagenome-pytorch[scoring]

# Score a single variant (format: chr:pos:ref>alt)
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --variant "chr22:36201698:A>C"

# Score variants from a VCF
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --vcf variants.vcf \
    --scorer atac \
    --output scores.tsv

# Score with the recommended variant scorers (default)
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --vcf variants.vcf \
    --scorer recommended \
    --output scores.tsv

# Score gene-centric/polyA scorers with annotations
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --gtf gencode.v49.parquet \
    --polya gencode.polyas.parquet \
    --variant "chr22:36201698:A>C" \
    --scorer rna_seq,polyadenylation

--scorer accepts comma-separated scorer names: atac, dnase, chip_tf, chip_histone, cage, procap, contact_maps, rna_seq, rna_seq_active, splice_sites, splice_site_usage, splice_junctions, and polyadenylation. The default is recommended.

JSON output:

{
  "variants": [
    {
      "variant": "chr22:36201698:A>C",
      "interval": "chr22:36136162-36267234",
      "scorer": "CenterMaskScorer(output=atac, width=501, agg=diff_log2_sum)",
      "output_type": "atac",
      "is_signed": true,
      "gene_id": null,
      "gene_name": null,
      "gene_type": null,
      "gene_strand": null,
      "junction_start": null,
      "junction_end": null,
      "scores": [0.42, 0.11]
    }
  ]
}

TSV output contains one row per scored track with columns: variant, interval, scorer, output_type, gene_id, gene_name, track_index, and raw_score.

agt convert#

Convert JAX AlphaGenome checkpoint to PyTorch format.

Requires: pip install alphagenome-pytorch[jax]

# Basic conversion
agt convert --input /path/to/jax/checkpoint --output model.pth

# Convert to safetensors format
agt convert --input /path/to/jax/checkpoint --output model.safetensors

JSON output:

{
  "output": "model.pth",
  "format": "pth",
  "params_mapped": 1847,
  "params_total": 1847,
  "heads": ["atac", "dnase", "procap", "cage", "rna_seq", "chip_tf", "chip_histone"],
  "track_means_included": true
}

agt preprocess#

Data preprocessing utilities. Each operation is a subcommand.

bigwig-to-mmap#

Convert BigWig files to memory-mapped format for fast training.

agt preprocess bigwig-to-mmap \
    --input "*.bw" \
    --output training_data/ \
    --genome hg38.fa \
    --resolution 128

JSON output:

{
  "output_files": [
    {"path": "training_data/sample1.mmap", "tracks": 1, "size_mb": 234.5}
  ],
  "records_processed": 12345
}

scale-bigwig#

Normalize BigWig signal to a target total (e.g. 100M reads). Useful for making tracks comparable before training or visualization.

The --target flag accepts human-readable suffixes: 100M, 50M, 100k, etc.

# Scale a single file to 100M total signal
agt preprocess scale-bigwig \
    --input sample.bw \
    --output sample_scaled.bw \
    --target 100M

# Scale multiple files
agt preprocess scale-bigwig \
    --input "*.bw" \
    --output scaled/ \
    --target 100M

# Just compute the scale factor without writing output
agt preprocess scale-bigwig \
    --input sample.bw \
    --target 100M \
    --dry-run

JSON output:

{
  "files": [
    {
      "input": "sample.bw",
      "output": "scaled/sample.bw",
      "original_total": 287453120.0,
      "target_total": 100000000.0,
      "scale_factor": 0.3479
    }
  ]
}

--dry-run returns the same JSON but skips writing output files.

agt serve#

Serve AlphaGenome predictions/scoring locally over gRPC and/or REST.

Requires: pip install alphagenome-pytorch[serving]

# gRPC on 127.0.0.1:50051
agt serve \
    --weights model.pth \
    --fasta hg38.fa

# gRPC + REST
agt serve \
    --weights model.pth \
    --fasta hg38.fa \
    --rest-port 8080

Common options:

Flag

Meaning

--weights PATH

PyTorch weights file.

--fasta PATH

Reference FASTA file.

--gtf PATH

Optional gene annotations for gene-centric scoring.

--polya PATH

Optional polyA annotations.

--track-metadata PATH

Optional parquet metadata for output metadata endpoints.

--host HOST

Bind host (default: 127.0.0.1).

--grpc-port N

gRPC port (default: 50051).

--disable-grpc

Disable gRPC serving.

--rest-port N

Enable REST serving on this port.

--device STR

Torch device (default: cpu).

Dependency Gating#

Each subcommand checks for its required optional dependencies at runtime and prints an actionable error message if they are missing:

$ agt predict --model model.pth --fasta hg38.fa
Error: 'agt predict' requires additional dependencies.
Install them with: pip install alphagenome-pytorch[inference]