Command-Line Interface (`agt`)#

AlphaGenome PyTorch ships a CLI called `agt`, short for AlphaGenome Torch (and three of the four nucleotides).

After installing the package, the `agt` command is available on your PATH. Optional extras enable the individual subcommands:

pip install alphagenome-pytorch
pip install alphagenome-pytorch[inference]  # + predict
pip install alphagenome-pytorch[finetuning] # + finetune
pip install alphagenome-pytorch[scoring]    # + score
pip install alphagenome-pytorch[serving]    # + serve

Global options#

agt [--json] <command> [options]

--json

Machine-readable JSON output on stdout for commands that support structured output. Suppresses progress bars and human formatting where implemented.

In --json mode, errors are written to stderr as a JSON object and the process exits with a nonzero code:

{"error": "FileNotFoundError", "message": "No such file: model.pth"}

`agt info`#

Inspect the model architecture, available heads, track metadata, and the contents of a weights file.

Static information (no weights file needed)#

# Overview — heads, track counts per organism, resolutions
agt info

# List all heads with track counts
agt info --heads

Example output:

Head                   Tracks (human) Tracks (mouse) Dimension Resolutions
atac                              167             18       256 1bp, 128bp
dnase                             305             67       384 1bp, 128bp
procap                             12              0       128 1bp, 128bp
cage                              546            188       640 1bp, 128bp
rna_seq                           667            173       768 1bp, 128bp
chip_tf                          1617            127      1664 128bp
chip_histone                     1116            183      1152 128bp
contact_maps                       28              8        28 64x64
splice_sites                        4              4         5 1bp
splice_junctions                  367             90       734 pairwise
splice_site_usage                 734            180       734 1bp

Tracks = real (non-padding) tracks per organism. Dimension = tensor channel size (includes padding).

# List individual tracks for a head
agt info --tracks atac
agt info --tracks atac --organism mouse

# Search tracks by name or metadata
agt info --tracks atac --search K562
agt info --tracks atac --filter "biosample_name=liver"

Example output:

Head: atac | 167 tracks / 256 dimension (89 padding) | human

  #   Track Name                     Biosample       Ontology
  0   ENCSR637XSC ATAC-seq           K562            EFO:0002067
  1   ENCSR868FGM ATAC-seq           HepG2           EFO:0001187
  ...
166   UBERON:0015143 ATAC-seq        thymus           UBERON:0015143
      --- 89 padding tracks ---

Weights file inspection#

# Inspect a weights file — adds: file size, param count, dtype, format
agt info model.pth

# Inspect track_means for a specific head
agt info model.pth --track-means atac
agt info model.pth --track-means atac --organism human --top 10

# Validate a checkpoint — checks all keys present, shapes match
agt info model.pth --validate

# Compare two checkpoints
agt info model.pth --diff other.pth

# Inspect a delta/finetuned checkpoint
agt info delta.safetensors

JSON output#

agt --json info --heads

{
  "heads": [
    {
      "name": "atac",
      "dimension": 256,
      "tracks": {"human": 167, "mouse": 18},
      "padding": {"human": 89, "mouse": 238},
      "resolutions": ["1bp", "128bp"]
    }
  ]
}
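The `padding` values follow from the other two fields: padding is the channel dimension minus the real track count for that organism. A small sketch that checks this on the sample payload above (the function name `padding_per_organism` is hypothetical):

```python
import json

# Sample payload copied from the `agt --json info --heads` example.
SAMPLE = """
{"heads": [{"name": "atac", "dimension": 256,
            "tracks": {"human": 167, "mouse": 18},
            "padding": {"human": 89, "mouse": 238},
            "resolutions": ["1bp", "128bp"]}]}
"""


def padding_per_organism(head):
    """Padding channels = tensor dimension - real tracks, per organism."""
    return {org: head["dimension"] - n for org, n in head["tracks"].items()}


for head in json.loads(SAMPLE)["heads"]:
    computed = padding_per_organism(head)
    status = "ok" if computed == head["padding"] else "mismatch"
    print(head["name"], computed, status)
```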

agt --json info model.pth

{
  "file": "model.pth",
  "format": "pth",
  "file_size_mb": 1247.3,
  "total_parameters": 298542080,
  "dtype": "float32",
  "has_track_means": true,
  "heads": ["atac", "dnase", "procap", "cage", "rna_seq", "chip_tf", "chip_histone"]
}

`agt predict`#

Run the model and write predictions to disk. Four input modes:

Input mode	What it does	Output
`--chromosomes`	Full-chromosome tiling	BigWig per track
`--locus`	One genomic interval	BigWig per track
`--bed`	Many genomic regions from a BED file	BigWig per track (merged)
`--sequences`	Raw FASTA sequences (no genomic coordinates)	NPZ per sequence

--locus, --bed, and --sequences are mutually exclusive. When --bed is given, --chromosomes can additionally be passed as a chromosome filter over the BED rows (see below).

Requires: pip install alphagenome-pytorch[inference]

How size mismatches are handled#

The model expects a fixed input window of W bp, where W is set by --window-size (default: 131 072). When an input region or sequence does not match W, the CLI dispatches by mode and the --tile flag:

Mode	Input < `W`	Input == `W`	Input > `W`
`--locus` / `--bed` (default)	padded with real reference flanks (with warning)	single window	cut to center (with warning)
`--locus` / `--bed` `--tile`	padded with real reference flanks (with warning)	single window	stitched tiles
`--sequences` (default)	error (cannot fake reference context)	single window	error (pass `--tile`)
`--sequences` `--tile`	error	single window	stitched tiles
`--chromosomes`	n/a	n/a	stitched tiles (always)
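The table can be read as a small decision function. The sketch below encodes it directly; `classify_handling` is a hypothetical name, and the returned labels (`padded`, `single`, `cut`, `tiled`) are the ones used in the per-region status lines and the JSON `handling` field elsewhere in this page.

```python
def classify_handling(mode, length, window=131_072, tile=False):
    """Return the handling label for one input, following the table above.

    mode is one of "locus", "bed", "sequences", "chromosomes".
    Raises ValueError wherever the table says "error".
    """
    if mode == "chromosomes":
        return "tiled"  # full chromosomes are always stitched
    if mode not in ("locus", "bed", "sequences"):
        raise ValueError(f"unknown mode: {mode}")
    if length == window:
        return "single"
    if mode == "sequences":
        if length < window:
            # No reference genome to fetch flanks from, with or without --tile.
            raise ValueError("sequence is shorter than the model window; not supported")
        if not tile:
            raise ValueError("sequence is longer than the model window; pass --tile")
        return "tiled"
    # --locus / --bed
    if length < window:
        return "padded"
    return "tiled" if tile else "cut"
```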

Input validation#

Chromosome coordinates must be non-negative and must fit inside the chromosome. The CLI rejects invalid input up front rather than clamping silently:

Error: Invalid locus 'chr1:-100-500': start (-100) must be ≥ 0
Error: chr1:248000000-250000000: end (250000000) exceeds chromosome length (248956422)

When a short region is padded, the fitted W-bp window is also required to be in-bounds. If the region sits near a chromosome edge the window is shifted inward (never clamped to negative coordinates), and a warning describes the shift.
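A sketch of the window arithmetic described here, assuming the window is centered on the region midpoint and then shifted inward to stay within `[0, chromosome length]`. That centering rule is an inference from the example warnings in the Per-region logging section, not a statement about the CLI's source; `centered_window` is a hypothetical name.

```python
def centered_window(start, end, chrom_length, window=131_072):
    """Return (win_start, win_end, shifted) for a window centered on [start, end).

    The window is moved inward when it would cross a chromosome edge;
    `shifted` reports whether that happened.
    """
    if start < 0:
        raise ValueError(f"start ({start}) must be >= 0")
    if end > chrom_length:
        raise ValueError(f"end ({end}) exceeds chromosome length ({chrom_length})")
    if end <= start:
        raise ValueError("end must be greater than start")
    if chrom_length < window:
        raise ValueError("chromosome is shorter than the window")
    center = (start + end) // 2
    ideal = center - window // 2
    # Keep 0 <= win_start and win_start + window <= chrom_length.
    win_start = min(max(ideal, 0), chrom_length - window)
    return win_start, win_start + window, win_start != ideal


# A 2 kb region near a chromosome start: the window is shifted to begin at 0.
print(centered_window(5_000, 7_000, chrom_length=200_000_000))
# A 2 kb region far from either edge: the window is simply centered.
print(centered_window(10_000_000, 10_002_000, chrom_length=200_000_000))
```

The same centering arithmetic applied to a region longer than the window yields the center-cut interval, for example `centered_window(50_000, 1_050_000, 200_000_000)` gives `(484464, 615536, False)`.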

Per-region logging#

Each processed region prints a one-line status to stdout (suppressed under --quiet / --json), plus any warning lines to stderr:

chr2:5000-7000        (2000bp)    → padded
  WARNING: chr2:5000-7000 (2000bp) padded with reference flanks; window shifted to [0, 131072) because region sits near chromosome start.
chr3:10000000-10002000 (2000bp)   → padded
  WARNING: chr3:10000000-10002000 (2000bp) padded with reference flanks to a 131072bp window [9935464, 10066536); output covers only the region.
chr4:1000000-2000000  (1000000bp) → tiled (12 tiles)
chr4:100-131172       (131072bp)  → single
chr5:50000-1050000    (1000000bp) → cut
  WARNING: chr5:50000-1050000 (1000000bp) center-cut to chr5:484464-615536 (131072bp); pass --tile to predict the full region.

Full chromosomes#

# Predict ATAC for chr1 and chr2
agt predict \
    --model model.pth --fasta hg38.fa --output predictions/ \
    --head atac --chromosomes chr1,chr2

# Multiple chromosomes (chr1 to chr10), 1bp resolution (slower), torch.compile for speed
agt predict \
    --model model.pth --fasta hg38.fa --output predictions/ \
    --head atac --chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10 \
    --resolution 1 --compile

# Reduce edge artifacts with overlapping tiles
agt predict \
    --model model.pth --fasta hg38.fa --output predictions/ \
    --head atac --chromosomes chr1 --crop-bp 32768

Locus (single interval)#

# Exactly one window — single forward pass
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-10131072

# Short locus — padded with real reference flanks
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-10005000

# Long locus (default) — center-cut to 131 072 bp, warning printed
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-11000000

# Long locus with --tile — full region predicted via stitched tiles
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --locus chr1:10000000-11000000 \
    --tile --crop-bp 16384

Output file name: {output}/{head}_{chrom}_{start}_{end}.bw (or per-track when multiple tracks).
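To illustrate what overlapping tiles with an edge crop look like, here is one possible layout scheme. It is a sketch under stated assumptions, not the CLI's actual tiling: each tile is one window long, `crop_bp` bases are discarded on each side, and tiles advance by `window - 2 * crop_bp` so the kept centers abut. Chromosome bounds are ignored, the function name `plan_tiles` is hypothetical, and tile counts reported by the CLI may differ.

```python
def plan_tiles(start, end, window=131_072, crop_bp=0):
    """Yield (tile_start, tile_end, keep_start, keep_end) covering [start, end).

    Each tile spans `window` bp; only the central window - 2*crop_bp bases
    are kept, so neighbouring tiles overlap by 2*crop_bp.
    """
    keep = window - 2 * crop_bp
    if keep <= 0:
        raise ValueError("crop_bp must be smaller than half the window")
    pos = start
    while pos < end:
        keep_end = min(pos + keep, end)  # the last tile may be only partly used
        tile_start = pos - crop_bp
        yield tile_start, tile_start + window, pos, keep_end
        pos += keep


for tile in plan_tiles(10_000_000, 11_000_000, crop_bp=16_384):
    print(tile)
```

A larger crop discards more of each prediction and therefore needs more forward passes for the same region, which is the trade-off behind choosing `--crop-bp`.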

BED file (many regions)#

BED columns: chrom, start, end, optional name. Lines starting with #, track, or browser are skipped. Each region is processed independently with the same size-handling rules as --locus; all region predictions are merged into a single BigWig per track (gaps between regions are left as no-data).

# Default — short regions padded, exact = single, long = cut (warns)
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --bed regions.bed

# --tile — long regions are stitched instead of cut
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --bed regions.bed --tile --crop-bp 16384

# Filter: predict only the rows on chr1 / chr2
agt predict \
    --model model.pth --fasta hg38.fa --output out/ \
    --head atac --bed regions.bed --chromosomes chr1,chr2

When --chromosomes is passed alongside --bed it acts as a whitelist filter on the BED rows — useful for re-running a per-chromosome subset without having to edit the BED. If the filter removes every row the CLI errors out.
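The BED rules above (skipped line prefixes, optional name column, chromosome whitelist, error on an empty result) can be sketched as follows. `read_bed` is a hypothetical helper; skipping blank lines and splitting on any whitespace are assumptions beyond what the text states.

```python
def read_bed(lines, chromosomes=None):
    """Parse BED rows into (chrom, start, end, name) tuples.

    Skips lines starting with '#', 'track', or 'browser' (and blank lines).
    `chromosomes` is an optional whitelist; an empty result is an error.
    """
    allowed = set(chromosomes) if chromosomes else None
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"BED row needs chrom, start, end: {line!r}")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        if allowed is None or chrom in allowed:
            regions.append((chrom, start, end, name))
    if not regions:
        raise ValueError("no BED rows left after filtering")
    return regions


rows = ["# peaks", "track name=demo", "chr1\t100\t200\tpeak1", "chr2\t300\t400"]
print(read_bed(rows, chromosomes=["chr1"]))
```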

Output file name: {output}/{head}.bw (or per-track).

Raw FASTA sequences#

Predict on arbitrary DNA that isn’t tied to a reference genome. No --fasta (reference) is needed — the sequences themselves are the input. Because there is no genome to fetch flanks from, short sequences are rejected outright rather than N-padded; pre-pad them yourself if you need that.

# Sequences exactly the window size — one forward pass each
agt predict \
    --model model.pth --output out/ \
    --head atac --sequences window_sized.fa

# Longer sequences — tiling must be explicit
agt predict \
    --model model.pth --output out/ \
    --head atac --sequences long_seqs.fa --tile --crop-bp 16384

Output: one {head}_{seq_name}.npz file per sequence (with metadata). In --json mode, stdout includes an output_files array listing the written NPZ files.

Errors you’ll see:

Error: Sequence 'seq2' (5000bp) is shorter than the model window
(131072bp); not supported for --sequences.

Error: Sequence 'seq1' (500000bp) is longer than the model window
(131072bp); pass --tile to enable tiling.

Common options#

Flag	Meaning
`--head NAME`	Prediction head (`atac`, `dnase`, `cage`, …)
`--tracks 0,1,2`	Comma-separated track indices (default: all tracks)
`--track-names`	Comma-separated output track names
`--resolution {1,128}`	Output resolution in bp (128 is faster)
`--crop-bp N`	Per-tile edge crop; use with `--tile` to reduce edge artifacts
`--batch-size N`	Inference batch size (default 4)
`--window-size N`	Override model input window (default 131 072)
`--organism {0,1}`	0 = human, 1 = mouse
`--device STR`	PyTorch device (`cuda`, `cpu`, `mps`)
`--dtype-policy`	`full_float32` (default) or `mixed_precision`
`--compile`	Wrap model with `torch.compile`
`--checkpoint PATH`	Finetuned checkpoint (LoRA, full, etc.)
`--transfer-config`	TransferConfig JSON for adapter models
`--no-merge-adapters`	Keep LoRA/adapter modules separate from base weights
`--quiet`	Suppress progress bars and per-region status lines

JSON output#

For --locus / --bed:

{
  "output_files": [
    {
      "path": "out/atac_chr1_10000000_10131072.bw",
      "head": "atac",
      "chromosome": "chr1",
      "start": 10000000,
      "end": 10131072,
      "length_bp": 131072,
      "handling": "single",
      "tile_count": 1,
      "resolution_bp": 128
    }
  ],
  "warnings": []
}

For --bed an additional regions array lists per-region metadata (handling, tile_count, warnings).

For --sequences:

{
  "output_files": [
    {
      "path": "out/atac_seq1.npz",
      "head": "atac",
      "sequence": "seq1",
      "length_bp": 500000,
      "handling": "tiled",
      "tile_count": 4,
      "resolution_bp": 128
    }
  ],
  "warnings": []
}
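A short sketch of post-processing these payloads in a pipeline, using only the fields shown in the examples above (`output_files`, `path`, `handling`, `tile_count`, `warnings`). The function name `summarize_predict` is hypothetical.

```python
import json
from collections import Counter


def summarize_predict(payload):
    """Collect written paths, handling counts, tile total, and warnings."""
    files = payload.get("output_files", [])
    return {
        "paths": [f["path"] for f in files],
        "handling": dict(Counter(f["handling"] for f in files)),
        "total_tiles": sum(f["tile_count"] for f in files),
        "warnings": list(payload.get("warnings", [])),
    }


# Payload copied from the --sequences example above.
example = json.loads("""
{"output_files": [{"path": "out/atac_seq1.npz", "head": "atac",
                   "sequence": "seq1", "length_bp": 500000,
                   "handling": "tiled", "tile_count": 4,
                   "resolution_bp": 128}],
 "warnings": []}
""")
print(summarize_predict(example))
```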

`agt finetune`#

Training and finetuning — supports linear probing, LoRA, full finetuning, and encoder-only modes.

Requires: pip install alphagenome-pytorch[finetuning]

# Linear probing (frozen backbone)
agt finetune --mode linear-probe \
    --genome hg38.fa \
    --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth \
    --resolutions 1

# LoRA finetuning
agt finetune --mode lora \
    --lora-rank 8 --lora-alpha 16 \
    --genome hg38.fa \
    --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth \
    --resolutions 1

# Encoder-only (CNN encoder, no transformer)
agt finetune --mode encoder-only \
    --genome hg38.fa \
    --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth \
    --sequence-length 500 --resolutions 128

# Multi-modality
agt finetune --mode lora \
    --genome hg38.fa \
    --modality atac --bigwig atac1.bw atac2.bw \
    --modality rna_seq --bigwig rna1.bw rna2.bw \
    --modality-weights atac:1.0,rna_seq:0.5 \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights model.pth

agt finetune forwards arguments to the training script and currently emits the script’s normal console logs.

`agt score`#

Variant effect prediction — score the impact of genetic variants on genomic tracks.

Requires: pip install alphagenome-pytorch[scoring]

# Score a single variant (format: chr:pos:ref>alt)
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --variant "chr22:36201698:A>C"

# Score variants from a VCF
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --vcf variants.vcf \
    --scorer atac \
    --output scores.tsv

# Score with the recommended variant scorers (default)
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --vcf variants.vcf \
    --scorer recommended \
    --output scores.tsv

# Score gene-centric/polyA scorers with annotations
agt score \
    --model model.pth \
    --fasta hg38.fa \
    --gtf gencode.v49.parquet \
    --polya gencode.polyas.parquet \
    --variant "chr22:36201698:A>C" \
    --scorer rna_seq,polyadenylation

--scorer accepts comma-separated scorer names: atac, dnase, chip_tf, chip_histone, cage, procap, contact_maps, rna_seq, rna_seq_active, splice_sites, splice_site_usage, splice_junctions, and polyadenylation. The default is recommended.

JSON output:

{
  "variants": [
    {
      "variant": "chr22:36201698:A>C",
      "interval": "chr22:36136162-36267234",
      "scorer": "CenterMaskScorer(output=atac, width=501, agg=diff_log2_sum)",
      "output_type": "atac",
      "is_signed": true,
      "gene_id": null,
      "gene_name": null,
      "gene_type": null,
      "gene_strand": null,
      "junction_start": null,
      "junction_end": null,
      "scores": [0.42, 0.11]
    }
  ]
}
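The `variant` and `interval` fields in this example are related by simple arithmetic: the interval is one model window (131 072 bp) whose start is the variant position minus half a window. The sketch below parses the `chr:pos:ref>alt` format and reproduces the example interval. The centering rule is inferred from this single example, chromosome bounds are ignored, restricting alleles to `A`, `C`, `G`, `T`, `N` is an assumption, and both function names are hypothetical.

```python
import re

VARIANT_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<pos>\d+):(?P<ref>[ACGTN]+)>(?P<alt>[ACGTN]+)$"
)


def parse_variant(text):
    """Split 'chr:pos:ref>alt' into (chrom, pos, ref, alt)."""
    match = VARIANT_RE.match(text)
    if match is None:
        raise ValueError(f"expected chr:pos:ref>alt, got {text!r}")
    return match["chrom"], int(match["pos"]), match["ref"], match["alt"]


def centered_interval(chrom, pos, window=131_072):
    """Window-sized interval starting half a window before `pos`."""
    start = pos - window // 2
    return f"{chrom}:{start}-{start + window}"


chrom, pos, ref, alt = parse_variant("chr22:36201698:A>C")
print(centered_interval(chrom, pos))
```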

TSV output contains one row per scored track with columns: variant, interval, scorer, output_type, gene_id, gene_name, track_index, and raw_score.
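A sketch of reading that TSV with the standard library and ranking tracks by absolute score for each variant. It assumes the file starts with a header row carrying the column names listed above; `top_tracks` is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["variant", "interval", "scorer", "output_type",
           "gene_id", "gene_name", "track_index", "raw_score"]


def top_tracks(tsv_text, n=3):
    """Return, per variant, the n (track_index, raw_score) pairs with largest |raw_score|."""
    by_variant = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        by_variant[row["variant"]].append(
            (int(row["track_index"]), float(row["raw_score"]))
        )
    return {
        variant: sorted(rows, key=lambda r: abs(r[1]), reverse=True)[:n]
        for variant, rows in by_variant.items()
    }


# Illustrative rows only; real scores come from `agt score --output scores.tsv`.
base = ["chr22:36201698:A>C", "chr22:36136162-36267234", "atac", "atac", "", ""]
lines = ["\t".join(COLUMNS)]
for index, score in [(0, 0.42), (1, 0.11), (2, -0.9)]:
    lines.append("\t".join(base + [str(index), str(score)]))
print(top_tracks("\n".join(lines), n=2))
```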

`agt convert`#

Convert a JAX AlphaGenome checkpoint to PyTorch format.

Requires: pip install alphagenome-pytorch[jax]

# Basic conversion
agt convert --input /path/to/jax/checkpoint --output model.pth

# Convert to safetensors format
agt convert --input /path/to/jax/checkpoint --output model.safetensors

JSON output:

{
  "output": "model.pth",
  "format": "pth",
  "params_mapped": 1847,
  "params_total": 1847,
  "heads": ["atac", "dnase", "procap", "cage", "rna_seq", "chip_tf", "chip_histone"],
  "track_means_included": true
}

`agt preprocess`#

Data preprocessing utilities. Each operation is a subcommand.

`bigwig-to-mmap`#

Convert BigWig files to memory-mapped format for fast training.

agt preprocess bigwig-to-mmap \
    --input "*.bw" \
    --output training_data/ \
    --genome hg38.fa \
    --resolution 128

JSON output:

{
  "output_files": [
    {"path": "training_data/sample1.mmap", "tracks": 1, "size_mb": 234.5}
  ],
  "records_processed": 12345
}

`scale-bigwig`#

Normalize BigWig signal to a target total (e.g. 100M reads). Useful for making tracks comparable before training or visualization.

The --target flag accepts human-readable suffixes: 100M, 50M, 100k, etc.

# Scale a single file to 100M total signal
agt preprocess scale-bigwig \
    --input sample.bw \
    --output sample_scaled.bw \
    --target 100M

# Scale multiple files
agt preprocess scale-bigwig \
    --input "*.bw" \
    --output scaled/ \
    --target 100M

# Just compute the scale factor without writing output
agt preprocess scale-bigwig \
    --input sample.bw \
    --target 100M \
    --dry-run

JSON output:

{
  "files": [
    {
      "input": "sample.bw",
      "output": "scaled/sample.bw",
      "original_total": 287453120.0,
      "target_total": 100000000.0,
      "scale_factor": 0.3479
    }
  ]
}

--dry-run returns the same JSON but skips writing output files.
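The `scale_factor` in the JSON is the target total divided by the original total. A minimal sketch of that arithmetic, including a parser for the `k` and `M` suffixes mentioned above. Treating suffixes as case-insensitive is an assumption, and `parse_target` and `scale_factor` are hypothetical names.

```python
SUFFIXES = {"k": 1e3, "m": 1e6}  # only the suffixes named in the text


def parse_target(text):
    """Convert '100M', '100k', or a plain number into a float."""
    text = text.strip()
    factor = SUFFIXES.get(text[-1:].lower())
    if factor is None:
        return float(text)
    return float(text[:-1]) * factor


def scale_factor(original_total, target_total):
    """Multiplier that brings a track's total signal to the target."""
    if original_total <= 0:
        raise ValueError("original total must be positive")
    return target_total / original_total


# Values from the JSON example above.
print(round(scale_factor(287453120.0, parse_target("100M")), 4))
```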

`agt serve`#

Serve AlphaGenome predictions/scoring locally over gRPC and/or REST.

Requires: pip install alphagenome-pytorch[serving]

# gRPC on 127.0.0.1:50051
agt serve \
    --weights model.pth \
    --fasta hg38.fa

# gRPC + REST
agt serve \
    --weights model.pth \
    --fasta hg38.fa \
    --rest-port 8080

Common options:

Flag	Meaning
`--weights PATH`	PyTorch weights file.
`--fasta PATH`	Reference FASTA file.
`--gtf PATH`	Optional gene annotations for gene-centric scoring.
`--polya PATH`	Optional polyA annotations.
`--track-metadata PATH`	Optional parquet metadata for output metadata endpoints.
`--host HOST`	Bind host (default: `127.0.0.1`).
`--grpc-port N`	gRPC port (default: `50051`).
`--disable-grpc`	Disable gRPC serving.
`--rest-port N`	Enable REST serving on this port.
`--device STR`	Torch device (default: `cpu`).

Dependency Gating#

Each subcommand checks for its required optional dependencies at runtime and prints an actionable error message if they are missing:

$ agt predict --model model.pth --fasta hg38.fa
Error: 'agt predict' requires additional dependencies.
Install them with: pip install alphagenome-pytorch[inference]
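One common way to implement this kind of gate is to probe for the optional modules before importing them. The sketch below shows the pattern with `importlib.util.find_spec`; it is not taken from the package's source, and the module list passed to `require` is a placeholder, since the actual dependencies behind each extra are not listed here.

```python
import importlib.util


def missing_modules(modules):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def require(command, extra, modules):
    """Exit with an actionable message when a subcommand's extras are absent."""
    if missing_modules(modules):
        raise SystemExit(
            f"Error: 'agt {command}' requires additional dependencies.\n"
            f"Install them with: pip install alphagenome-pytorch[{extra}]"
        )


# Placeholder module list: `json` is always present, so this call passes.
require("predict", "inference", ["json"])
```

Checking with `find_spec` avoids importing heavy libraries just to discover that they are missing, which keeps `agt --help` and `agt info` fast on minimal installs.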
