Full Chromosome Prediction
AlphaGenome can generate genome-wide predictions by tiling windows across entire chromosomes and stitching the results into BigWig files. This is useful for visualising predicted signal tracks in genome browsers.
Command-Line Script
The script `scripts/predict_full_chromosome.py` wraps the Python API and writes one BigWig file per chromosome and track.
Quick Start
```bash
# Predict ATAC track 0 for chr1 at 128bp resolution (default)
python scripts/predict_full_chromosome.py \
    --model model.pth \
    --fasta hg38.fa \
    --output predictions/ \
    --head atac \
    --tracks 0 \
    --chromosomes chr1

# Full genome at 1bp resolution with center cropping
python scripts/predict_full_chromosome.py \
    --model model.pth \
    --fasta hg38.fa \
    --output predictions/ \
    --head atac \
    --resolution 1 \
    --crop-bp 32768 \
    --batch-size 2
```
CLI Options

| Argument | Default | Description |
|---|---|---|
| `--model` | (required) | Path to model weights ( |
| `--fasta` | (required) | Path to reference genome FASTA file |
| `--output` | (required) | Output directory for BigWig files |
| `--head` | (required) | Prediction head ( |
| `--tracks` | all | Comma-separated track indices to output (e.g. |
| | | Comma-separated names for output BigWig files |
| `--resolution` | 128 | Output resolution in bp ( |
| `--crop-bp` | | Base pairs to crop from each window edge (e.g. |
| `--batch-size` | | Number of windows per inference batch |
| | | Model input window size in bp |
| `--chromosomes` | chr1-22, chrX | Comma-separated list of chromosomes to predict |
| | | Organism index ( |
| | | PyTorch device |
| | | Dtype policy for model inference. Use |
| | off | Suppress progress bars |
Python API
The inference extension lives in `alphagenome_pytorch.extensions.inference`.
Predicting a Single Chromosome
`predict_full_chromosome()` returns predictions for one chromosome as a NumPy array:
```python
from alphagenome_pytorch import AlphaGenome
from alphagenome_pytorch.extensions.inference import (
    TilingConfig,
    predict_full_chromosome,
)

model = AlphaGenome.from_pretrained("model.pth", device="cuda")
config = TilingConfig(resolution=1, batch_size=8)

preds = predict_full_chromosome(
    model,
    "hg38.fa",
    chrom="chr1",
    head="atac",
    config=config,
)
# preds.shape == (chrom_length // resolution, n_tracks)
```
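Because each row of the returned array is a fixed-width bin along the chromosome, a genomic interval maps to a row slice by integer division. The following is a minimal sketch, assuming bins start at position 0 and that row `i` covers `[i * resolution, (i + 1) * resolution)`. `slice_interval` is a hypothetical helper, not part of the library, and the array is synthetic rather than real model output.

```python
import numpy as np


def slice_interval(preds, start, end, resolution):
    """Return the rows of `preds` whose bins overlap [start, end) (0-based)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    first_bin = start // resolution
    last_bin = (end - 1) // resolution  # bin that holds the last base
    return preds[first_bin:last_bin + 1]


# Synthetic stand-in for predict_full_chromosome() output:
# a 1,280 bp "chromosome" at 128 bp resolution with 2 tracks.
resolution = 128
preds = np.arange(10 * 2, dtype=np.float32).reshape(10, 2)

region = slice_interval(preds, start=200, end=400, resolution=resolution)
print(region.shape)  # (3, 2): bins 1, 2 and 3
```

At `resolution=1` the same helper reduces to `preds[start:end]`.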
Writing BigWig Files
`predict_full_chromosomes_to_bigwig()` predicts multiple chromosomes and saves each as a BigWig:
```python
from alphagenome_pytorch.extensions.inference import (
    TilingConfig,
    predict_full_chromosomes_to_bigwig,
)

# `model` is the AlphaGenome instance loaded in the previous example.
config = TilingConfig(resolution=128, crop_bp=32768)

results = predict_full_chromosomes_to_bigwig(
    model=model,
    fasta_path="hg38.fa",
    output_dir="./predictions",
    head="atac",
    chromosomes=["chr1", "chr2"],
    config=config,
    track_indices=[0, 1],  # optional: subset of tracks
    track_names=["sample_A", "sample_B"],  # optional: BigWig names
)
# results == {'chr1': [Path('predictions/atac_chr1_sample_A.bw'), ...], ...}
```
Tiling Configuration
`TilingConfig` controls how the genome is split into windows, which overlap when `crop_bp` is set:
```python
from alphagenome_pytorch.extensions.inference import TilingConfig

config = TilingConfig(
    window_size=131_072,  # model input size (default)
    crop_bp=32_768,       # crop edges to reduce artefacts
    resolution=128,       # 128bp bins (faster) or 1 (base-pair)
    batch_size=4,         # windows per batch
)
```
| Field | Default | Description |
|---|---|---|
| `window_size` | 131072 | Input window size in bp |
| `crop_bp` | | Base pairs to crop from each edge. Setting this enables overlapping windows so only the center of each window is kept, reducing edge artefacts. |
| `resolution` | | |
| `batch_size` | | Number of windows processed per forward pass |
Derived properties:
- `effective_size`: kept region per window, `window_size - 2 * crop_bp`
- `step_size`: equals `effective_size` for seamless tiling
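The derived properties reduce to simple arithmetic. The sketch below mirrors them in a standalone dataclass so the numbers can be checked without loading a model. `TilingSketch` and `n_windows` are hypothetical names for illustration and are not the library's `TilingConfig`. The `crop_bp=0` default and the window count (`ceil(chrom_length / step_size)`) are assumptions, and the library may handle chromosome ends differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TilingSketch:
    """Illustrative stand-in for the tiling arithmetic (not the real class)."""

    window_size: int = 131_072
    crop_bp: int = 0  # assumed default, for illustration only

    @property
    def effective_size(self) -> int:
        # Region kept from each window after cropping both edges.
        return self.window_size - 2 * self.crop_bp

    @property
    def step_size(self) -> int:
        # Stepping by the kept region makes the kept regions abut exactly.
        return self.effective_size

    def n_windows(self, chrom_length: int) -> int:
        # Assumed: enough windows to cover every base at least once.
        return math.ceil(chrom_length / self.step_size)


CHR1_HG38 = 248_956_422  # length of chr1 in hg38

cfg = TilingSketch(crop_bp=32_768)
print(cfg.effective_size)        # 65536
print(cfg.n_windows(CHR1_HG38))  # 3799
print(TilingSketch().n_windows(CHR1_HG38))  # 1900 without cropping
```

Under these assumptions, cropping a quarter of the window from each edge roughly doubles the number of forward passes per chromosome.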
Tip
Setting `crop_bp=32768` (25% of the default 131,072 bp window) keeps the central 50% of each window (65,536 bp). This is a good starting point for reducing edge prediction artefacts.
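To see why cropping still gives gap-free coverage, the sketch below tiles a toy 1D signal with overlapping windows, keeps only the centre of each, and concatenates the pieces. It is an illustration in NumPy under stated assumptions, not the library's implementation. `fake_model` is a hypothetical stand-in that returns its input unchanged, the sequence is zero-padded by `crop_bp` at both ends, and the sizes are scaled down from the real 131,072 bp window.

```python
import numpy as np


def fake_model(window):
    # Hypothetical stand-in for a model forward pass: identity.
    return window.copy()


def tile_and_stitch(signal, window_size, crop_bp):
    """Predict overlapping windows and keep only each window's centre."""
    effective = window_size - 2 * crop_bp
    n_windows = -(-len(signal) // effective)  # ceiling division

    # Pad so that every window is full length and the first kept
    # region starts at position 0 of the original signal.
    padded = np.zeros(n_windows * effective + 2 * crop_bp, dtype=signal.dtype)
    padded[crop_bp:crop_bp + len(signal)] = signal

    kept = []
    for i in range(n_windows):
        start = i * effective  # step_size == effective_size
        pred = fake_model(padded[start:start + window_size])
        kept.append(pred[crop_bp:window_size - crop_bp])

    # Drop the padding that extends past the end of the signal.
    return np.concatenate(kept)[:len(signal)]


signal = np.arange(100, dtype=np.float32)
stitched = tile_and_stitch(signal, window_size=16, crop_bp=4)
print(np.array_equal(stitched, signal))  # True
```

Because the step equals the kept region, every position is written exactly once and no averaging across windows is needed.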
Supported Heads

| Head | Tracks | Resolutions |
|---|---|---|
| | 256 | 1, 128 |
| | 384 | 1, 128 |
| | 128 | 1, 128 |
| | 640 | 1, 128 |
| | 768 | 1, 128 |
| | 1664 | 128 only |
| | 1152 | 128 only |
Note
chip_tf and chip_histone only support 128bp resolution.
Requesting --resolution 1 with these heads will raise an error.
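A pre-flight check along these lines can catch an unsupported combination before any model weights are loaded. This is a hypothetical helper, not the library's own validation. Only the two restricted head names and the two resolutions come from this page; the exception type and messages are assumptions.

```python
# Heads described in the note as 128 bp only.
HEADS_128_ONLY = frozenset({"chip_tf", "chip_histone"})
SUPPORTED_RESOLUTIONS = (1, 128)


def check_resolution(head: str, resolution: int) -> None:
    """Raise ValueError if `head` cannot be predicted at `resolution`."""
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(
            f"resolution must be one of {SUPPORTED_RESOLUTIONS}, got {resolution}"
        )
    if head in HEADS_128_ONLY and resolution != 128:
        raise ValueError(f"head '{head}' only supports 128bp resolution")


check_resolution("atac", 1)  # passes silently

try:
    check_resolution("chip_tf", 1)
except ValueError as err:
    print(err)
```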
Performance Tips
- Use resolution 128 when 1bp resolution is not needed.
- Use a larger batch size (`--batch-size 8`) if your GPU memory allows.
- For quick tests, limit chromosomes with `--chromosomes chr21,chr22`.
- Try loading the model with mixed precision (`DtypePolicy.mixed_precision()`).
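The choice of resolution affects memory as well as speed, since the array returned for a chromosome has shape `(chrom_length // resolution, n_tracks)`. The rough estimate below assumes 4-byte (float32) values, which is an assumption because the output dtype is not stated here. `estimate_gib` is a hypothetical helper, and the chromosome length is that of chr1 in hg38.

```python
def estimate_gib(chrom_length, resolution, n_tracks, bytes_per_value=4):
    """Approximate size in GiB of a (bins, tracks) prediction array."""
    n_bins = chrom_length // resolution
    return n_bins * n_tracks * bytes_per_value / 2**30


CHR1_HG38 = 248_956_422  # length of chr1 in hg38

for resolution in (128, 1):
    gib = estimate_gib(CHR1_HG38, resolution, n_tracks=1)
    print(f"resolution={resolution}: {gib:.3f} GiB per track")
```

Multiplying the per-track figure by the number of tracks requested gives a quick sense of whether a 1bp run will fit in host memory, and of how much selecting a subset with `--tracks` saves.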