Draft · Under review

The Virtual Embryo Challenge

Generative modeling of mouse embryogenesis across space, scale, and time — under genetic perturbation.

~1M cells · 11 time points · 3 tasks · 5 developmental stages

Embryogenesis is fundamental — and largely unmodelled

A single fertilised cell becomes a complete organism through spatiotemporally coordinated gene regulation, cell-fate transitions, tissue morphogenesis, and organ formation. Disruptions cause congenital defects, which still affect 1 in 33 newborns and remain a leading cause of infant mortality.

Large embryo atlases and spatial-transcriptomics datasets give us snapshots, but they don't reveal how cell states transition, how local molecular changes propagate to tissue- and organ-level phenotypes, or how development responds to perturbation.

The Virtual Embryo Challenge establishes a standardised benchmark for predictive embryogenesis: a curated dataset, an evaluation pipeline, baseline models, and three tasks that jointly stress spatial context, multiscale reasoning, temporal dynamics, and perturbation response.

Three tasks, one shared atlas

Each task uses staged train / validation / hidden-test splits over the same whole-embryo + heart-focused resource. Hidden labels are never released; final rankings reflect generalisation to held-out stages, embryos, and genotypes.

Task 1
Temporal gene-expression distribution prediction

Forecast the gene-expression distribution at unseen future stages from earlier ones.

Stages: E7.75 · E8.5 · E9.5 → val E10.5 → test E12.5
Why hard: Models must capture developmental trends rather than interpolate between adjacent observed stages.
Task 2
Spatial-temporal multiscale future prediction

Predict expression + cell-type composition + 3D spatial organization jointly across stages.

Stages: Long-range: → val E10.5 → test E12.5 / Short-range: → val E7.5 → test E8.5
Why hard: The task must distinguish models that match global expression from those that recover where cell states sit in space.
Task 3
Mutant perturbation prediction

Predict mutant developmental outcomes — cell-type distribution, heart morphology, gene expression — under unseen knock-outs.

Stages: E8.75 with three CKO conditions (e.g. β-catenin, Mef2/12, Gata4); one is held out for validation and one for test.
Why hard: Models must generalise from wild-type development and observed perturbations to held-out genetic contexts.
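
As an illustration of the staged splits, the Task 1 stages could be turned into cell-index splits roughly as follows. This is a minimal sketch, not the organisers' pipeline: the names `TASK1_SPLITS` and `split_by_stage` are hypothetical, and treating E7.75, E8.5, and E9.5 as the training stages is an assumed reading of the Task 1 stage line.

```python
# Hypothetical split table; stage assignment is an assumed reading of Task 1.
TASK1_SPLITS = {
    "train": ["E7.75", "E8.5", "E9.5"],
    "val": ["E10.5"],
    "test": ["E12.5"],
}


def split_by_stage(cell_stages, splits=TASK1_SPLITS):
    """Map each split name to the indices of cells whose stage belongs to it."""
    stage_to_split = {
        stage: name for name, stages in splits.items() for stage in stages
    }
    out = {name: [] for name in splits}
    for i, stage in enumerate(cell_stages):
        name = stage_to_split.get(stage)
        if name is not None:  # cells from stages outside the table are skipped
            out[name].append(i)
    return out


# Example: one stage label per cell.
example = split_by_stage(["E7.75", "E10.5", "E12.5", "E8.5", "E11.5"])
```

Splitting on the stage label rather than at random is what keeps the validation and test stages strictly later than anything seen in training.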

Multimodal whole-embryo perturbation resource

~1 million cells across 11 developmental time points, spanning early gastrulation through cardiac progenitor emergence, heart-tube formation, looping, and later morphogenesis.

Single-cell
sci-RNA-seq3 + Multiome

Whole-embryo per-cell expression and chromatin accessibility across staged embryos.

Spatial
3D MERFISH + Stereo-seq

Coronal sections decoded into per-cell 3D positions plus measured RNA.

Annotation
Cell-type + anatomical labels

Per-cell cell-type, tissue-domain, and anatomical-region calls plus morphology-derived features.

Perturbations
Conditional knock-outs at E8.75

Three CKO conditions across cardiac developmental regulators, with paired wild-type controls and bulk-RNA validation.

Three metrics, automatic scoring, hidden labels

Scores are computed on held-out embryos after schema validation (gene order, cell-type vocabulary, coordinate convention, missing-value policy). Each task receives sub-scores, and an overall composite determines the ranking.
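
The four schema checks named above might look something like the sketch below. The submission layout (a dict with `genes`, `expression`, `cell_types`, and `coords`), the function name `validate_submission`, the "no NaN or inf" missing-value policy, and the one-(x, y, z)-row-per-cell coordinate convention are all assumptions made for illustration; the actual schema has not been published.

```python
import numpy as np


def validate_submission(pred, reference_genes, allowed_cell_types):
    """Return a list of schema problems; an empty list means the checks passed."""
    problems = []
    n_cells = len(pred["cell_types"])
    n_genes = len(reference_genes)

    # Gene order: names must match the reference list exactly, in order.
    if list(pred["genes"]) != list(reference_genes):
        problems.append("gene order does not match the reference")

    expr = np.asarray(pred["expression"], dtype=float)
    if expr.shape != (n_cells, n_genes):
        problems.append("expression must have shape (cells, genes)")

    # Missing-value policy (assumed here: no NaN or inf allowed).
    if not np.isfinite(expr).all():
        problems.append("expression contains NaN or inf")

    # Cell-type vocabulary: every label must come from the allowed set.
    unknown = sorted(set(pred["cell_types"]) - set(allowed_cell_types))
    if unknown:
        problems.append(f"unknown cell types: {unknown}")

    # Coordinate convention (assumed here: one (x, y, z) row per cell).
    coords = np.asarray(pred["coords"], dtype=float)
    if coords.shape != (n_cells, 3):
        problems.append("coords must have shape (cells, 3)")

    return problems
```

Returning every problem at once, instead of raising on the first one, lets a participant fix a rejected submission in a single pass.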

Gene-expression accuracy

Pseudobulk Pearson correlation per evaluation stratum (embryo / region / cell type), averaged with bootstrap confidence intervals.
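
A minimal sketch of this metric, under stated assumptions: pseudobulk is taken as the mean expression over the cells in a stratum, the correlation is computed across genes, and the bootstrap resamples strata with replacement. None of these choices is confirmed by the proposal, and all function names are hypothetical.

```python
import numpy as np


def pseudobulk(expr, strata):
    """Mean expression over the cells of each stratum -> {stratum: gene vector}."""
    strata = np.asarray(strata)
    return {s: expr[strata == s].mean(axis=0) for s in np.unique(strata)}


def pearson(a, b):
    """Pearson correlation of two 1-D vectors (NaN if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else float("nan")


def stratum_scores(pred_expr, pred_strata, obs_expr, obs_strata):
    """One correlation per stratum present in both prediction and observation."""
    p = pseudobulk(pred_expr, pred_strata)
    o = pseudobulk(obs_expr, obs_strata)
    return np.array([pearson(p[s], o[s]) for s in sorted(o) if s in p])


def bootstrap_ci(scores, n_boot=1000, level=0.95, seed=0):
    """Mean score with a percentile bootstrap interval over strata."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [(1 - level) / 2, (1 + level) / 2])
    return float(scores.mean()), float(lo), float(hi)
```

Here `expr` is a cells-by-genes array and `strata` holds one label per cell, for example an embryo, region, or cell-type identifier.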

Cell-type composition accuracy

A frozen probe classifier, trained by the organisers and locked before evaluation, assigns cell types to predicted cells; the resulting cell-type proportions are compared with the observed proportions at global, regional, and per-condition levels.
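
The proposal does not state how the two sets of proportions are compared. As one plausible reading, the sketch below scores them with 1 minus the total variation distance, so identical compositions score 1 and disjoint ones score 0. The distance choice and the names `proportions` and `composition_score` are assumptions, and the probe classifier itself is not modelled: the inputs are labels it would already have assigned.

```python
from collections import Counter


def proportions(labels, vocabulary):
    """Fraction of cells per cell type, in the fixed order given by vocabulary."""
    counts = Counter(labels)
    total = sum(counts[c] for c in vocabulary)
    return [counts[c] / total if total else 0.0 for c in vocabulary]


def composition_score(pred_labels, obs_labels, vocabulary):
    """1 - total variation distance between predicted and observed proportions."""
    p = proportions(pred_labels, vocabulary)
    q = proportions(obs_labels, vocabulary)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return 1.0 - tv
```

Calling `composition_score` once on all cells, once per region, and once per condition would give the global, regional, and per-condition levels.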

Spatial organization accuracy

Fused Gromov-Wasserstein distance combining expression similarity with spatial-structure preservation. Penalises predictions that get the marginals right but the geometry wrong.
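
To make the two ingredients concrete, the sketch below evaluates the fused Gromov-Wasserstein objective with a square loss for a given coupling `T` between predicted and observed cells. It is not a solver: the distance itself is the minimum of this objective over all couplings with the prescribed marginals, which in practice would come from an optimal-transport library. The trade-off weight `alpha`, the use of squared Euclidean costs, and the function names are assumptions, not the challenge's published formulation.

```python
import numpy as np


def pairwise_sq_dists(x, y):
    """Squared Euclidean distances between rows of x and rows of y."""
    diff = x[:, None, :] - y[None, :, :]
    return (diff ** 2).sum(axis=-1)


def fgw_cost(M, C1, C2, T, alpha=0.5):
    """FGW objective for a fixed coupling T (square loss).

    M  : (n, m) expression cost between predicted and observed cells
    C1 : (n, n) within-prediction spatial distances
    C2 : (m, m) within-observation spatial distances
    T  : (n, m) coupling; its row and column sums are the cell weights
    """
    p, q = T.sum(axis=1), T.sum(axis=0)
    # Feature term: expression mismatch between matched cells.
    feature_term = (M * T).sum()
    # Structure term: sum over i,j,k,l of (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l],
    # expanded so that no 4-D array is needed.
    structure_term = (
        p @ (C1 ** 2) @ p
        + q @ (C2 ** 2) @ q
        - 2.0 * ((C1 @ T @ C2.T) * T).sum()
    )
    return float((1 - alpha) * feature_term + alpha * structure_term)
```

With `alpha` close to 0 the cost reduces to ordinary expression matching; with `alpha` close to 1 only the preservation of pairwise spatial distances matters. A prediction with correct marginals but scrambled geometry therefore keeps a large structure term.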

Three phases · launch → development → final

  1. 2026-06-30: Site + submission portal + eval platform live
  2. 2026-07-20: Starter kit released; website opens to participants
  3. 2026-07-30: P1 · Test phase begins (workflow + leaderboard validation)
  4. 2026-08-15: P2 · Development phase begins; validation dataset released
  5. 2026-10-25: P3 · Final test phase begins (new held-out dataset)
  6. 2026-11-02: Final submissions due; official evaluation starts
  7. 2026-11-18: Winners announced at NeurIPS

$70K total from the Laude Institute Moonshots Seed Grant

$20K
Winner prizes

Top teams per task across both tracks.

$30K
Travel awards

15–20 grants ($1k–$2k each) for early-career attendees of the NeurIPS workshop.

$20K
Outreach & education

Website, starter-kit repo, tutorials, reproducible walkthroughs, Slack workspace, community support.

Evaluation runs on the Stanford Sherlock GPU cluster (NVIDIA H100 / H200 80 GB).

Explore the data that powers it

The challenge is grounded in the same atlas you can browse on this site: 3-D spatial-transcriptomics specimens by Theiler stage, a whole single-cell time-lapse from gastrula to birth, and the EMA anatomical references. Use them now to understand the modality coverage and stage spacing before the starter kit drops.

Hosted by the Qiu Lab, Stanford University, in collaboration with developmental-biology, computational-biology, and machine-learning communities. The full organising committee will be announced alongside the starter-kit release.

Subscribe to receive starter-kit, dataset, and timeline updates. The competition site, GitHub repository, and Slack workspace go live two weeks before P1.

Email the organisers · GitHub · Slack (coming soon)

This page summarises the NeurIPS 2026 competition proposal currently under review. Dates, datasets, prize amounts, and exact metric formulations are subject to change between proposal acceptance and launch.