Built for model evaluation

The long-horizon data layer for frontier AI

Real, long-horizon workflows from expert researchers. Refined into evaluations and benchmarks for frontier AI.

What is long-horizon data?

Real-world, long-horizon workflows captured at the source. Refined into something a model can actually learn from.

Captured directly from authentic expert work. Each trace carries intent, outcome, and the rework that produced it.

A long-horizon research arc

Research is iterative. Weeks of method design, experiments, and rework before a result lands.

How it compares

Real, signal-rich, verifiable, and reproducible. All at once.

Category	Real	Signal-Rich	Verifiable	Reproducible
Human Annotation	❌	❌	✅	✅
RL Environments	❌	✅	✅	✅
Raw Capture Tools	✅	❌	❌	❌
Internal Capture	✅	✅	❌	❌

Sources

Data Sources

Real workflows · Real outcomes

Process

Refinery

Capture · Refine · Verify

Destination

Frontier AI Labs

Benchmarks · Datasets · Outcomes

With consent · PII redacted

Two ways to engage. Pick the side of the pipeline that matches your work.

For AI Labs

Long-horizon, verifiable workflows from frontier experts. Refined for model evaluation.

For Researchers

If your daily work produces long-horizon, verifiable workflows in a technical domain, we'd like to hear from you.

Whether you build frontier models or work at the frontier of your field, we'd like to hear from you.