Built for model evaluation

The long-horizon data layer for frontier AI

Real, long-horizon workflows from expert researchers. Refined into evaluations and benchmarks for frontier AI.

What is long-horizon data?

Real-world, long-horizon workflows captured at the source. Refined into something a model can actually learn from.

Captured directly from authentic, expert work. Each trace carries intent, outcome, and the rework that produced it.

A long-horizon research arc

  1. Question

  2. Literature review

  3. Method design

  4. Experiments

  5. Iteration

  6. Result

Research is iterative: weeks of method design, experiments, and rework come before a result lands.

How it compares

Four properties together

Real, signal-rich, verifiable, and reproducible. All at once.

Category · Real · Signal-Rich · Verifiable · Reproducible

Human Annotation

RL Environments

Raw Capture Tools

Internal Capture

How long-horizon data is refined

Sources

Data Sources

Real workflows · Real outcomes

Process

Refinery

Capture · Refine · Verify

Destination

Frontier AI Labs

Benchmarks · Datasets · Outcomes

With consent · PII redacted

Let’s talk

Whether you build frontier models or work at the frontier of your field, we’d like to hear from you.

contact@openrefinery.ai