Generation is solved.
Verification is the new problem.

Benchmarks test what a model says in response to a single prompt. We mine the space of prompt trajectories instead. One operator, thirty agents, two weeks: two thousand papers, twenty-two books, and results that independent experts have confirmed as novel, correct, and publishable.

~256 Research Papers
22 Books
100+ Domains
3 Expert-Validated

The Corpus

Every paper is tiered by verification status. We do not claim unverified work is verified.

GOLD

Multi-Expert Validated — coming

Reviewed by multiple independent domain experts and confirmed as novel, correct, and of publishable quality.

SILVER

Expert-Validated — 3 papers

Reviewed by one independent domain expert and confirmed as novel, correct, and of publishable quality.