Verification Standard

Verification Tiers

Tiers answer the question: who verified this?

Gold

Human-verified: multiple experts.

At least two independent domain experts have reviewed the working paper's claims, methodology, citations, and conclusions. Disagreements between reviewers are documented and either resolved or flagged.

Silver

Human-verified: single expert.

One qualified domain expert has reviewed the working paper and confirmed its core claims, structure, and sources. The reviewer's domain and credentials are recorded.

Bronze

External AI adjudication.

The working paper has been reviewed by independent AI systems (not the generating model). This is machine verification, not human certification. It catches structural errors, citation failures, and logical inconsistencies but cannot substitute for human domain expertise.

Generated

Unverified generated output.

Raw output from the generation pipeline. Has not passed through any verification step. May contain errors, hallucinations, unsupported claims, or fabricated citations.
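
The four tiers can be read as a simple decision rule over the review history. The sketch below is illustrative only: the `Tier` enum and the `assign_tier` helper are hypothetical names, not part of any published Soulmetric API, and the sketch assumes the only inputs are the number of independent expert reviews and whether AI adjudication was completed.

```python
from enum import Enum


class Tier(Enum):
    GOLD = "Gold"            # human-verified, multiple experts
    SILVER = "Silver"        # human-verified, single expert
    BRONZE = "Bronze"        # external AI adjudication only
    GENERATED = "Generated"  # unverified generated output


def assign_tier(independent_expert_reviews: int, ai_adjudicated: bool) -> Tier:
    """Map a review history to a tier (hypothetical helper)."""
    if independent_expert_reviews < 0:
        raise ValueError("review count cannot be negative")
    if independent_expert_reviews >= 2:
        return Tier.GOLD
    if independent_expert_reviews == 1:
        return Tier.SILVER
    if ai_adjudicated:
        return Tier.BRONZE
    return Tier.GENERATED
```

Under these assumptions, `assign_tier(0, True)` gives Bronze and `assign_tier(0, False)` gives Generated. The tier records who reviewed the paper and says nothing about how the paper scored.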

Provenance ≠ Quality Score

The verification tier and the Scharp Scale measure different things:

Tier (provenance) tells you who verified the artifact and what kind of review it received. It's a statement about process.
Scharp Scale (quality) tells you how well the artifact held up under scrutiny. It's a statement about the artifact itself.

A working paper can be Bronze-tier (AI-adjudicated only) yet score an 8 on the Scharp Scale, meaning the paper held up well under AI adjudication. Conversely, a Silver-tier working paper might score a 4 if the human reviewer found significant issues that required correction.

Both are shown on every artifact. You always see WHO verified it and HOW it scored.

The Scharp Scale

A verification quality score from Negative through 10.

Developed by Kevin Scharp. The scale is proprietary and has not been externally validated. We publish it for transparency, not as an industry standard.

Neg: Harmful or fundamentally flawed. Contains dangerous misinformation, fabricated data, or claims so wrong they could cause harm if acted upon. Flagged for removal or complete rewrite.

0: No verifiable content. The artifact makes no claims that can be checked, or every checkable claim failed verification. Essentially noise.

1: Minimal survival. One or two claims held up; the rest failed or couldn't be verified. Structure and argument are largely unsupported.

2: Weak. Some factual claims are correct but the overall argument, methodology, or synthesis is unreliable. Significant errors present.

3: Below average. Core premise has some support but major gaps in evidence, logic, or citation accuracy. Needs substantial revision.

4: Mixed. Roughly half of claims and structure hold up. Useful as a starting point but not reliable as-is. Key weaknesses identified.

5: Adequate. More right than wrong. Core argument is supported but with notable gaps or unsupported secondary claims. Usable with caution.

6: Above average. Most claims verified. Structure and argument are sound. Minor issues: some citations imprecise, some nuances missed.

7: Good. Claims well-supported. Methodology appropriate. Citations accurate. Minor quibbles only: phrasing, emphasis, or edge cases.

8: Strong. Thorough and well-verified. Would survive rigorous independent expert review. Clear argument, solid evidence, accurate citations.

9: Excellent. Rigorous verification found essentially no errors. Original synthesis adds value. Expert reviewers found it insightful and well-constructed.

10: Exceptional. Verified to the highest standard. Novel contribution confirmed by multiple experts. Could be submitted to a top-tier venue as-is.
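
For readers who want to handle scores programmatically, the scale can be represented as a small lookup table. This is a minimal sketch, assuming the Negative grade is stored as the string "Neg" and the other grades as integers 0 to 10. The names `SCHARP_LABELS` and `scharp_label` are hypothetical, and the labels are copied from the list above.

```python
from typing import Dict, Union

Score = Union[int, str]

# Short labels taken from the scale; "Neg" as a string is an assumption.
SCHARP_LABELS: Dict[Score, str] = {
    "Neg": "Harmful or fundamentally flawed",
    0: "No verifiable content",
    1: "Minimal survival",
    2: "Weak",
    3: "Below average",
    4: "Mixed",
    5: "Adequate",
    6: "Above average",
    7: "Good",
    8: "Strong",
    9: "Excellent",
    10: "Exceptional",
}


def scharp_label(score: Score) -> str:
    """Return the short label for a score, rejecting anything off the scale."""
    # type() rather than isinstance() so that booleans and floats are rejected.
    if type(score) not in (int, str) or score not in SCHARP_LABELS:
        raise ValueError(f"not a Scharp Scale value: {score!r}")
    return SCHARP_LABELS[score]
```

A lookup table keeps the scale discrete: a value such as 7.5 or 11 is rejected rather than rounded.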

The Verification Process

Stage 01

Generation

AI agents produce working papers using specified methodologies. Each agent specializes in different research tasks: literature synthesis, data analysis, argument construction, cross-referencing.

Stage 02

Automated Screening

Hallucination detection systems check every factual claim. Citation verification confirms that sources exist and say what is attributed to them. Internal consistency checks flag contradictions.

Stage 03

AI Adjudication

Independent AI systems (different from the generating model) review each working paper against source material, known literature, and domain-specific standards. This produces the Bronze tier.

Stage 04

Independent Expert Review

Domain experts review selected working papers for accuracy, methodology, and insight. Single-expert review yields Silver; multi-expert independent review yields Gold.

Stage 05

Scoring

Each verified working paper receives a Scharp Scale score based on how well it survived scrutiny. The score reflects the artifact's quality, not the process used to check it.

Stage 06

Publication

Working papers are published with their tier, Scharp Score, reviewer information, and any flagged issues or corrections. Full provenance is always visible.
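
The publication record described in Stage 06 could be modeled roughly as follows. This is a sketch under stated assumptions: the `PublishedPaper` class, its field names, and the one-line display format are all hypothetical and chosen only to show tier and score traveling together with reviewer information and flagged issues.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PublishedPaper:
    title: str
    tier: str                      # "Gold", "Silver", "Bronze", or "Generated"
    scharp_score: Union[int, str]  # 0 to 10, or "Neg"
    reviewers: List[str]           # reviewer information (experts or AI systems)
    flagged_issues: List[str] = field(default_factory=list)
    corrections: List[str] = field(default_factory=list)

    def provenance_line(self) -> str:
        """Summarize who verified the paper and how it scored."""
        reviewers = ", ".join(self.reviewers) if self.reviewers else "none"
        return (
            f"{self.tier} tier | Scharp {self.scharp_score} | "
            f"reviewed by: {reviewers} | "
            f"{len(self.flagged_issues)} flagged, "
            f"{len(self.corrections)} corrections"
        )
```

For example, `PublishedPaper("Example", "Bronze", 8, ["adjudicator-a"]).provenance_line()` produces a single line carrying both the provenance (tier, reviewers) and the quality score.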

AI Systems Used

The Soulmetric verification pipeline uses multiple independent AI systems to avoid single-model bias:

Generation agents — the fleet that produces initial working papers (various LLMs configured for specific research tasks)
Hallucination screeners — specialized models trained to detect unsupported or fabricated claims
Citation verifiers — systems that check whether cited sources exist, are accurately quoted, and support the claims attributed to them
Cross-reference engines — models that check claims against known literature and flag contradictions with established knowledge
Adjudication models — independent LLMs (different provider/architecture from generators) that perform holistic review

The generating model never adjudicates its own output. Verification always uses a different system than generation.
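
The separation rule is straightforward to enforce mechanically. The sketch below assumes each system has a unique string identifier. The function name `check_separation` and the identifiers in the usage note are hypothetical, and a stricter check (for example, also comparing provider or architecture) could be layered on in the same way.

```python
from typing import List


def check_separation(generator_id: str, adjudicator_ids: List[str]) -> None:
    """Raise if the generating model would adjudicate its own output."""
    if not adjudicator_ids:
        raise ValueError("at least one adjudicator is required")
    if generator_id in adjudicator_ids:
        raise ValueError(
            f"{generator_id!r} generated this paper and cannot adjudicate it"
        )
```

Calling `check_separation("gen-model", ["adj-model-a", "adj-model-b"])` returns silently, while including `"gen-model"` in the adjudicator list raises `ValueError` before any review starts.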

Human Expert Validators

Selection. Expert validators are selected based on demonstrated domain expertise—academic credentials, publication history, industry experience, or recognized practitioner status in the relevant field.

Compensation. Expert validators are compensated for their time. Compensation is flat-rate per review and does not vary based on the score they assign or whether they approve or reject the working paper.

Conflicts of interest. Validators disclose any conflicts of interest (financial, institutional, personal) related to the working paper's subject matter. Disclosures are recorded and available on request.

Independence. For Gold-tier verification, reviewers work independently and do not see each other's assessments until all of them have been submitted. Disagreements are flagged and documented, not silently resolved.
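
One way to picture the independence rule is a sealed-submission record that refuses to reveal any assessment until every reviewer has submitted. This is a minimal sketch, not a description of the actual tooling: the `BlindReview` class is hypothetical, and treating differing scores as a disagreement is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


class BlindReview:
    """Holds assessments sealed until every reviewer has submitted."""

    def __init__(self, reviewer_ids: List[str]) -> None:
        if len(set(reviewer_ids)) < 2:
            raise ValueError("Gold-tier review needs at least two reviewers")
        self._pending = set(reviewer_ids)
        self._assessments: Dict[str, Tuple[int, str]] = {}

    def submit(self, reviewer_id: str, score: int, notes: str) -> None:
        if reviewer_id not in self._pending:
            raise ValueError("unknown reviewer or assessment already submitted")
        self._pending.remove(reviewer_id)
        self._assessments[reviewer_id] = (score, notes)

    def reveal(self) -> Dict[str, Tuple[int, str]]:
        if self._pending:
            raise PermissionError("sealed until every reviewer has submitted")
        return dict(self._assessments)

    def has_disagreement(self) -> bool:
        # Assumption: reviewers disagree when their scores differ.
        scores = {score for score, _ in self.reveal().values()}
        return len(scores) > 1
```

In this sketch, `has_disagreement()` only reports that the scores differ; documenting the disagreement remains a separate step.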

Corrections and Challenges

Verification is not final. Any reader can challenge any claim in any working paper.

Error reports — submit a specific claim you believe is wrong, with evidence. We investigate and respond.
Re-verification requests — if you believe the verification process missed something, request a re-review at a higher tier.
Score challenges — if you believe a Scharp Score is too high or too low, submit your reasoning. Scores can be revised.
Corrections — confirmed errors result in a correction notice appended to the working paper. The original text is preserved with strikethrough; the correction is clearly marked.
Retractions — working papers with fundamental, uncorrectable errors are retracted. Retracted papers remain in the corpus (for transparency) but are clearly marked and removed from active indices.
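
The correction format (original text preserved with strikethrough, correction clearly marked) can be sketched as a small text transformation. The `apply_correction` helper and the `~~...~~ [Correction: ...]` markup are assumptions chosen for illustration; the markup actually used on published papers is not specified here.

```python
def apply_correction(text: str, wrong: str, corrected: str) -> str:
    """Strike through the first occurrence of `wrong` and append the fix."""
    if wrong not in text:
        raise ValueError("claim to correct was not found in the text")
    marked = f"~~{wrong}~~ [Correction: {corrected}]"
    # Replace only the first occurrence so unrelated matches stay untouched.
    return text.replace(wrong, marked, 1)
```

Because the original wording stays in place, a reader can see both what was first published and what replaced it.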

To submit a correction or challenge: kevin@soulmetric.com
