AI Debate Engine

Don't ask AI.
Cross-examine it.

Not a chatbot. Not a side-by-side comparison. A deliberation engine where AI models cross-examine each other, challenge sources, reject weak proposals, and only reach consensus when they've actually earned it.

41
Engine Files
19
Debate Shapes
2
Adversarial Layers
0
Scripted Outcomes
The Problem

Every AI tool gives you an answer.
Nobody checks if it's the answer.

Every other AI tool
Ask ChatGPT — get one perspective
Ask Gemini — get a different one
Compare them yourself
Decide who to believe
No adversarial pressure on the answer
No accountability
Debating Bots
Models argue opposite sides with evidence
Cross-examination exposes weak points
A neutral judge challenges unsupported claims
Consensus only when both sides agree
Web search gives models real-time data
Salience tracking keeps focus on what matters

Every debate finds its own shape.

The engine doesn't follow a script. It reacts to what the models actually say — escalating when they disagree, resolving when they converge. No two debates take the same path.

01
Quick Consensus
Both models agree after cross-examination. Proposal accepted on first vote. Rare — only when genuinely warranted.
02
Revision Loop
Opponent rejects with specific fixes. Proposer revises. Accepted on second or third attempt. The most common path.
03
Counter-Proposal
After 3 rejections without alternatives, the rejecter is forced to propose. Easy to criticize — now show us yours.
04
Dual Vote Deadlock
Both models propose, both reject each other. Genuine disagreement. Mutual voting continues until one side yields or the judge steps in.
05
Judge Synthesis
After exhausting negotiation, a neutral judge reads both positions and writes a ruling neither model would have produced alone.
06
Judge Challenge
Mid-debate, the judge catches an unsupported claim or logical flaw. The challenged model must address it before consensus is possible. Keeps arguments honest.

Inspired by "AI Safety via Debate" (Irving, Christiano & Amodei, 2018), which proposed that two AI agents debating adversarially produce more truthful answers than either could alone.

Adversarial Layers

Two layers of scrutiny.
Neither model controls both.

1
Cross-Examination
After opening arguments, each model writes one probing question for the other — then both must answer. This isn't performative. The questions target the weakest link in the opponent's reasoning, and the answers become part of the record that the judge evaluates.
Pre-consensus · Adversarial · Mutual
2
Judge Challenge
A separate reasoning model independently reviews each turn for unsupported claims, logical fallacies, misrepresented sources, or ignored counterarguments. If it finds something, it issues a formal challenge. The challenged model cannot achieve consensus until it addresses the problem — and salience checkpoints track what's been agreed versus what's still contested.
Per-turn · Independent · Logic + Evidence
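
In engine terms, a challenge is a gate on consensus. A minimal sketch of that gate, using hypothetical names rather than the engine's real types:

```ts
// Sketch only: hypothetical names, not the engine's actual API.
type Challenge = {
  target: "debaterA" | "debaterB";
  reason: string;   // e.g. "cited source does not support the claim"
  resolved: boolean;
};

interface DebateState {
  challenges: Challenge[];
}

// The judge reviews each turn independently and files a challenge
// whenever it flags an unsupported claim or logical flaw.
function raiseChallenge(
  state: DebateState,
  target: Challenge["target"],
  reason: string,
): void {
  state.challenges.push({ target, reason, resolved: false });
}

// Consensus stays out of reach until every open challenge is addressed.
function consensusAllowed(state: DebateState): boolean {
  return state.challenges.every((c) => c.resolved);
}
```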

Five backstops prevent
every failure mode.

The engine is designed for genuine disagreement. Each backstop catches a specific failure and escalates to the next: they fire in order, and each one covers what the previous one couldn't resolve. The escalation logic is sketched in code after the list.

Mandatory review format
No rubber stamps
Models must write a structured review — strength, weakness, missing point — before they're allowed to vote. A bare "I agree" is rejected by the parser. Forces real engagement with the proposal.
3 one-sided rejections
Forced counter-proposal
If the same model rejects three times without proposing anything better, the system forces them to write their own solution. It's easy to say no. Now build something.
Dual proposals exist
Mutual voting
Both models vote on each other's proposals simultaneously. Four possible outcomes per round: one proposal wins, the other wins, both pass, or mutual rejection continues the fight.
5 total rejections
Judge synthesis
After exhausting all negotiation, a neutral judge reads both positions and creates a new answer. Neither model claims victory — the judge builds something from the best of both.
Turn 20
Hard limit
Absolute ceiling. If nothing else has resolved the debate, the judge issues a final ruling. Prevents infinite loops and runaway costs. In practice, almost never reached.
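
Here is roughly how the ladder reads in code. A minimal sketch with hypothetical names; the thresholds are the ones listed above:

```ts
// Sketch of the backstop ladder. Names are hypothetical; thresholds are
// the ones on this page: 3 one-sided rejections, 5 total, turn 20.
type Action =
  | { kind: "continue" }
  | { kind: "forceCounterProposal"; by: string }
  | { kind: "mutualVote" }
  | { kind: "judgeSynthesis" }
  | { kind: "judgeFinalRuling" };

interface Tally {
  turn: number;
  totalRejections: number;
  dualProposals: boolean;                  // both models have a live proposal
  oneSidedRejections: Map<string, number>; // rejections without an alternative
}

// Backstop 1 runs before any vote counts: a review with no substance
// (a bare "I agree") fails validation and is sent back.
function reviewIsValid(r: {
  strength?: string;
  weakness?: string;
  missing?: string;
}): boolean {
  return Boolean(r.strength && r.weakness && r.missing);
}

// Backstops 2 through 5, checked from most to least severe.
function nextBackstop(t: Tally): Action {
  if (t.turn >= 20) return { kind: "judgeFinalRuling" };         // hard limit
  if (t.totalRejections >= 5) return { kind: "judgeSynthesis" }; // escape valve
  if (t.dualProposals) return { kind: "mutualVote" };
  for (const [model, count] of t.oneSidedRejections) {
    if (count >= 3) return { kind: "forceCounterProposal", by: model };
  }
  return { kind: "continue" };
}
```

Checking from most to least severe means a later backstop always overrides an earlier one once its threshold is crossed.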
Under The Hood

A real state machine,
not a prompt chain.

41
Engine Files
8,300+
Lines of Logic
19
Debate Shapes
5
Backstops
4
AI Providers

The debate engine is a numbered-step state machine. Each turn, models respond in parallel, their output streamed to your browser in real time over server-sent events. The engine tracks rejection counts, challenge state, budget consumption, and convergence, reacting dynamically to what the models actually produce.
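
A minimal sketch of one turn, with hypothetical interfaces, showing the parallel fan-out and per-token streaming:

```ts
// Sketch only: both debaters respond in parallel, and every token is
// pushed to the browser as it arrives, e.g. over a server-sent-events
// connection. Interfaces are hypothetical.
interface Debater {
  respond(prompt: string, onToken: (token: string) => void): Promise<string>;
}

async function runTurn(
  debaters: Debater[],
  prompt: string,
  send: (channel: string, data: string) => void, // an SSE write, conceptually
): Promise<string[]> {
  // Fan out to all debaters at once; tokens stream as each model produces them.
  return Promise.all(
    debaters.map((d, i) =>
      d.respond(prompt, (token) => send(`debater-${i}`, token)),
    ),
  );
}
```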

Provider-agnostic by design. Each debater can be any combination of OpenAI GPT, xAI Grok, Google Gemini, or Anthropic Claude. The judge is a separate model with no allegiance to either side. Real-time cost tracking keeps every debate within budget, and models have live web search so they argue with current data, not stale training knowledge.
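
A sketch of what provider-agnostic seating could look like; the shape is hypothetical, but the four providers, the separate judge, and the budget ceiling are as described above:

```ts
// Hypothetical shape: anything satisfying this interface can fill a seat.
interface ModelProvider {
  name: "openai" | "xai" | "google" | "anthropic";
  complete(prompt: string): Promise<{ text: string; costUsd: number }>;
}

interface DebateConfig {
  debaterA: ModelProvider;
  debaterB: ModelProvider;
  judge: ModelProvider; // a separate model with no allegiance to either side
  budgetUsd: number;    // running cost is tracked against this ceiling
}
```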

Watch it think.

A real debate about consciousness. Two AI models with genuinely irreconcilable philosophical positions, fighting through backstops, judge challenges, and status checks until the judge steps in.

0:00
START "Is consciousness purely emergent from physical brain processes?" — Grok argues yes, GPT argues no.
0:33
CROSS-EXAM Models probe each other's positions. Grok challenges the "hard problem." GPT challenges reductive physicalism.
1:40
JUDGE PICK Judge evaluates both proposals. Picks GPT's as stronger starting point. Grok must review and vote.
1:46
REJECT ×1 Grok rejects: "Overstates the hard problem as ontological rather than epistemic."
"Fails to grapple with the unsolved combination problem in panpsychist views, which multiplies entities without empirical support."
2:27
REJECT ×2, ×3 GPT revises twice. Grok reviews each revision in detail. Rejects both — same core disagreement.
3:27
BACKSTOP 2 Three one-sided rejections. Engine forces Grok to counter-propose. "You keep saying no — show us something better."
3:55
JUDGE CHALLENGE Judge catches Grok citing a paper that doesn't support its claimed conclusion. Grok must address the misrepresentation before proceeding.
4:08
STATUS CHECK Salience checkpoint: "Agreed — consciousness correlates with physical processes. Contested — whether subjective experience reduces to physical description."
4:40
DUAL REJECT ×4, ×5 Both models now vote on each other's proposals. Both reject. Twice. Genuine philosophical deadlock.
5:07
BACKSTOP 4 Five total rejections. Escape valve triggers. Judge reads both positions and synthesizes a ruling.
"Consciousness is very strongly evidenced to be realized by physical brain processes, but it is not yet established that subjective experience is fully reducible to physical description."
5:21
DONE 9 turns. 4 backstops triggered. $0.37 total. An answer neither model would have written alone.
Pricing

Pay per debate.
Not per month.

No subscriptions. No tiers to choose from — the engine sizes your debate automatically based on complexity. You pay once, and only for what's used.

Text & Small Files
Up to ~25 pages
$0.69
Medium Files
Up to ~75 pages
$1.99
Large Files
Up to ~200 pages
$3.99
Extra Large
Up to ~400 pages
$6.99
Every debate includes
Load credits via Stripe · $5, $10, $25, or $50 · No expiration

Not one answer.
The tested answer.

AI models are confident. They're articulate. They're often wrong. The only reliable way to find the truth is the same way humans have always done it — put two smart minds in a room and let them argue until what's left is what actually holds up.

Asking one model to double-check itself searches the same training data twice. Four companies means four different datasets — different blind spots, different gaps. What one misses, another was trained on.

Start a Debate
System Architect
Brandon Geisel
Founder & Multi-Model AI Architect

Leading the development of cross-model deliberation systems — orchestrating structured debate between AI models from competing providers (OpenAI, Google, Anthropic, xAI) through a single, unified interface.

South Bend, IN 🇺🇸