AI Debate Engine

Structured debate
between competing
AI providers.

Debating Bots pits frontier models from different providers against each other inside a structured debate. They argue opposite sides, cross-examine weak claims, revise under pressure, and only converge when an independent judge is satisfied. When you want a fast comparison, Ask All shows multiple labs side by side. When you want rigor, Debate makes the models earn the answer.

Start a Debate See How It Works
7
AI Providers
3
Layers of Scrutiny
6
Backstops
Debate Paths
The Problem

Every AI tool gives you an answer.
Nobody checks if it's the answer.

Every other AI tool
Ask ChatGPT — get one perspective
Ask Gemini — get a different one
Compare them yourself
Decide who to believe
No adversarial pressure on the answer
No accountability
Debating Bots
Two models argue opposite sides with evidence
Cross-examination exposes weak points
An independent judge challenges unsupported claims
Models vote, revise, and merge until the answer holds up
Live web search gives models real-time data
You get the answer that survived — not the first one generated

Debates follow structure, not scripts.

The engine follows a real structure, but it does not force canned outcomes. It reacts to what the models actually say — escalating when they disagree, forcing revisions when criticism lands, and always driving toward a final answer.

01
Quick Consensus
Both models agree after opening arguments. Proposal accepted on first vote. Rare — only when genuinely warranted.
02
Revision Loop
Opponent rejects with specific fixes. Proposer revises. Accepted on second or third attempt. The most common path.
03
Counter-Proposal
After repeated rejections without alternatives, the rejecter is forced to propose. Easy to criticize — now show us yours.
04
Merge Round
Both proposals pass simultaneously — an agreement collision. The judge merges the strongest parts of each into a unified draft. Both models then vote on the merged version.
05
Judge Synthesis
After exhausting negotiation, the judge reads both positions and writes a ruling neither model would have produced alone.
06
Judge Challenge
Mid-debate, the judge catches an unsupported claim or logical flaw. The challenged model must address it before consensus is possible. Keeps arguments honest.

Inspired by "AI Safety via Debate" (Irving, Christiano & Amodei, 2018), which proposed that two AI agents debating adversarially produce more truthful answers than either could alone.

Three Layers of Scrutiny

Three layers of scrutiny.
No model controls all three.

1
Devil's Advocate Positions
Before the debate begins, the engine assigns each model an opposing position to defend. Positions are randomly swapped so neither side gets a structural advantage. Models must argue their assigned position with evidence — no hedging, no "both sides." The constraint is only lifted in the endgame when it's time to collaborate on a final answer.
Pre-debate · Structural · Mandatory
2
Cross-Examination
After opening arguments, each model writes a probing question for the other — then both must answer. The questions target the weakest link in the opponent's reasoning, and the answers become part of the record the judge evaluates.
Post-opening · Adversarial · Mutual
3
Independent Judge
A separate model — from a different provider than either debater — serves as judge. It prechecks every proposal before voting begins, can challenge unsupported claims mid-debate, runs salience checkpoints to track what's agreed versus contested, and delivers the final ruling when models can't reach consensus on their own. The judge is the only entity that can override a deadlock.
Throughout · Independent · Final authority

Six backstops keep
the debate moving.

The engine is built for genuine disagreement. Each backstop catches a common failure mode and escalates to the next layer only when needed, so the debate does not stall or collapse into noise.

Mandatory review format
No rubber stamps
Models must write a structured review before they're allowed to vote. A bare "I agree" is rejected. Forces real engagement with the proposal.
3 one-sided rejections
Forced counter-proposal
If the same model rejects three times without proposing anything better, the system forces them to write their own solution. It's easy to say no. Now build something.
Both proposals accepted simultaneously
Merge round
When both models' proposals pass at the same time, the judge merges the strongest elements of each into a unified draft. Both models then vote on the merged version — no more two-answer ambiguity.
Budget threshold reached
Endgame collaboration
Adversarial constraints are lifted. Models switch from opposing positions to collaborating on the best answer. The trigger scales with question complexity — earlier for simple questions, later for deep ones. The fight is over — now synthesize what you've learned.
Negotiation exhausted
Judge synthesis
After exhausting negotiation, the judge reads both positions and creates a new answer. Neither model claims victory — the judge builds something from the best of both.
95% budget or turn 20
Guaranteed final answer
Absolute ceiling. The judge issues a binding ruling. You always get an answer — never "the models couldn't agree."
Under The Hood

A real state machine,
not a prompt chain.

3
Models Per Debate
150+
State Variables
20
Max Turns
SSE
Live Streaming
6
Debate Shapes
Possible Paths

The debate engine is a numbered-step state machine. Each turn, models respond in parallel via server-sent events with real-time streaming to your browser. The engine tracks rejection counts, convergence scores, budget consumption, vote state, merge rounds, and revision history — reacting dynamically to what the models actually produce.

Multi-provider by design. Each debater can come from OpenAI, Anthropic, Google, xAI, Alibaba, DeepSeek, or Mistral, and the judge is always chosen from a different provider than either debater. You pick the exact models and settings for each role, while real-time cost tracking, live web search, and code execution keep the debate grounded in current data and verifiable calculations.

Debate when it matters.
Ask All when you're exploring.
Docs when you want to build.

Not every question needs the full engine. Sometimes you want a structured debate. Sometimes you just want to compare providers side by side. Sometimes you want to understand the developer surface before you build on top of it.

Primary
Debate
Two models argue opposing sides under a judge. Structured consensus with voting, revisions, merge rounds, and guaranteed final answer. Upload files and codebases for the models to analyze during debate.
  • Alpha vs Beta + independent Judge
  • Devil's Advocate position assignment
  • Manual model and settings control for every role
  • File upload — models browse your code via tool calls
  • Team Huddle — N parallel drafts synthesized into one
  • Structured revisions, merge rounds, and judge rulings
Casual
Ask All
Send one message to GPT, Claude, Gemini, Grok, Qwen, DeepSeek, and Mistral simultaneously. See all seven responses side by side. Multi-turn — keep the conversation going with full history.
  • All 7 providers in parallel
  • Multi-turn conversation with history
  • Side-by-side response comparison
  • Fast side-by-side provider comparison
Build
Developers / Docs
Explore the developer docs now. Hosted API access is being rolled out separately for teams that want structured multi-provider reasoning inside their own products.
  • Public docs for the debate engine and event model
  • Same structured workflow as the first-party app
  • Hosted API access coming soon
  • Built for product and workflow integration

Developer docs are public.
Hosted API access is coming soon.

The developer surface is aimed at teams that want structured multi-provider reasoning, live event streams, and independent judging inside their own workflows. The docs are public now; hosted API access is being rolled out deliberately instead of being oversold.

Public docs now

REST + SSE + structured events

Explore how the engine starts runs, streams progress, models state, and handles cancellations before hosted access opens up more broadly.

  • Debate lifecycle, status polling, and live server-sent events
  • Webhook, cancellation, and idempotency patterns
  • Repo zip and file context for codebase-aware debates
  • Structured engine concepts before provider-level overrides
  • Explore the developer docs ↗

Hosted access next

The long-term goal is simple: the same multi-provider debate engine, available to product teams through a clean hosted API. That rollout is separate from the BYOK launch, so the page does not pretend otherwise.

Structured Multi-provider debate, independent judging, revisions, and convergence rules stay intact in the developer surface.
Events Live progress is exposed as events so products can react while debates are still running.
Rollout Hosted access is being opened deliberately instead of pretending every path is already GA.
Availability

BYOK is live.
Hosted credits are coming soon.

Today Debating Bots is live in bring-your-own-key mode. Add your own provider keys and run the structured debate engine now. Hosted credits and platform-funded usage are being rolled out separately.

Live
Bring Your Own Keys
$9.99
per month
Full access to Debate and Ask All. Add your own provider API keys — you pay providers directly at their published rates. Cancel anytime.
Soon
Hosted Credits
platform-funded usage
Pay-per-debate with platform credits. No API keys needed. Coming after the BYOK launch.
How BYOK works: Your subscription unlocks the platform. You add API keys from the providers you want — OpenAI, Anthropic, Google, xAI, Alibaba, DeepSeek, or Mistral. The engine calls those providers with your keys. No markup, no middleman on API costs.
BYOK subscription includes
Bring your own provider accounts today · Hosted credits coming soon

Don't ask AI.
Cross-examine it.

AI models are confident. They're articulate. They're often wrong. The only reliable way to find the truth is the same way humans have always done it — put two smart minds in a room and let them argue until what's left is what actually holds up.

One model checking itself is still one model. Different providers bring different priors, different blind spots, and different strengths. Structured debate is what forces those differences into the open and turns them into something useful.

Start a Debate
System Architect
Brandon Geisel
Founder

Built Debating Bots solo — 40k+ lines of PHP and vanilla JS, no framework, no team. The idea: if two AI models from different providers argue a question under adversarial constraints, the answer that survives is better than what either would produce alone. The engine is the proof.

South Bend, IN 🇺🇸