AI Debate Engine

Structured debate between competing AI providers.

Debating Bots pits frontier models from different providers against each other inside a structured debate. They argue opposite sides, cross-examine weak claims, revise under pressure, and only converge when an independent judge is satisfied. When you want a fast comparison, Ask All shows multiple labs side by side. When you want rigor, Debate makes the models earn the answer.

7 AI Providers · 3 Layers of Scrutiny · 6 Backstops · ∞ Debate Paths

The Problem

Every AI tool gives you an answer.
Nobody checks if it's the answer.

Every other AI tool

Ask ChatGPT — get one perspective

Ask Gemini — get a different one

Compare them yourself

Decide who to believe

No adversarial pressure on the answer

No accountability

Debating Bots

Two models argue opposite sides with evidence

Cross-examination exposes weak points

An independent judge challenges unsupported claims

Models vote, revise, and merge until the answer holds up

Live web search gives models real-time data

You get the answer that survived — not the first one generated

How It Works

Debates follow structure, not scripts.

The engine follows a real structure, but it does not force canned outcomes. It reacts to what the models actually say — escalating when they disagree, forcing revisions when criticism lands, and always driving toward a final answer.

Quick Consensus

Both models agree after opening arguments. Proposal accepted on first vote. Rare — only when genuinely warranted.

Revision Loop

Opponent rejects with specific fixes. Proposer revises. Accepted on second or third attempt. The most common path.

Counter-Proposal

After repeated rejections without alternatives, the rejecter is forced to propose. Easy to criticize — now show us yours.

Merge Round

Both proposals pass simultaneously — an agreement collision. The judge merges the strongest parts of each into a unified draft. Both models then vote on the merged version.

Judge Synthesis

After exhausting negotiation, the judge reads both positions and writes a ruling neither model would have produced alone.

Judge Challenge

Mid-debate, the judge catches an unsupported claim or logical flaw. The challenged model must address it before consensus is possible. Keeps arguments honest.

Inspired by "AI Safety via Debate" (Irving, Christiano & Amodei, 2018), which proposed training AI agents by having them debate each other in front of a human judge, on the hypothesis that adversarial debate surfaces more truthful answers than a single agent would give alone.

Three Layers of Scrutiny

Three layers of scrutiny.
No model controls all three.

Devil's Advocate Positions

Before the debate begins, the engine assigns each model an opposing position to defend. Positions are randomly swapped so neither side gets a structural advantage. Models must argue their assigned position with evidence — no hedging, no "both sides." The constraint is only lifted in the endgame when it's time to collaborate on a final answer.

Pre-debate · Structural · Mandatory
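
The random position swap described above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: the function name `assign_positions` and the side labels `"for"` and `"against"` are assumptions made for the example.

```python
import random


def assign_positions(model_a, model_b, rng=None):
    """Give two debaters opposing sides, swapping at random.

    Hypothetical helper: the side labels are placeholders for whatever
    opposing positions the engine derives from the question.
    """
    rng = rng or random.Random()
    sides = ["for", "against"]
    # A coin flip decides who argues which side, so neither model
    # gets a structural advantage from always going first.
    if rng.random() < 0.5:
        sides.reverse()
    return {model_a: sides[0], model_b: sides[1]}
```

Passing a seeded `random.Random` makes the assignment reproducible in tests, while the default stays unpredictable.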

Cross-Examination

After opening arguments, each model writes a probing question for the other — then both must answer. The questions target the weakest link in the opponent's reasoning, and the answers become part of the record the judge evaluates.

Post-opening · Adversarial · Mutual

Independent Judge

A separate model — from a different provider than either debater — serves as judge. It prechecks every proposal before voting begins, can challenge unsupported claims mid-debate, runs salience checkpoints to track what's agreed versus contested, and delivers the final ruling when models can't reach consensus on their own. The judge is the only entity that can override a deadlock.

Throughout · Independent · Final authority

Escalation

Six backstops keep the debate moving.

The engine is built for genuine disagreement. Each backstop catches a common failure mode and escalates to the next layer only when needed, so the debate does not stall or collapse into noise.

Mandatory review format

No rubber stamps

Models must write a structured review before they're allowed to vote. A bare "I agree" is rejected. Forces real engagement with the proposal.
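
One way to picture the "no rubber stamps" gate is a check that runs before a vote is counted. The sketch below is an assumption about shape only: the `Review` fields (`strengths`, `weaknesses`, `verdict`) are hypothetical and do not come from the published docs.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """Hypothetical structured review submitted ahead of a vote."""
    strengths: list = field(default_factory=list)
    weaknesses: list = field(default_factory=list)
    verdict: str = ""  # expected to be "accept" or "reject"


def can_vote(review):
    """Allow a vote only when the review engages with the proposal.

    A bare verdict with no strengths or weaknesses listed is treated
    as a rubber stamp and refused.
    """
    has_substance = bool(review.strengths or review.weaknesses)
    return has_substance and review.verdict in {"accept", "reject"}
```

Under these assumptions, `can_vote(Review(verdict="accept"))` is refused, while a review that lists at least one concrete strength or weakness goes through.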

3 one-sided rejections

Forced counter-proposal

If the same model rejects three times without proposing anything better, the engine forces it to write its own solution. It's easy to say no. Now build something.
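
The three-rejection rule amounts to a per-model streak counter. The following is a sketch under stated assumptions: the class name `RejectionTracker` is invented for illustration, and resetting the streak when a rejection comes with an alternative is a guess at the intended behavior.

```python
from collections import defaultdict

REJECTION_LIMIT = 3  # from the text: three one-sided rejections


class RejectionTracker:
    """Counts consecutive rejections that offer no alternative."""

    def __init__(self):
        self._streak = defaultdict(int)

    def record_rejection(self, model, offered_alternative):
        """Return True when `model` must now write a counter-proposal."""
        if offered_alternative:
            # Assumption: a constructive rejection resets the streak.
            self._streak[model] = 0
            return False
        self._streak[model] += 1
        return self._streak[model] >= REJECTION_LIMIT
```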

Both proposals accepted simultaneously

Merge round

When both models' proposals pass at the same time, the judge merges the strongest elements of each into a unified draft. Both models then vote on the merged version — no more two-answer ambiguity.

Budget threshold reached

Endgame collaboration

Adversarial constraints are lifted. Models switch from opposing positions to collaborating on the best answer. The trigger scales with question complexity — earlier for simple questions, later for deep ones. The fight is over — now synthesize what you've learned.

Negotiation exhausted

Judge synthesis

After exhausting negotiation, the judge reads both positions and creates a new answer. Neither model claims victory — the judge builds something from the best of both.

95% budget or turn 20

Guaranteed final answer

Absolute ceiling. The judge issues a binding ruling. You always get an answer — never "the models couldn't agree."
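
The ceiling itself is a simple either-or check. The thresholds below come from the text (95% of budget, turn 20); the function name and the idea of tracking budget as spent versus total are assumptions for the sketch.

```python
BUDGET_CEILING = 0.95  # fraction of the budget, from the text
MAX_TURNS = 20         # turn limit, from the text


def must_issue_final_ruling(budget_spent, budget_total, turn):
    """Return True once either hard limit is reached.

    Hypothetical helper: units for the budget are left open; only the
    ratio matters here.
    """
    if budget_total <= 0:
        raise ValueError("budget_total must be positive")
    over_budget = budget_spent / budget_total >= BUDGET_CEILING
    return over_budget or turn >= MAX_TURNS
```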

Under The Hood

A real state machine, not a prompt chain.

Models Per Debate

150+ State Variables

Max Turns

SSE Live Streaming

Debate Shapes

∞ Possible Paths

The debate engine is a numbered-step state machine. Each turn, models respond in parallel, and their output streams to your browser in real time over server-sent events (SSE). The engine tracks rejection counts, convergence scores, budget consumption, vote state, merge rounds, and revision history, and it reacts dynamically to what the models actually produce.
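
To make "state machine, not a prompt chain" concrete, here is a deliberately small sketch of a transition function after a voting round. The phase names and the `next_phase` signature are assumptions chosen to mirror the paths described on this page; the real engine tracks far more state.

```python
from enum import Enum, auto


class Phase(Enum):
    REVISION = auto()
    COUNTER_PROPOSAL = auto()
    MERGE = auto()
    JUDGE_SYNTHESIS = auto()
    DONE = auto()


def next_phase(accepted_a, accepted_b, one_sided_rejections,
               negotiation_exhausted):
    """Pick the next phase from the outcome of a voting round.

    accepted_a / accepted_b: whether each model's proposal passed.
    """
    if accepted_a and accepted_b:
        return Phase.MERGE             # agreement collision
    if accepted_a or accepted_b:
        return Phase.DONE              # one proposal won outright
    if negotiation_exhausted:
        return Phase.JUDGE_SYNTHESIS   # judge writes the ruling
    if one_sided_rejections >= 3:
        return Phase.COUNTER_PROPOSAL  # rejecter must propose
    return Phase.REVISION              # proposer revises and retries
```

The point of the sketch is that the next step depends on recorded state, not on a fixed script of prompts.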

Multi-provider by design. Each debater can come from OpenAI, Anthropic, Google, xAI, Alibaba, DeepSeek, or Mistral, and the judge is always chosen from a different provider than either debater. You pick the exact models and settings for each role, while real-time cost tracking, live web search, and code execution keep the debate grounded in current data and verifiable calculations.
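
The judge-independence rule can be expressed as a filter over the provider list. The provider names are taken from the text; the function names are hypothetical and the validation behavior is an assumption about how such a rule might be enforced.

```python
PROVIDERS = ["OpenAI", "Anthropic", "Google", "xAI",
             "Alibaba", "DeepSeek", "Mistral"]


def eligible_judge_providers(debater_a, debater_b):
    """Providers a judge may come from: anyone but the two debaters."""
    return [p for p in PROVIDERS if p not in {debater_a, debater_b}]


def validate_judge(debater_a, debater_b, judge):
    """Raise if the chosen judge shares a provider with a debater."""
    if judge not in eligible_judge_providers(debater_a, debater_b):
        raise ValueError(f"{judge} cannot judge this debate")
    return judge
```

With OpenAI and Anthropic debating, five providers remain eligible for the judge role.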

Ways In

Debate when it matters.
Ask All when you're exploring.
Docs when you want to build.

Not every question needs the full engine. Sometimes you want a structured debate. Sometimes you just want to compare providers side by side. Sometimes you want to understand the developer surface before you build on top of it.

Primary

Debate

Two models argue opposing sides under a judge. Structured consensus with voting, revisions, merge rounds, and a guaranteed final answer. Upload files and codebases for the models to analyze during debate.

Alpha vs Beta + independent Judge
Devil's Advocate position assignment
Manual model and settings control for every role
File upload — models browse your code via tool calls
Team Huddle — N parallel drafts synthesized into one
Structured revisions, merge rounds, and judge rulings

Casual

Ask All

Send one message to GPT, Claude, Gemini, Grok, Qwen, DeepSeek, and Mistral simultaneously. See all seven responses side by side. Multi-turn — keep the conversation going with full history.

All 7 providers in parallel
Multi-turn conversation with history
Fast side-by-side response comparison
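
Fanning one message out to every provider at once is a natural fit for concurrent requests. The sketch below uses Python's `asyncio.gather`; `ask_provider` is a stand-in that fabricates a reply locally, since real provider calls and their client libraries are outside the scope of this page.

```python
import asyncio

PROVIDERS = ["GPT", "Claude", "Gemini", "Grok",
             "Qwen", "DeepSeek", "Mistral"]


async def ask_provider(provider, history):
    """Placeholder for a real API call; returns a canned reply."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{provider}] reply to: {history[-1]}"


async def ask_all(history):
    """Send the same conversation to every provider concurrently."""
    replies = await asyncio.gather(
        *(ask_provider(p, history) for p in PROVIDERS),
        return_exceptions=True,  # one failure should not sink the rest
    )
    return dict(zip(PROVIDERS, replies))
```

Calling `asyncio.run(ask_all(["hello"]))` returns one entry per provider, in the order listed, ready to render side by side.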

Build

Developers / Docs

Explore the developer docs now. Hosted API access is being rolled out separately for teams that want structured multi-provider reasoning inside their own products.

Public docs for the debate engine and event model
Same structured workflow as the first-party app
Hosted API access coming soon
Built for product and workflow integration

Developers

Developer docs are public.
Hosted API access is coming soon.

The developer surface is aimed at teams that want structured multi-provider reasoning, live event streams, and independent judging inside their own workflows. The docs are public now; hosted API access is being rolled out deliberately instead of being oversold.

Public docs now

REST + SSE + structured events

Explore how the engine starts runs, streams progress, models state, and handles cancellations before hosted access opens up more broadly.

Debate lifecycle, status polling, and live server-sent events
Webhook, cancellation, and idempotency patterns
Repo zip and file context for codebase-aware debates
Structured engine concepts before provider-level overrides
Explore the developer docs ↗
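
For readers new to server-sent events, the wire format is plain text: `event:` and `data:` lines, with a blank line ending each event. The parser below is a minimal sketch that handles only those two fields plus comment lines. The event names in the usage note (`turn_started`, `vote`) and the assumption that payloads are JSON are illustrative, not taken from the developer docs.

```python
import json


def parse_sse(stream_text):
    """Parse SSE text into (event_name, payload) pairs.

    Assumes each event's data is a JSON document. Fields other than
    `event:` and `data:` are ignored in this sketch.
    """
    events = []
    event_name, data_lines = "message", []
    for line in stream_text.splitlines() + [""]:
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return events
```

Feeding it `'event: turn_started\ndata: {"turn": 1}\n\n'` yields a single `("turn_started", {"turn": 1})` pair, which a product could use to update its UI while the debate is still running.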

Hosted access next

The long-term goal is simple: the same multi-provider debate engine, available to product teams through a clean hosted API. That rollout is separate from the BYOK launch, so the page does not pretend otherwise.

Structured: Multi-provider debate, independent judging, revisions, and convergence rules stay intact in the developer surface.

Events: Live progress is exposed as events so products can react while debates are still running.

Rollout: Hosted access is being opened deliberately instead of pretending every path is already GA.

Availability

BYOK is live.
Hosted credits are coming soon.

Today, Debating Bots is live in bring-your-own-key mode. Add your own provider keys and run the structured debate engine now. Hosted credits and platform-funded usage are being rolled out separately.

Live

Bring Your Own Keys

$9.99

per month

Full access to Debate and Ask All. Add your own provider API keys — you pay providers directly at their published rates. Cancel anytime.

Soon

Hosted Credits

—

platform-funded usage

Pay-per-debate with platform credits. No API keys needed. Coming after the BYOK launch.

How BYOK works: Your subscription unlocks the platform. You add API keys from the providers you want — OpenAI, Anthropic, Google, xAI, Alibaba, DeepSeek, or Mistral. The engine calls those providers with your keys. No markup, no middleman on API costs.

BYOK subscription includes

Unlimited debates with 2 AI debaters + independent judge
Ask All — query up to 7 providers simultaneously
Opening arguments + cross-examination
Multi-round voting with revision and merge rounds
Independent judge with challenge authority
Team Huddle option — parallel drafts synthesized into one
Guaranteed final answer — always
Live web search + code execution

Bring your own provider accounts today · Hosted credits coming soon

Don't ask AI.
Cross-examine it.

AI models are confident. They're articulate. They're often wrong. The only reliable way to find the truth is the same way humans have always done it — put two smart minds in a room and let them argue until what's left is what actually holds up.

One model checking itself is still one model. Different providers bring different priors, different blind spots, and different strengths. Structured debate is what forces those differences into the open and turns them into something useful.

Start a Debate

System Architect

Brandon Geisel

Founder

Built Debating Bots solo — 40k+ lines of PHP and vanilla JS, no framework, no team. The idea: if two AI models from different providers argue a question under adversarial constraints, the answer that survives is better than what either would produce alone. The engine is the proof.

South Bend, IN 🇺🇸
