AI Debate Engine

Don't ask AI.
Cross-examine it.

Not a chatbot. Not a side-by-side comparison. A deliberation engine where AI models cross-examine each other, challenge sources, reject weak proposals, and only reach consensus when they've actually earned it.

41
Engine Files
19
Debate Shapes
2
Adversarial Layers
0
Scripted Outcomes
The Problem

Every AI tool gives you an answer.
Nobody checks if it's the answer.

Every other AI tool
Ask ChatGPT — get one perspective
Ask Gemini — get a different one
Compare them yourself
Decide who to believe
No adversarial pressure on the answer
No accountability
Debating Bots
Models argue opposite sides with evidence
Cross-examination exposes weak points
A neutral judge challenges unsupported claims
Consensus only when both sides agree
Web search gives models real-time data
Salience tracking keeps focus on what matters

Every debate finds its own shape.

The engine doesn't follow a script. It reacts to what the models actually say — escalating when they disagree, resolving when they converge. No two debates take the same path.

01
Quick Consensus
Both models agree after cross-examination. Proposal accepted on first vote. Rare — only when genuinely warranted.
02
Revision Loop
Opponent rejects with specific fixes. Proposer revises. Accepted on second or third attempt. The most common path.
03
Counter-Proposal
After 3 rejections without alternatives, the rejecter is forced to propose. Easy to criticize — now show us yours.
04
Dual Vote Deadlock
Both models propose, both reject each other. Genuine disagreement. Mutual voting continues until one side yields or the judge steps in.
05
Judge Synthesis
After exhausting negotiation, a neutral judge reads both positions and writes a ruling neither model would have produced alone.
06
Judge Challenge
Mid-debate, the judge catches an unsupported claim or logical flaw. The challenged model must address it before consensus is possible. Keeps arguments honest.

Inspired by "AI Safety via Debate" (Irving, Christiano & Amodei, 2018), which proposed that two AI agents debating adversarially produce more truthful answers than either could alone.

Adversarial Layers

Two layers of scrutiny.
Neither model controls both.

1
Cross-Examination
After opening arguments, each model writes one probing question for the other — then both must answer. This isn't performative. The questions target the weakest link in the opponent's reasoning, and the answers become part of the record that the judge evaluates.
Pre-consensus · Adversarial · Mutual
2
Judge Challenge
A separate reasoning model independently reviews each turn for unsupported claims, logical fallacies, misrepresented sources, or ignored counterarguments. If it finds something, it issues a formal challenge. The challenged model cannot achieve consensus until it addresses the problem — and salience checkpoints track what's been agreed versus what's still contested.
Per-turn · Independent · Logic + Evidence
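
In engine terms, a challenge is a gate on consensus. A minimal sketch of that gate, using hypothetical names rather than the engine's real types:

```ts
// Sketch only: hypothetical names, not the engine's actual API.
type Challenge = {
  target: "debaterA" | "debaterB";
  reason: string;   // e.g. "cited source does not support the claim"
  resolved: boolean;
};

interface DebateState {
  challenges: Challenge[];
}

// The judge reviews each turn independently and files a challenge
// whenever it flags an unsupported claim or logical flaw.
function raiseChallenge(
  state: DebateState,
  target: Challenge["target"],
  reason: string,
): void {
  state.challenges.push({ target, reason, resolved: false });
}

// Consensus stays out of reach until every open challenge is addressed.
function consensusAllowed(state: DebateState): boolean {
  return state.challenges.every((c) => c.resolved);
}
```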

Five backstops prevent
every failure mode.

The engine is designed for genuine disagreement. Each backstop catches a specific failure and escalates to the next: they fire in order, and each one covers what the previous one couldn't resolve. The escalation logic is sketched in code after the list.

Mandatory review format
No rubber stamps
Models must write a structured review — strength, weakness, missing point — before they're allowed to vote. A bare "I agree" is rejected by the parser. Forces real engagement with the proposal.
3 one-sided rejections
Forced counter-proposal
If the same model rejects three times without proposing anything better, the system forces them to write their own solution. It's easy to say no. Now build something.
Dual proposals exist
Mutual voting
Both models vote on each other's proposals simultaneously. Four possible outcomes per round: one proposal wins, the other wins, both pass, or mutual rejection continues the fight.
5 total rejections
Judge synthesis
After exhausting all negotiation, a neutral judge reads both positions and creates a new answer. Neither model claims victory — the judge builds something from the best of both.
Turn 20
Hard limit
Absolute ceiling. If nothing else has resolved the debate, the judge issues a final ruling. Prevents infinite loops and runaway costs. In practice, almost never reached.
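
Here is roughly how the ladder reads in code. A minimal sketch with hypothetical names; the thresholds are the ones listed above:

```ts
// Sketch of the backstop ladder. Names are hypothetical; thresholds are
// the ones on this page: 3 one-sided rejections, 5 total, turn 20.
type Action =
  | { kind: "continue" }
  | { kind: "forceCounterProposal"; by: string }
  | { kind: "mutualVote" }
  | { kind: "judgeSynthesis" }
  | { kind: "judgeFinalRuling" };

interface Tally {
  turn: number;
  totalRejections: number;
  dualProposals: boolean;                  // both models have a live proposal
  oneSidedRejections: Map<string, number>; // rejections without an alternative
}

// Backstop 1 runs before any vote counts: a review with no substance
// (a bare "I agree") fails validation and is sent back.
function reviewIsValid(r: {
  strength?: string;
  weakness?: string;
  missing?: string;
}): boolean {
  return Boolean(r.strength && r.weakness && r.missing);
}

// Backstops 2 through 5, checked from most to least severe.
function nextBackstop(t: Tally): Action {
  if (t.turn >= 20) return { kind: "judgeFinalRuling" };         // hard limit
  if (t.totalRejections >= 5) return { kind: "judgeSynthesis" }; // escape valve
  if (t.dualProposals) return { kind: "mutualVote" };
  for (const [model, count] of t.oneSidedRejections) {
    if (count >= 3) return { kind: "forceCounterProposal", by: model };
  }
  return { kind: "continue" };
}
```

Checking from most to least severe means a later backstop always overrides an earlier one once its threshold is crossed.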
Under The Hood

A real state machine,
not a prompt chain.

41
Engine Files
8,300+
Lines of Logic
19
Debate Shapes
5
Backstops
4
AI Providers

The debate engine is a numbered-step state machine. Each turn, models respond in parallel, their output streamed to your browser in real time over server-sent events. The engine tracks rejection counts, challenge state, budget consumption, and convergence, reacting dynamically to what the models actually produce.
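
A minimal sketch of one turn, with hypothetical interfaces, showing the parallel fan-out and per-token streaming:

```ts
// Sketch only: both debaters respond in parallel, and every token is
// pushed to the browser as it arrives, e.g. over a server-sent-events
// connection. Interfaces are hypothetical.
interface Debater {
  respond(prompt: string, onToken: (token: string) => void): Promise<string>;
}

async function runTurn(
  debaters: Debater[],
  prompt: string,
  send: (channel: string, data: string) => void, // an SSE write, conceptually
): Promise<string[]> {
  // Fan out to all debaters at once; tokens stream as each model produces them.
  return Promise.all(
    debaters.map((d, i) =>
      d.respond(prompt, (token) => send(`debater-${i}`, token)),
    ),
  );
}
```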

Provider-agnostic by design. Each debater can be any combination of OpenAI GPT, xAI Grok, Google Gemini, or Anthropic Claude. The judge is a separate model with no allegiance to either side. Real-time cost tracking keeps every debate within budget, and models have live web search so they argue with current data, not stale training knowledge.
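
A sketch of what provider-agnostic seating could look like; the shape is hypothetical, but the four providers, the separate judge, and the budget ceiling are as described above:

```ts
// Hypothetical shape: anything satisfying this interface can fill a seat.
interface ModelProvider {
  name: "openai" | "xai" | "google" | "anthropic";
  complete(prompt: string): Promise<{ text: string; costUsd: number }>;
}

interface DebateConfig {
  debaterA: ModelProvider;
  debaterB: ModelProvider;
  judge: ModelProvider; // a separate model with no allegiance to either side
  budgetUsd: number;    // running cost is tracked against this ceiling
}
```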

Watch it think.

A real debate about consciousness. Two AI models with genuinely irreconcilable philosophical positions, fighting through backstops, judge challenges, and status checks until the judge steps in.

0:00
START "Is consciousness purely emergent from physical brain processes?" — Grok argues yes, GPT argues no.
0:33
CROSS-EXAM Models probe each other's positions. Grok challenges the "hard problem." GPT challenges reductive physicalism.
1:40
JUDGE PICK Judge evaluates both proposals. Picks GPT's as stronger starting point. Grok must review and vote.
1:46
REJECT ×1 Grok rejects: "Overstates the hard problem as ontological rather than epistemic."
"Fails to grapple with the unsolved combination problem in panpsychist views, which multiplies entities without empirical support."
2:27
REJECT ×2, ×3 GPT revises twice. Grok reviews each revision in detail. Rejects both — same core disagreement.
3:27
BACKSTOP 2 Three one-sided rejections. Engine forces Grok to counter-propose. "You keep saying no — show us something better."
3:55
JUDGE CHALLENGE Judge catches Grok citing a paper that doesn't support its claimed conclusion. Grok must address the misrepresentation before proceeding.
4:08
STATUS CHECK Salience checkpoint: "Agreed — consciousness correlates with physical processes. Contested — whether subjective experience reduces to physical description."
4:40
DUAL REJECT ×4, ×5 Both models now vote on each other's proposals. Both reject. Twice. Genuine philosophical deadlock.
5:07
BACKSTOP 4 Five total rejections. Escape valve triggers. Judge reads both positions and synthesizes a ruling.
"Consciousness is very strongly evidenced to be realized by physical brain processes, but it is not yet established that subjective experience is fully reducible to physical description."
5:21
DONE 9 turns. 4 backstops triggered. $0.37 total. An answer neither model would have written alone.
Pricing

Pay per debate.
Not per month.

No subscriptions. No tiers to choose from — the engine sizes your debate automatically based on complexity. You pay once, and only for what's used.

Text & Small Files
Up to ~25 pages
$0.69
Medium Files
Up to ~75 pages
$1.99
Large Files
Up to ~200 pages
$3.99
Extra Large
Up to ~400 pages
$6.99
Every debate includes
Load credits via Stripe · $5, $10, $25, or $50 · No expiration

Not one answer.
The tested answer.

AI models are confident. They're articulate. They're often wrong. The only reliable way to find the truth is the same way humans have always done it — put two smart minds in a room and let them argue until what's left is what actually holds up.

Asking one model to double-check itself searches the same training data twice. Four companies means four different datasets — different blind spots, different gaps. What one misses, another was trained on.

Start a Debate
System Architect
Brandon Geisel
Founder & Multi-Model AI Architect

Leading the development of cross-model deliberation systems — orchestrating structured debate between AI models from competing providers (OpenAI, Google, Anthropic, xAI) through a single, unified interface.

South Bend, IN 🇺🇸