Debating Bots pits frontier models from different providers against each other inside a structured debate. They argue opposite sides, cross-examine weak claims, revise under pressure, and only converge when an independent judge is satisfied. When you want a fast comparison, Ask All shows multiple labs side by side. When you want rigor, Debate makes the models earn the answer.
The engine follows a real structure, but it does not force canned outcomes. It reacts to what the models actually say — escalating when they disagree, forcing revisions when criticism lands, and always driving toward a final answer.
Inspired by "AI Safety via Debate" (Irving, Christiano & Amodei, 2018), which proposed that two AI agents debating adversarially produce more truthful answers than either could alone.
The engine is built for genuine disagreement. Each backstop catches a common failure mode and escalates to the next layer only when needed, so the debate does not stall or collapse into noise.
The debate engine is a numbered-step state machine. Each turn, models respond in parallel, streamed to your browser in real time over server-sent events. The engine tracks rejection counts, convergence scores, budget consumption, vote state, merge rounds, and revision history — reacting dynamically to what the models actually produce.
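The tracked state described above can be sketched as a plain object plus a pure reducer that folds streamed events into it. This is an illustrative sketch only — the field names and event shapes are assumptions, not the engine's real internals:

```javascript
// Hypothetical per-run state mirroring the fields the engine tracks.
function createDebateState() {
  return {
    rejections: 0,   // judge rejections so far
    convergence: 0,  // latest convergence score (0..1)
    budgetUsed: 0,   // cumulative cost in USD
    votes: {},       // per-debater vote state
    mergeRound: 0,   // merge rounds completed
    revisions: [],   // history of revised claims
  };
}

// Pure reducer: fold each streamed event into the run state.
function applyEvent(state, event) {
  switch (event.type) {
    case "rejection":
      return { ...state, rejections: state.rejections + 1 };
    case "convergence":
      return { ...state, convergence: event.score };
    case "cost":
      return { ...state, budgetUsed: state.budgetUsed + event.usd };
    case "vote":
      return { ...state, votes: { ...state.votes, [event.debater]: event.choice } };
    case "merge":
      return { ...state, mergeRound: state.mergeRound + 1 };
    case "revision":
      return { ...state, revisions: [...state.revisions, event.claim] };
    default:
      return state; // unknown events are ignored, not fatal
  }
}
```

Keeping the reducer pure means every intermediate state is reproducible from the event log alone, which is what lets a state machine like this react dynamically without losing history.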
Multi-provider by design. Each debater can come from OpenAI, Anthropic, Google, xAI, Alibaba, DeepSeek, or Mistral, and the judge is always chosen from a different provider than either debater. You pick the exact models and settings for each role, while real-time cost tracking, live web search, and code execution keep the debate grounded in current data and verifiable calculations.
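The judge-independence rule above — the judge always comes from a provider that neither debater uses — can be sketched as a small filter. Function names are illustrative, not the product's real API:

```javascript
// The seven providers named above.
const PROVIDERS = ["openai", "anthropic", "google", "xai", "alibaba", "deepseek", "mistral"];

// Pick a judge provider distinct from both debaters' providers.
function pickJudgeProvider(debaterA, debaterB) {
  const taken = new Set([debaterA.provider, debaterB.provider]);
  const eligible = PROVIDERS.filter((p) => !taken.has(p));
  if (eligible.length === 0) throw new Error("no independent provider available");
  // Any eligible provider satisfies the rule; this sketch just takes the first.
  return eligible[0];
}
```

With seven providers and at most two excluded, an independent judge always exists, so the rule never blocks a run.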
Not every question needs the full engine. Sometimes you want a structured debate. Sometimes you just want to compare providers side by side. Sometimes you want to understand the developer surface before you build on top of it.
The developer surface is aimed at teams that want structured multi-provider reasoning, live event streams, and independent judging inside their own workflows. The docs are public now; hosted API access is being rolled out deliberately instead of being oversold.
Explore how the engine starts runs, streams progress, models state, and handles cancellations before hosted access opens up more broadly.
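A client for that surface would start a run, consume the server-sent-event stream, and support cancellation. The sketch below is a hypothetical shape under assumed endpoint paths and payloads — the real API is what the docs define, not this:

```javascript
// Hypothetical client: start a debate run, stream SSE progress, allow cancel.
// The "/api/debates" path and event shapes are assumptions for illustration.
async function runDebate(question, onEvent) {
  const controller = new AbortController();
  const done = (async () => {
    const res = await fetch("/api/debates", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
      signal: controller.signal, // cancellation propagates to the request
    });
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";
    for (;;) {
      const { value, done: eof } = await reader.read();
      if (eof) break;
      buffer += decoder.decode(value, { stream: true });
      // SSE events are separated by a blank line.
      const blocks = buffer.split("\n\n");
      buffer = blocks.pop(); // keep the trailing partial block
      for (const block of blocks) onEvent(parseSSEBlock(block));
    }
  })();
  return { done, cancel: () => controller.abort() };
}

// Parse one SSE block ("event:" and "data:" fields) into an event object.
function parseSSEBlock(block) {
  let type = "message";
  const data = [];
  for (const line of block.split("\n")) {
    if (line.startsWith("event:")) type = line.slice(6).trim();
    else if (line.startsWith("data:")) data.push(line.slice(5).trim());
  }
  return { type, data: data.join("\n") };
}
```

Cancellation via `AbortController` is the standard browser pattern: aborting the signal tears down the fetch and the stream in one call.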
The long-term goal is simple: the same multi-provider debate engine, available to product teams through a clean hosted API. That rollout is separate from the BYOK launch, so the page does not pretend otherwise.
Today Debating Bots is live in bring-your-own-key mode. Add your own provider keys and run the structured debate engine now. Hosted credits and platform-funded usage are being rolled out separately.
AI models are confident. They're articulate. They're often wrong. The only reliable way to find the truth is the same way humans have always done it — put two smart minds in a room and let them argue until what's left is what actually holds up.
One model checking itself is still one model. Different providers bring different priors, different blind spots, and different strengths. Structured debate is what forces those differences into the open and turns them into something useful.
Built Debating Bots solo — 40k+ lines of PHP and vanilla JS, no framework, no team. The idea: if two AI models from different providers argue a question under adversarial constraints, the answer that survives is better than what either would produce alone. The engine is the proof.