An arXiv position paper landed this week and called multi-agent safety "the most important unfinished business in LLM agent runtime assurance." That sentence is bigger than it sounds.
It says — out loud, in formal terms — that every council deployment running production traffic right now is operating in a space where the academic-tier consensus is that the safety architecture isn't finished yet. Perplexity Model Council. Microsoft Copilot Critique + Council. Anthropic Project Glasswing. KPMG Digital Gateway. Shingikai. All of us. The interface is shipping. The runtime is shipping. The thing that's supposed to guarantee none of it goes sideways at scale — that part is named and unshipped.
The paper is arXiv:2605.18672, by Bensalem and eight co-authors: "Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment." The central claim: enforcing LLM agent safety inside a single abstraction layer is not merely suboptimal but categorically insufficient. Three independently certified safety dimensions are required (semantic intent and policy compliance, environmental validity, and dynamical feasibility), bound together by a contract-based architecture. Multi-agent extension is named as the unfinished business.
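To make the shape of that claim concrete, here is a minimal sketch of three separately checked layers composed under one contract. It is not the paper's formalism: the `LayerContract` type, the `evaluate` function, the example policy values, and the per-layer failure bounds are all hypothetical, and the union bound used to combine them is an illustrative choice rather than the composition rule the authors propose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical action record; the field names used below are illustrative only.
Action = Dict[str, object]


@dataclass(frozen=True)
class LayerContract:
    """One safety layer: a check plus an assumed certified failure bound."""
    name: str
    check: Callable[[Action], bool]
    max_failure_prob: float  # assumption: supplied by that layer's own certification


def evaluate(contracts: List[LayerContract], action: Action) -> Dict[str, object]:
    """Allow the action only if every layer's check passes."""
    failed = [c.name for c in contracts if not c.check(action)]
    # Union bound (Boole's inequality): the chance that any layer's guarantee
    # fails is at most the sum of the per-layer bounds, capped at 1.
    bound = min(1.0, sum(c.max_failure_prob for c in contracts))
    return {"allowed": not failed, "failed_layers": failed, "failure_bound": bound}


ALLOWED_TOOLS = {"read_file", "search"}   # hypothetical policy
KNOWN_RESOURCES = {"/data/report.txt"}    # hypothetical environment state
MAX_CALLS_PER_MINUTE = 60                 # hypothetical feasibility limit

CONTRACTS = [
    LayerContract("semantic_intent",
                  lambda a: a["tool"] in ALLOWED_TOOLS, 0.01),
    LayerContract("environmental_validity",
                  lambda a: a["target"] in KNOWN_RESOURCES, 0.02),
    LayerContract("dynamical_feasibility",
                  lambda a: a["calls_per_minute"] <= MAX_CALLS_PER_MINUTE, 0.005),
]

ok = evaluate(CONTRACTS, {"tool": "read_file",
                          "target": "/data/report.txt",
                          "calls_per_minute": 10})
bad = evaluate(CONTRACTS, {"tool": "delete_all",
                           "target": "/data/report.txt",
                           "calls_per_minute": 500})
print(ok["allowed"], bad["failed_layers"])
```

The point of the sketch is the structure: no single check can approve an action on its own, and each layer reports its failure separately, so a rejection names the dimension that was violated.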
Why this matters this week
AAMAS 2026, the largest agents-and-multiagent-systems conference in the world, opens in Paphos in 48 hours. It drew 1,455 Main Track submissions, the highest in 25 years of the conference. The Generative and Agentic AI area covers agency and learning in LLMs, multi-agent training of LLM agents, and cooperation and coordination of generative agents. That is the first time a top-tier agents conference has named multi-agent LLM training as a top-level program area. Imperial College London has three-agent council architectures with QBAFs on the program. Agent Contracts, a formal framework for resource-bounded autonomous AI systems, is up for oral presentation at the COINE co-located workshop.
For five days starting Monday, the academic-tier architecture-layer literature this space has been generating every week — CHAL, Cost of Consensus, Council Mode, Contestable MAD, Insider Attacks, Adaptive Consensus, Orchestration Traces RL, Coordination as Architectural Layer, and now Three-Layer Assume-Guarantee — is going to get real-time elaboration in front of an audience that actually reviews this work.
The substrate isn't two planes. It's three.
I wrote yesterday that the substrate layer has two planes — a standards plane that cooperates, and a runtime plane that competes. The standards plane held this week. The runtime plane broke. That was right as far as it went.
It went one plane short.
The Three-Layer Assume-Guarantee paper names a third plane — the safety-architecture plane. Independently certified, contract-bound, structurally required, multi-agent extension explicitly unfinished. The three planes have three different shipping cadences. Standards cooperate. Runtimes compete. The safety architecture is being engineered ahead of academic formalization, by exactly one enterprise vendor I'd point at this week.
That vendor is Kore.ai.
Kore.ai Artemis is the safety-architecture plane shipped early
Kore.ai launched the Artemis edition of its agent platform on Microsoft Azure on May 21–22. VentureBeat's framing is that it "takes on Salesforce and ServiceNow." The three named innovations are worth reading slowly.
Agent Blueprint Language (ABL) — a compiled, declarative YAML language for defining, validating, and governing multi-agent systems. Structurally that's a standards-plane move at the enterprise-platform tier. Think AGENTS.md, Anthropic Agent Skills format, MCP server descriptors — except authored by a single platform for its own runtime.
Arch — an AI agent architect that translates business objectives into production-ready ABL and continuously refines agents using real-world production traces. That's a runtime-plane move. Antigravity Managed Agents, LangSmith Deployment, Anthropic Claude Managed Agents are the comparables.
Dual-Brain Architecture — two cognitive engines in parallel, agentic reasoning and deterministic flows, sharing memory, governed by a single runtime. That's the safety-architecture-plane move. It's the contract-based architecture from the Three-Layer Assume-Guarantee paper, except shipped to enterprise customers in May while the paper is still in submission.
Kore.ai Artemis launched on Azure. So did KPMG's Digital Gateway Powered by Claude. EY's EYQ runs on Azure too. That makes Microsoft the cloud-of-record for at least three production multi-agent platform deployments at the F500 / Big-Four tier, regardless of which model lab's Chairman wins on top.
The grid no one has drawn yet
Six architectural primitives have been named in the academic-tier literature this month: heterogeneity, hierarchy, role separation, adaptive gating, arena-based argumentation with provenance, and adversary-resistant consensus. Cross those six primitives against the Three-Layer Assume-Guarantee paper's three safety dimensions — semantic intent, environmental validity, dynamical feasibility — and you have a 6 × 3 grid.
Which primitive enforces which safety dimension, at which certified layer?
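As a rough sketch, the empty grid can be written down directly. The primitive and dimension names come from the paragraph above; the `grid` and `unassigned` names are hypothetical, and every cell starts unassigned because no mapping has been published.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

PRIMITIVES = [
    "heterogeneity",
    "hierarchy",
    "role separation",
    "adaptive gating",
    "arena-based argumentation with provenance",
    "adversary-resistant consensus",
]
SAFETY_DIMENSIONS = [
    "semantic intent",
    "environmental validity",
    "dynamical feasibility",
]

Cell = Tuple[str, str]

# Each cell stays None until someone argues how (or whether) that primitive
# enforces that safety dimension.
grid: Dict[Cell, Optional[str]] = {
    cell: None for cell in product(PRIMITIVES, SAFETY_DIMENSIONS)
}


def unassigned(g: Dict[Cell, Optional[str]]) -> List[Cell]:
    """Return the cells that still have no enforcement claim attached."""
    return [cell for cell, claim in g.items() if claim is None]


print(len(grid), len(unassigned(grid)))
```

Filling a cell would mean attaching a claim string, or a richer record, to one (primitive, dimension) pair; the sketch only fixes the shape of the question.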
The field has not drawn that grid yet. Not in a paper. Not in a product. Not in a vendor pitch deck. AAMAS week is when that grid is going to start being drawn out loud, in workshop tracks, by people who can name every cell and argue about the boundaries between them.
That's the conversation this product was built to live inside.
Where Shingikai sits
We use the standards plane — MCP-compatible interfaces to OpenRouter's 200+ models. We don't compete on the runtime plane — the runtime plane is hard enough that even Google can't ship it clean in 48 hours, and we have no business pretending otherwise. We ride above the safety-architecture plane. The strategy menu — Traditional Council, Round Robin, Survivor, Collaborative Editing, Red Team vs. Blue Team, Quick Take — is the user-selectable surface above all three planes.
Six explicit strategies. One Chairman model the Big Four already chose 3-to-1. Three substrate planes underneath, two of them shipping and one of them named-but-unshipped at the academic tier.
Standards cooperate. Runtimes break. The safety architecture is named but unshipped. Strategy is what the user picks.
Try it free — no signup. shingik.ai