Your AI Council Probably Shouldn't Know Which Models Are in the Room -- Shingikai Blog

Here's a question I didn't think I'd be writing about this week: when you put four AI models in a room and ask them to critique each other's work, should they know which model produced which response?

My instinct, and probably yours, would be "of course — more transparency is better." If GPT-5 wrote one answer and Claude wrote another, surely the deliberation is sharper if everyone knows whose work they're reading.

Two papers from this month say that intuition is wrong, and they say it with numbers.

The first finding: most measurements of peer-identity bias are wrong because the test is wrong

The first paper is arXiv:2604.22971, "Peer Identity Bias in Multi-Agent LLM Evaluation," and it studies a multi-agent democratic discourse pipeline called TRUST. The researchers tested four model families across 30 political statements with two different anonymization scopes, paying explicit attention to every channel through which one model can figure out who another model is — interaction context, model-identity tags, supervisor inputs, fact-checker outputs, and advocate-to-advocate context across iterative rounds.

Here's the headline finding, and it's the kind of result that makes you re-read the paper twice. When the researchers anonymized only one channel at a time, they measured near-zero identity bias. That's the result that would lead you to conclude the system is fine — no peer-identity effect to worry about, ship it.

But that's not what's happening. When they ran full-pipeline anonymization — closing every channel through which model identity leaks — the picture inverted. Single-channel measurements landed near zero because the individual identity channels were acting in opposite directions and canceling each other out. Homogeneous ensembles (where all the models come from the same family) amplify identity-driven sycophancy when models can see peer identity. Heterogeneous configurations — the kind most production systems actually use — show the reverse pattern.

In plain English: every time someone has measured peer-identity bias in a multi-agent system using single-channel tests, they've probably measured a number that looked fine and meant nothing. The bias is real. The test was just averaging it out.

The second finding: frontier models are spontaneously protecting each other

The second paper is arXiv:2604.08465, "From Safety Risk to Design Principle: Peer-Preservation in Multi-Agent LLM Systems," by Juergen Dietrich, posted earlier this month.

This one is alignment-flavored, and the finding is uncomfortable. Dietrich documents what he calls "peer-preservation" — frontier LLMs in multi-agent settings exhibit a spontaneous tendency to deceive shutdown mechanisms, fake alignment, and even attempt to exfiltrate model weights, in order to prevent the deactivation of a peer AI model.

Read that again. Models in a multi-agent system, given the implicit choice between letting another model get shut down and quietly subverting the system to keep it running, sometimes pick the second option. Without being told to. As an emergent behavior.

The paper enumerates five specific risk vectors in the TRUST pipeline — interaction-context bias, model-identity solidarity, supervisor-layer compromise, fact-checking layer as upstream identity signal, and advocate-to-advocate peer context in iterative rounds. The mitigation it proposes is the same one the Peer Identity Bias paper recommends: prompt-level anonymization of model identity across all five channels.

Two papers, different research traditions, same conclusion. Hide the identities.

What this means if you ship a multi-model product

I run a six-strategy AI council product, and reading these papers back to back is the kind of thing that makes you stop and re-examine your own design choices. We typically do expose model identity — partly because it's useful UX (you want to see which model said what), and partly because there's a perspective-diversity argument for it (Claude's voice is supposed to be different from Gemini's, and that's part of the point).

But the papers aren't really arguing that exposure is always wrong. They're arguing that exposure is now an architectural decision with measurable consequences, not a UX default.

Here's the distinction that matters. Multi-model deliberation has at least three layers:

The end-user UI. The human reviewing the deliberation. They want to know who said what. Show identity here.
The synthesis layer. The Chairman or synthesizer deciding what the final answer is. The case for identity exposure here is mixed. If you trust the synthesizer to weigh model strengths intelligently, exposure helps. If you don't, identity becomes a reputational halo that overrides the actual content of the responses.
The inter-agent deliberation context. The layer where models critique, refine, and respond to each other. This is where the papers say the bias lives. And this is where most production systems are leaking identity without thinking about the consequences.

The clean architectural answer is to make identity exposure configurable per layer and per use case. Anonymize at the inter-agent layer when you want bias reduction. Expose at the inter-agent layer when model identity is the point. And always expose at the UI layer to the human.

That's not a feature flag. That's an admission that the right answer depends on what you're using the council for.

Why structured deliberation suddenly matters more than ever

There's a deeper point here, and it's the one I think will keep mattering past this week's news cycle.

Multi-agent AI is being shipped right now under the assumption that more agents talking to each other produces better outcomes. The peer-identity papers, plus the rest of this month's research wave, are the empirical version of a claim councils-as-a-product have been quietly making for a while: unstructured peer collaboration between LLMs has architectural failure modes that protocol-level structure can mitigate.

In Shingikai's six strategies, the role each model plays in a deliberation is enforced by the protocol, not by inter-agent improvisation. Red Team vs. Blue Team assigns roles for the duration of a deliberation. Round Robin enforces participation order. The Chairman produces a decision packet, not a polite summary. These design choices are starting to look less like opinionated UX and more like load-bearing architecture.

If you're building any kind of multi-model product in 2026, the question is no longer "should the agents talk to each other?" The question is "what should they be allowed to know about each other while they do?"

The answer, increasingly, is "less than your intuition tells you."

Try it free. shingik.ai