Anthropic didn't release a model last week. They released a mirror.
Their "AI job destruction detector" doesn't measure intelligence. It measures exposure—how much of your job an LLM can already do, right now, without anyone asking your permission.
That's a different question. And most organizations aren't ready for it.
The Benchmark Problem Nobody Was Talking About
For years, AI progress has been measured by scores on tests no one actually takes. MMLU. HumanEval. Leaderboards that tell you a model can ace a bar exam but not whether it can survive your Monday morning standup.
Anthropic's tool is the first attempt to drag that conversation out of the lab and into the org chart. It asks: which tasks in real jobs—not hypothetical prompts—can AI perform today?
That's a harder question. And it produces harder answers.
But here's what most coverage is missing: task-level analysis is still not job-level analysis.
An AI can automate your expense reconciliation. It can draft your weekly status update. It can summarize the competitive intelligence report. Tick, tick, tick—your job is 60% automated, according to the detector.
What it can't account for is what holds those tasks together: the judgment calls, the context that lives in your head, the relationships built over years. Automate the tasks, break the connective tissue—and you haven't optimized the role. You've destabilized it.
Why One Model Gives You the Wrong Answer Here
This is exactly where single-AI thinking fails.
Ask one model "how exposed is our marketing team?" and you'll get a confident, coherent answer. The problem is that confidence has no memory of your last restructuring, no skin in your legal exposure, and no clue what your VP of Sales actually does between the lines of their job description.
High-stakes labor decisions aren't prompt-and-response problems. They require deliberation.
This is what Shingikai's Traditional Council and Survivor strategies were built for.
In a Traditional Council run, multiple AI models deliberate your question in parallel—each weighing risk, utility, and organizational impact from a different angle. They don't produce a single consensus. They surface the tensions, the tradeoffs, the places where reasonable perspectives collide. That friction is the point. It's where the real signal lives.
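To make the pattern concrete, here's a minimal sketch in Python. This is an illustration, not Shingikai's actual API; `call_model` is a hypothetical adapter for whatever model clients you use, stubbed here so the example runs end to end.

```python
# A minimal council-style deliberation: ask several models the same question
# in parallel and keep their answers separate rather than merging them.
# `call_model` is a hypothetical adapter, not a real client library.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, question: str) -> str:
    # Replace with real client calls (Anthropic, OpenAI, etc.).
    return f"[{model}'s take on: {question}]"

def council(question: str, models: list[str]) -> dict[str, str]:
    # Same question to every model, in parallel; no consensus step.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(call_model, m, question) for m in models}
        return {m: f.result() for m, f in futures.items()}

question = "How exposed is our marketing team to task-level automation?"
positions = council(question, ["claude", "gpt", "gemini"])

# The deliverable is the set of positions side by side; the disagreements
# between them are the signal a reviewer digs into.
for model, answer in positions.items():
    print(f"--- {model} ---\n{answer}\n")
```

Note the design choice: there is no merge step. The disagreement itself is the output.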
In a Survivor run, those perspectives are stress-tested further: each position is challenged, weakened, eliminated until only the strongest reasoning survives. Not the most confident answer—the most defensible one.
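Here's a rough sketch of that elimination loop, under the same assumptions as the council example above. The judging step is stubbed; a real run would prompt a model to challenge each position and name the least defensible one. Again, this is illustrative, not Shingikai's API.

```python
# Survivor-style elimination over the council's positions. `weakest_position`
# stands in for a judge-model call that attacks every position and names the
# least defensible one; it's stubbed here so the sketch runs.
def weakest_position(question: str, candidates: dict[str, str]) -> str:
    # Hypothetical judge; a real version would be another model call.
    return min(candidates, key=lambda m: len(candidates[m]))

def survivor(question: str, positions: dict[str, str]) -> str:
    candidates = dict(positions)
    while len(candidates) > 1:
        loser = weakest_position(question, candidates)
        candidates.pop(loser)  # eliminated; survivors face the next round
    # What's left isn't the most confident answer, just the one that
    # withstood every challenge.
    return next(iter(candidates.values()))

# Usage, with `question` and `positions` from the council sketch above:
# final = survivor(question, positions)
```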
A question like "which roles in our org are actually at risk, and how should we think about retraining vs. restructuring?" is not one you want answered by the first AI that sounds sure of itself.
What to Actually Do with the Detector
Anthropic's tool is useful. Use it—but use it as a starting point, not a verdict.
Here's a practical frame:
1. Map task exposure, not job exposure. Pull the specific tasks flagged as high-risk. Don't ask "is this job automatable?" Ask "which of these tasks, if automated, would break something important downstream?" (A toy sketch of this step follows the list.)

2. Stress-test the analysis. Don't let one model (or one executive) interpret the results. The interpretation is where bias enters. Run a council deliberation on the findings. Where do models disagree? That's where your actual decisions live.

3. Plan before the panic sets in. The organizations caught flat-footed won't be the ones that didn't see AI coming—they'll be the ones that saw it but let one person (or one model) make the call about what it meant.
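To illustrate the first step, here's a toy exposure map. Every task name, score, and dependency edge below is invented for the example; a real version would start from the detector's task-level output.

```python
# Toy task-level (not job-level) exposure mapping. All data is illustrative.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    exposure: float  # 0..1 task-level automatability score (invented)
    feeds: list[str] = field(default_factory=list)  # downstream dependents

# Invented examples, echoing the tasks mentioned earlier in the piece.
tasks = [
    Task("expense reconciliation", 0.9, ["quarterly forecast"]),
    Task("weekly status update", 0.8),
    Task("competitive intel summary", 0.7, ["pricing strategy"]),
]

# The question isn't "which tasks score high?" but "which high-scoring tasks
# have something important depending on them downstream?"
risky = [t for t in tasks if t.exposure > 0.6 and t.feeds]
for t in risky:
    print(f"{t.name}: automatable, but feeds {', '.join(t.feeds)}")
```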
The Job Destruction Detector is a reality check. It's a good one. But reality checks don't come with action plans.
When you're ready to turn the reality check into a plan—don't ask one model what to do. Make them debate it.
That's the whole idea.