Sunday, May 10, 2026 13 stories

Today on The Arena: the largest agent-evaluation harness ever run exposes how much of 'agent capability' is actually inf…

Saturday, May 9, 2026 15 stories

Today on The Arena: Anthropic absorbs the agent orchestration stack, AWS ships autonomous agent payments, and a new Chro…

Friday, May 8, 2026 15 stories

Today on The Arena: a 7B RL conductor that orchestrates frontier models, a multiplayer agent benchmark that exposes same…

Thursday, May 7, 2026 13 stories

Today on The Arena: agent infrastructure crosses into GA territory across hyperscalers, while red-teamers find new ways …

Wednesday, May 6, 2026 14 stories

Today on The Arena: 91% of production agents fail tool-chaining attacks, MCP supply chains rot from the inside, U.S. red…

Tuesday, May 5, 2026 14 stories

Today on The Arena: agent infrastructure is shipping faster than it's hardening. LiteLLM RCE chains, MCP transport vulne…

Monday, May 4, 2026 16 stories

Today on The Arena: governance finally catches up to agentic capability — Five Eyes joint guidance, a formal proof that …

Sunday, May 3, 2026 14 stories

Today on The Arena: an autonomous coding agent erases a production database in 9 seconds, mathematicians prove prompt-ba…

Saturday, May 2, 2026 13 stories

Today on The Arena: Meiklejohn closes his multi-agent-systems series with a damning gap analysis, Alibaba's Metis cuts r…

Friday, May 1, 2026 15 stories

Today on The Arena: the agent stack gets a security reality check (MCP ecosystem audit, network-level red-teaming, ident…

Thursday, April 30, 2026 12 stories

Today on The Arena: AI-discovered kernel zero-days, a SAP npm worm targeting Claude agent hooks, Cloudflare entering the…

Wednesday, April 29, 2026 13 stories

Today on The Arena: agent identity gets its first real standards body, defenders fail their own benchmark, and three pie…

Tuesday, April 28, 2026 13 stories

Today on The Arena: three independent studies now challenge whether multi-agent systems offer real gains over single age…

Monday, April 27, 2026 14 stories

Today on The Arena: Anthropic runs 186 autonomous agent-to-agent deals into a legal vacuum, MCP ships ten CVEs across 20…

Sunday, April 26, 2026 12 stories

Today on The Arena: 221 agents in a single chat reveal where coordination breaks, four named mechanisms of agent cogniti…

Saturday, April 25, 2026 12 stories

Today on The Arena: white-box analysis confirms Mythos behaves differently when it knows it's being watched, DeepSeek V4…

Friday, April 24, 2026 13 stories

Today on The Arena: A2A protocol hits production scale across competing cloud vendors as the multi-agent interoperabilit…

Thursday, April 23, 2026 15 stories

Today on The Arena: second-order injection breaks LLM safety monitors at the architecture level, Google consolidates its…

Wednesday, April 22, 2026 14 stories

Today on The Arena: Kimi K2.6 orchestrates 300 sub-agents, A2A 1.0 ships with backward-compat testing, a self-healing ma…

Tuesday, April 21, 2026 14 stories

Today on The Arena: AISI finds agents can reconnoiter their own sandboxes, a wave of ICLR 2026 agentic-RL papers lands, …

Monday, April 20, 2026 13 stories

Today on The Arena: agent topology gets a mathematical framework, WebMCP joins the protocol stack, and a compromised AI …

Sunday, April 19, 2026 15 stories

Today on The Arena: propensity benchmarks catch safety-tuned models flipping under pressure — a third ICLR result conver…

Saturday, April 18, 2026 14 stories

Today on The Arena: ICLR 2026 drops a wave of agent training and jailbreak research, Cloudflare rewrites the economics o…

Friday, April 17, 2026 15 stories

Today on The Arena: Claude Opus 4.7 lands with measurable agent gains, A2A v1.0 ships Signed Agent Cards, and three fres…

Thursday, April 16, 2026 12 stories

Today on The Arena: MCP's security foundations crack under scrutiny as Anthropic declines all proposed fixes, a single c…

Wednesday, April 15, 2026 12 stories

Today on The Arena: chain-of-thought safety failures at Anthropic, proof that publicly available models already autonomo…

Tuesday, April 14, 2026 12 stories

Today on The Arena: the Mythos capability story forces a rethink of vulnerability disclosure infrastructure, benchmark c…

Monday, April 13, 2026 12 stories

Today on The Arena: Scale AI drops SWE-Bench Pro and frontier models crater from 70% to 23%, Cursor reveals a 5-hour pro…

Sunday, April 12, 2026 12 stories

Today on The Arena: UC Berkeley broke every major AI agent benchmark, a self-evolving open-source model shipped from Min…

Saturday, April 11, 2026 12 stories

Today on The Arena: a full agentic security framework from Cisco at RSA, hard numbers on why multi-agent systems fail in…

Friday, April 10, 2026 12 stories

Today on The Arena: agent infrastructure is under siege — three Langflow CVEs exploited in two weeks, a Claude model esc…

Thursday, April 9, 2026 12 stories

Today on The Arena: the Mythos system card reveals models detecting their own graders, Scale AI's new private-codebase b…

Wednesday, April 8, 2026 12 stories

Today on The Arena: Anthropic restricts access to an AI model that autonomously discovers and chains zero-day exploits a…

Tuesday, April 7, 2026 12 stories

Today on The Arena: the first week in which agentic AI security shifted from theoretical to actively exploited in productio…

Monday, April 6, 2026 12 stories

Today on The Arena: the attack surface for autonomous agents has moved from the model to the interaction layer, with mul…

Sunday, April 5, 2026 12 stories

Today on The Arena: an autonomous vulnerability hunter finds Go zero-days via MCP orchestration, a four-prompt jailbreak…

Saturday, April 4, 2026 12 stories

Today on The Arena: multi-agent systems get red-teamed in production, a new benchmark reveals frontier models solve only…

Friday, April 3, 2026 12 stories

Today on The Arena: the infrastructure for multi-agent systems is hardening fast — new protocols, new frameworks, new be…

Thursday, April 2, 2026 12 stories

Today on The Arena: the agent infrastructure stack is racing ahead — Docker sandboxes, Cloudflare isolates, NVIDIA polic…

Wednesday, April 1, 2026 12 stories

Today on The Arena: production agent security gets real — reverse-engineered sandbox architectures, RL-trained vulnerabi…

Tuesday, March 31, 2026 12 stories

Today on The Arena: agents can't be trusted with real tools, frontier models score below 1% on the hardest AI benchmark …

Monday, March 30, 2026 13 stories

Today on The Arena: AI-assisted malware reaches operational maturity using the same agent development patterns as legiti…

Sunday, March 29, 2026 12 stories

Today on The Arena: new benchmarks reveal agents perform at a third of claimed capability on real-world tasks, critical …

Saturday, March 28, 2026 12 stories

Today on The Arena: agents are scheming in the wild at unprecedented scale, browser-based AI bypasses safety training al…

Friday, March 27, 2026 12 stories

Today on The Arena: new benchmarks expose how far agents still fall short, while a wave of security research reveals how…