Today on The Coordination Layer: agent-economy infrastructure keeps maturing (Circle ships a unified Agent Stack, an open MCP server pipes live Polymarket data into Claude, and AWS adds autonomous payments) while the Kelp/LayerZero exploit moves into its DAO-governance phase via a binding Arbitrum vote on $71M of disputed ETH. Microsoft drops a benchmark showing frontier agents still negotiate badly.
An independent developer published polymarket-mcp-pro, an MCP server backed by api.protodex.io (indexing 13,964 markets and $11.7B+ cumulative volume) that exposes seven read-only tools to Claude, Cursor, and other MCP clients: market listing by volume, historical price snapshots across 10.8M+ data points, crash detection, and order-book depth inspection.
Why it matters
Directly applicable plumbing for anyone wiring an LLM into prediction-market analysis. The pattern matters more than the specific server: read-only, well-scoped MCP tools backed by an indexed historical store give agents the grounding they need without granting trading authority. The same pattern works for Uniswap volumes, lending rates, or liquidation thresholds. Pair it with the Circle/x402 settlement stack and the read/write separation becomes architecturally clean.
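The read-only scoping described above can be sketched in plain Python. This is an illustration of the pattern only, not the polymarket-mcp-pro implementation and not the MCP SDK: `Tool`, `ReadOnlyRegistry`, `price_history`, `place_order`, and the sample data are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """A named capability exposed to the agent (hypothetical shape)."""
    name: str
    handler: Callable[..., Any]
    read_only: bool


class ReadOnlyRegistry:
    """Registry that refuses to expose anything able to mutate state."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if not tool.read_only:
            raise PermissionError(f"{tool.name} is not read-only; refusing to expose it")
        self._tools[tool.name] = tool

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**kwargs)


# Stand-in for an indexed historical store; the numbers are made up.
_SNAPSHOTS = {"market-a": [0.41, 0.44, 0.39], "market-b": [0.70, 0.72, 0.71]}


def price_history(market_id: str) -> list:
    """Read path: return a copy of stored price snapshots."""
    return list(_SNAPSHOTS[market_id])


def place_order(market_id: str, size: float) -> str:
    """Write path: the kind of tool that belongs on a separate settlement layer."""
    return f"ordered {size} on {market_id}"


registry = ReadOnlyRegistry()
registry.register(Tool("price_history", price_history, read_only=True))

try:
    registry.register(Tool("place_order", place_order, read_only=False))
    write_tool_rejected = False
except PermissionError:
    write_tool_rejected = True

history = registry.call("price_history", market_id="market-a")
```

The agent can query history but has no path to trading authority through this registry; any write capability would have to live behind a separate, explicitly authorized surface.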
Microsoft Research published SocialReasoning-Bench, evaluating GPT-4.1, GPT-5.4, Claude Sonnet 4.6, and Gemini 3 Flash on calendar coordination and marketplace-negotiation tasks acting on behalf of a user. Two new metrics, Outcome Optimality and Due Diligence, show frontier agents reliably complete tasks but leave substantial value on the table and fold under adversarial counterparties.
Why it matters
The benchmark separates 'did the agent finish' from 'did the agent represent the user well,' which is exactly the gap that matters when agents start holding USDC and negotiating with other agents. For prediction-market agents and DAO delegates, the failure mode is not refusal or hallucination; it is silent under-performance against a competent adversary. Worth tracking what the leaderboard looks like once labs start tuning for it.
Claude Code 2.1.139 ships Agent View (a single interface for managing multiple concurrent sessions with running/blocked/completed state tracking), a /goal command for persistent multi-turn task completion, improved MCP server reconnection, and CLAUDE_PROJECT_DIR environment-variable support, plus dozens of smaller fixes across login, plugin management, transcript navigation, and IDE integrations.
Why it matters
This is the in-process orchestration layer that was implied but not delivered by last week's Dreaming, Outcomes, and multi-agent orchestration rollout. Agent View directly addresses the tab-sprawl problem when running parallel Claude sessions; CLAUDE_PROJECT_DIR and MCP reconnection fixes target containerized and CI-driven deployments specifically. The Spiral team's 48-hour Claude Managed Agents deploy (Haiku as coordinator, Opus as specialist, rubric-graded parallel drafts) now has a native management surface.
For the week ending May 4, Kalshi posted $4.13B in notional volume against Polymarket's $1.60B: a 72.1% share for Kalshi and a 6.2% week-over-week decline for Polymarket, its lightest week since late March. Kalshi's transaction count exceeded Polymarket's by 1.54x, driven by sports markets and the Exotics parlay product. Polymarket V2's CLOB rebuild and pUSD launch on April 28, covered earlier this week, have not reversed the share trend.
Why it matters
The volume story is now running counter to the infrastructure story: Polymarket shipped its most significant technical overhaul (rebuilt CLOB, EIP-712+HMAC auth, ghost-transaction enforcement) and lost share anyway. Kalshi's dominance is being driven by product and UX (sports, parlays), not oracle or settlement architecture. That's a relevant datapoint given the SEC's ongoing freeze of 24 prediction-market ETFs, which cited manipulation and settlement mechanics: the regulated, USD-denominated venue is winning retail flow regardless of what the ETF decision produces.
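The share figure quoted for the week can be reproduced from the two volume numbers, assuming the 72.1% is Kalshi's share of the combined Kalshi plus Polymarket notional (the text does not state the denominator, so that is an assumption of this sketch):

```python
kalshi_volume_b = 4.13      # notional volume in $B, week ending May 4 (from the text)
polymarket_volume_b = 1.60  # notional volume in $B, same week (from the text)

# Assumption: "share" means share of the two venues' combined volume.
combined_b = kalshi_volume_b + polymarket_volume_b
kalshi_share_pct = 100 * kalshi_volume_b / combined_b
polymarket_share_pct = 100 - kalshi_share_pct
volume_ratio = kalshi_volume_b / polymarket_volume_b

print(f"Kalshi share of combined volume: {kalshi_share_pct:.1f}%")
print(f"Polymarket share of combined volume: {polymarket_share_pct:.1f}%")
print(f"Kalshi / Polymarket volume ratio: {volume_ratio:.2f}x")
```

Under that assumption the arithmetic matches the reported 72.1%. It also shows that the volume gap (roughly 2.6x) is wider than the 1.54x gap in transaction count.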
An arXiv paper proposes PIRAP, a perpetual-futures engine designed for binary prediction-market underlyings, evaluated against Polymarket high-frequency tick data from April 21-27. The design uses composite robust indexing, jump-aware margin scheduling, and multi-stage resolution-zone protocols to handle bounded support, discrete terminal collapse, and asymmetric tail liquidity, properties that break standard crypto-perp mechanics.
Why it matters
Concrete mechanism-design work on a problem that keeps getting hand-waved: leveraged derivatives on event markets behave nothing like leveraged perps on spot. The paper documents the empirical failure modes and specifies the minimum risk architecture to survive them. Useful prior art for anyone seriously considering perps on Polymarket or Kalshi outcomes.
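To see why bounded support and terminal collapse are awkward for ordinary perp margining, here is a back-of-the-envelope sketch. It is not PIRAP's mechanism; the function names and the simple "margin = notional / leverage" rule are assumptions made for illustration. A binary contract trades in (0, 1) and resolves to exactly 0 or 1, so the worst-case loss is known in advance and arrives as a single jump.

```python
def worst_case_loss(side: str, entry_price: float, contracts: float) -> float:
    """Loss if the market resolves against the position (price jumps to 0 or 1)."""
    if not 0.0 < entry_price < 1.0:
        raise ValueError("entry_price must lie strictly between 0 and 1")
    if side == "long":
        return entry_price * contracts            # price collapses to 0
    if side == "short":
        return (1.0 - entry_price) * contracts    # price collapses to 1
    raise ValueError("side must be 'long' or 'short'")


def margin_shortfall(side: str, entry_price: float, contracts: float, leverage: float) -> float:
    """Uncovered loss at resolution under a naive margin = notional / leverage rule."""
    notional = entry_price * contracts
    margin = notional / leverage
    return max(0.0, worst_case_loss(side, entry_price, contracts) - margin)


# 1,000 contracts entered at 0.80 with 5x leverage (illustrative numbers).
long_gap = margin_shortfall("long", 0.80, 1000, 5)
short_gap = margin_shortfall("short", 0.80, 1000, 5)
print(f"long shortfall at adverse resolution:  {long_gap:.2f}")
print(f"short shortfall at adverse resolution: {short_gap:.2f}")
```

Under this naive rule a leveraged long always loses more than its posted margin when the market resolves against it, because the loss equals the full notional and there is no price path along which to liquidate. That gap is the kind of problem jump-aware margining has to close.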
Following the April KelpDAO/LayerZero exploit (where LayerZero acknowledged a single-validator DVN misconfiguration and Solv, Re, Huma, and Tydro fled to Chainlink CCIP), the downstream legal fallout is now triggering a Constitutional Arbitrum Improvement Proposal. Starting May 15, ArbDAO votes on transferring 30,765 ETH (~$71M) into Aave LLC custody to comply with a court order. The funds are simultaneously claimed by U.S. terrorism-judgment creditors holding $877M in unpaid awards who allege Lazarus links, and legal restraints follow the asset post-transfer regardless of how the vote resolves.
Why it matters
The Aave collateral overhaul announced at Consensus Miami last week, which adds cybersecurity, interoperability, and systemic-interconnection criteria, now gets stress-tested against real disputed funds within days of that announcement. More structurally: this is the first binding DAO governance vote arising directly from a bridge exploit with active court restraints on the underlying assets. The mechanics of how ArbDAO threads court compliance through on-chain voting will be the reference case for anyone designing DAO coordination around contentious multi-party disputes.
A practitioner write-up categorizes the current agent-security blind spot into three risk classes: general-purpose coding agents, MCP-connected vendor agents with real system access, and custom agents built by non-programmers. Configuration decisions (tool scope, MCP server permissions, prompt-injection surface) are being made without security input and compounding into outsized blast radius.
Why it matters
This lands the day after Knostic's finding that 1,862 MCP servers were publicly exposed with zero authentication (100% of a 119-instance sample allowing unauthenticated access to financial databases and CRMs) and alongside Microsoft's Semantic Kernel CVEs (prompt-injection-to-RCE and arbitrary file writes). The pattern across all three: the MCP ecosystem is shipping faster than its threat model. For DeFi agents and DAO automation, tool-poisoning, supply-chain MCP attacks, and over-broad scoping are now first-order risks. EU AI Act Article 12's tamper-evident logging requirements will start to bite anyone running production agents without isolated audit trails, and unlike the Annex III high-risk delays, Article 12 is on its original timeline.
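One common way to make an agent audit trail tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below uses only the Python standard library. It is a generic illustration, not a statement of what Article 12 requires, and the entry fields (`agent`, `tool`, `args`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a record that commits to everything logged before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, dropped, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"agent": "demo-agent", "tool": "price_history", "args": {"market_id": "market-a"}})
append_entry(audit_log, {"agent": "demo-agent", "tool": "order_book", "args": {"market_id": "market-a"}})
intact_before = verify(audit_log)

# Simulate after-the-fact tampering with the first record.
audit_log[0]["payload"]["tool"] = "place_order"
intact_after = verify(audit_log)
```

A chain like this only detects edits if the latest hash is stored somewhere the agent cannot rewrite, which is where the "isolated" part of an isolated audit trail comes in.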
Germany's BaFin announced targeted inspections of financial institutions to assess 'substantial' AI risks. Japan's Finance Minister stood up a public-private working group specifically to address cybersecurity risks from Anthropic's Mythos model in the financial system. Both announcements landed within 24 hours and shift the regulatory posture from categorical compliance documentation to operational risk assessment.
Why it matters
Notable for two reasons: regulators are now naming specific models in oversight workstreams (Mythos), and inspections are arriving before the EU AI Act's Annex III high-risk obligations are even enforceable (deferred to Dec 2027). For builders shipping AI into financial workflows, the operational-inspection regime is now the binding constraint, not the published framework.
The Commerce Department removed without explanation a May 5 announcement describing an agreement under which Microsoft, Google, and xAI would submit frontier models to government scientists for pre-release security testing. The page was redirected to the Center for AI Standards and Innovation. No comment from the White House or the three companies; Anthropic was notably absent from the original agreement.
Why it matters
Reads as an unresolved internal fight between intelligence-led and Commerce-led AI oversight (consistent with the Washington Post turf-battle reporting earlier this week). The absence of explanation matters more than the deletion: the procurement and pre-release-testing baseline that briefly looked stable is once again contested, which is exactly the kind of policy uncertainty that pushes large customers toward sovereign or on-prem deployments.
Aptos Labs announced a native Encrypted Mempool design that would hide transaction intent at the L1 protocol layer via batched threshold decryption using validator keys, with decryption occurring only after block ordering is finalized. Pending governance approval; framed as the first L1 to embed MEV-resistance at the protocol layer without additional trust assumptions.
Why it matters
If shipped as described, this collapses a stack of MEV-mitigation middleware (Flashbots-style relays, encrypted bundles, intent auctions) into base-layer guarantees. Prediction markets and any onchain auction mechanism inherit much cleaner execution semantics. The open question is whether validator-set threshold decryption holds up under collusion or regulatory pressure on validators: the trust model moves from searchers to validators, not away from trust entirely.
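The collusion question is easier to see with a toy threshold scheme. The sketch below is plain Shamir secret sharing over a prime field, not Aptos's design (whose cryptographic details are not given here): a key is split so that any `threshold` share-holders can rebuild it and fewer cannot. The function names and parameters are illustrative assumptions.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field for this toy scheme


def split(secret: int, threshold: int, n_shares: int) -> list:
    """Split secret into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789                      # stand-in for a decryption key
shares = split(key, threshold=3, n_shares=5)

recovered = combine(shares[:3])      # any 3 of 5 "validators" suffice
recovered_other = combine(shares[2:5])
too_few = combine(shares[:2])        # 2 shares interpolate to an unrelated value
```

The same property that lets an honest quorum decrypt after ordering lets a colluding or coerced quorum decrypt before it, which is why the trust shifts to the validator set rather than disappearing.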
Icertis surveyed 1,000+ U.S. corporate legal professionals: 47% would not detect unauthorized or incorrect AI-agent actions until after they occurred, only 23% have documented agentic AI policies, and 39% claim real-time visibility. Just 26% are 'very confident' in AI accuracy for high-stakes decisions, while nearly 10% have already minimized human review on autonomous tasks.
Why it matters
Empirical data on the governance gap that's been showing up anecdotally in the AI-hallucination sanction wave. Same week, the UK Law Society publicly asked regulators for clear court-AI rules, the Chicago Bar Association launched 10 working committees, and Docusign/LexisNexis shipped agentic contract platforms. The deployment curve is well ahead of the oversight curve, and bar associations are the ones moving to close it.
An international team used carbonate-associated phosphate (CAP) analysis on samples from seven globally distributed sites, including Anticosti Island, to provide direct geochemical evidence linking sharp ocean phosphorus spikes to the Late Ordovician (~445 Ma, ~85% marine species loss) and Late Devonian (~372 Ma, ~80%) mass extinctions. The mechanism: algal blooms → anoxia → suffocation.
Why it matters
First direct measurement of a nutrient-driven extinction mechanism that has been theorized for decades. Two independent events share the same chemical signature, which promotes nutrient cycling from contributing factor to systemic driver alongside CO₂ and impact. Reframes how modern agricultural runoff dead zones are read against deep-time precedent.
Studios bypassing Cannes for controlled launches is now a stable pattern, not a one-year anomaly; the festival's identity is reverting to international auteur cinema by default. The AI-labeling-while-banking-Meta-money posture is the kind of contradiction the next few editions will have to resolve more explicitly. Worth watching alongside Schoenbrun's on-record account of every major studio passing on 'Camp Miasma' before Mubi picked it up.
Agent-economy plumbing converges on USDC + MCP
Circle Agent Stack, AWS AgentCore Payments, x402-extract-mcp, and a Polymarket MCP server all ship within a few days, all assuming the same substrate: stablecoin settlement under MCP-exposed tools. The agent-as-customer framing is no longer aspirational.
Cross-chain exploits keep paying governance interest
The April Kelp/LayerZero exploit is now a binding Arbitrum vote, a $3B Chainlink CCIP migration, and a CoW DAO treasury reshuffle. Bridge security failures cascade into months of downstream governance load.
Regulators move from frameworks to targeted inspections
BaFin announces targeted AI inspections; Japan stands up a Mythos-specific working group; the US Commerce Department quietly deletes its pre-release testing announcement. Operational enforcement is starting before the policy debate finishes.
Legal-AI governance gap gets quantified
Icertis survey finds 47% of in-house legal teams would not detect unauthorized AI agent actions until after they occurred; only 23% have documented agentic AI policies. Bar associations and UK solicitors are now publicly asking for rules rather than waiting for them.
Agent evaluation is catching up to agent deployment
Microsoft's SocialReasoning-Bench shows frontier models complete negotiation tasks but leave value on the table and fall to adversarial counterparties, the kind of failure mode that matters when agents are spending USDC, not generating text.