🔨 The Anvil

Wednesday, May 13, 2026

15 stories · Standard format

Generated with AI from public sources. Verify before relying on for decisions.


Today on The Anvil: autonomy is outrunning oversight on every front, with coding agents running by the thousand overnight, Iran unilaterally quadrupling its claimed maritime zone, and AI-generated zero-days hitting open source faster than CVEs can be filed. The infrastructure to govern any of it is arriving in pieces.

AI Developments

Microsoft MDASH: 100+ Specialized Agents Found 16 Windows Vulns Including 4 Critical RCEs on Patch Tuesday

Microsoft disclosed MDASH, a multi-model agentic vulnerability discovery system orchestrating 100+ specialized AI agents, both frontier and distilled, across debater, prover, and ensemble-disagreement roles. The system found all 21 planted vulnerabilities in a private test driver with zero false positives, scored 88.45% on the CyberGym benchmark (industry top), and discovered 16 new vulnerabilities in the Windows networking stack, including 4 Critical RCEs fixed in today's Patch Tuesday update. The architecture catches race-condition use-after-frees and cross-file aliasing bugs that single-model approaches structurally miss.

Pair this with the GTIG zero-day story: offensive AI is industrializing, and defensive AI is now matching it with orchestration rather than bigger models. The durable advantage is in the validation pipeline (debater agents arguing with prover agents, ensemble disagreement as a credibility signal), not in whichever frontier model you wired up. That's a transferable architectural lesson well beyond security: anywhere you need autonomous output you can trust, the answer is multi-agent adversarial validation, not a single more-capable model.

Verified across 1 source: Microsoft Security Blog

OpenAI, Microsoft, Broadcom, AMD, NVIDIA Ship Multipath Reliable Connection: Ethernet for 131K-Endpoint AI Clusters

A five-vendor consortium unveiled Multipath Reliable Connection (MRC), a new Ethernet protocol that swaps single high-bandwidth links for eight independent data-plane paths, doubling two-tier cluster scale from 65K to 131K endpoints while letting link failures self-heal without stopping training jobs. MRC is already deployed on Oracle Stargate and Microsoft Azure production clusters.

The capex-bottleneck story has been about chips, power, and gas turbines; MRC quietly addresses a further constraint that nobody outside hyperscaler networking teams talks about: a single link flap in a 100K-GPU cluster can torch a six-figure training run. Healing in place changes the economics of long training jobs and removes a real ceiling on cluster size. Worth watching whether the spec becomes an open standard or stays a hyperscaler club good.

Verified across 1 source: The Next Platform

AI Coding & Design Tools

Anthropic's Claude Code Lead Runs Thousands of Agents Overnight From His Phone, and Ships the Tooling to Match

Claude Code lead engineer Boris Cherny disclosed he runs 'a few thousand' Claude agents overnight, dispatched and monitored from the Claude mobile app, using /loops (local cron-driven scheduling) and Routines (server-side recurring tasks). Same week, Anthropic shipped Agent View (a real-time dashboard for multi-session oversight), /goal for outcome-based autonomous execution, and system prompt compaction to fight context drift across long sessions.

This is the operational model the industry has been gesturing at, finally documented by someone running it at scale: agents as always-on background workers, reviewed selectively rather than steered turn-by-turn. The fact that the dashboard, scheduling primitives, and context compaction tools all shipped in the same window as Cherny's disclosure tells you Anthropic now treats overnight fleet-of-agents usage as a first-class workflow, not an edge case. For anyone building product workflows around AI agents, the governance question (how do you triage thousands of completed PRs in the morning?) just became the actual job.

Verified across 2 sources: Business Insider · Geeky Gadgets

GitHub Copilot's June 1 Repricing Lands: New $100 Max Tier, 'Flex Allotments' Replace Predictable Inclusions

GitHub published the final June 1 billing structure: Pro ($10) gets a $5 flex allotment, Pro+ ($39) gets $31 flex, and a new $100/mo Max tier bundles $100 base plus $100 flex for $200 of monthly usage. The structural novelty: 'flex' allocations are explicitly variable, meaning GitHub reserves the right to adjust included usage as inference economics shift without changing subscription prices. This is the third Copilot pricing recalibration in under a year.

The flex-allotment mechanic is the new detail. Where prior coverage established that credits were replacing seats, this finalizes how the volatility is distributed: the vendor keeps the headline price fixed and adjusts the capacity floor instead. For teams budgeting AI coding spend, the per-seat number on the invoice no longer reliably maps to the productivity it buys, so procurement conversations will need to demand capacity floors rather than seat counts. GitLab announced the same structural shift to credit-based pricing this week, so this is becoming the platform-layer standard, not a Copilot-specific quirk.

Verified across 1 source: GitHub Blog

Six-Month Real-World Bake-Off: Claude Code, Codex, Cursor, Copilot Each Win a Different Job

Fusion Collective published six months of TDD-anchored testing across the four leading AI coding tools. Findings: Claude Code wins on multi-file planning and terminal workflows; Codex leads on autonomous refactoring; Cursor offers the best hybrid model access at the cost of UI clutter; Copilot wins enterprise procurement fit. Across all four, 43% of AI-generated changes required production debugging, a number that calibrates the 'review discipline as bottleneck' thesis.

This is the first long-horizon comparison that isn't a vendor benchmark or a one-week vibe check, and the 43% rework rate is the headline that should anchor adoption conversations. The right read is not 'which tool wins' but 'which tool's failure mode matches your codebase'. The QA capacity to absorb a near-majority rework rate is the actual gating factor, which aligns with last week's Forbes Tech Council warning about QA atrophy under AI-generated code volume.

Verified across 1 source: Built In

AI Supply Chain & Logistics

Amazon Pushes Autonomous Supply-Chain Agents Into Its Own Logistics Stack; Distributors Squeezed

AWS launched an AI-driven supply chain planning platform that wires autonomous agents into procurement, inventory, forecasting, and logistics and, crucially, routes execution through Amazon Supply Chain Services' owned freight, warehousing, and delivery infrastructure. The agents monitor operations, detect disruptions, and execute decisions with minimal human involvement.

The strategic move here isn't 'Amazon adds AI agents'; it's that Amazon is now bundling autonomous procurement directly with its physical logistics network, creating a vertically integrated stack that traditional distributors can't easily match. This squeezes the 'transactional fulfillment + relationship sales' middle of distribution, the same way Amazon's retail bundling squeezed independent retailers a decade ago. Pair with this week's Project44 Autopilot and Transfix shipment-troubleshooting launches: execution-layer agentic AI is suddenly everywhere in logistics, and standalone visibility tools are losing their seat at the table.

Verified across 3 sources: Distribution Strategy Group · Business Wire (Transfix) · Kaleris

Cambridge Researcher: Over 40% of Supply-Chain Agentic AI Projects Will Be Killed Before 2027; Cause Is Master Data, Not Models

A Cambridge researcher predicts >40% of agentic AI projects in supply chain will be cancelled before 2027, with the root cause being classification debt and master data quality, not model deficiency. Four 2026 calls: classification debt sinks more projects than model quality, supplier records get rebuilt as intelligence infrastructure, data residency becomes a sourcing criterion, and agent security escalates to a boardroom concern. Lands alongside The Loadstar's finding that 61% of logistics firms still run on email and spreadsheets despite 72% planning document-automation spend.

This is the same diagnosis the GEP/Darden survey delivered last week from a different angle: the failure mode isn't the AI, it's the data substrate it has to stand on. For builders, the actionable read is that supplier-record schemas and document-classification pipelines are now strategic infrastructure, not back-office plumbing, and that any agentic system shipped onto unstructured BOL/PO data is being set up to fail visibly enough to kill the program.

Verified across 2 sources: Forbes · The Loadstar

Design Engineering

Figma Publishes Its Design-to-Code Loop Thesis: Bidirectional Sync, Not Handoff

Figma published its formal position on AI-collapsed design/engineering workflows: designers pull live production code into Figma as editable frames, edit semantically, and push changes back without maintaining sync manually. The piece explicitly frames the design → export → handoff workflow as obsolete and positions AI translation between mediums as the new unit of work.

This is Figma responding directly to the Dessn/shadcn/ui thesis that ran in last week's briefing: that production codebases, not Figma files, are becoming the source of truth, and that design tools that don't run inside the code lose the workflow. Figma's answer is to claim bidirectional semantic sync as its territory. The interesting test is whether 'designers editing live code as frames' is a real workflow shift or a defensive narrative; the next six months of Figma Make, Dessn, and v0 adoption will settle it.

Verified across 1 source: Figma

Ford + Sharrow Compress Propeller Lead Time 130 Days → 2 Weeks via 3D-Printed Sand Casting

Ford's Advanced Industrial Technology & Platforms team partnered with Sharrow Engineering to take production lead time on Sharrow's loop-blade propellers from 130 days to roughly two weeks using Ford's 3D-printed sand-casting process, a workflow Ford has been refining for two decades on internal engine castings. The collaboration is now scaling for marine plus emerging drone and renewable-energy applications.

This is the production-scale validation that's been missing from the additive-manufacturing pitch for years: not 'we can prototype this' but 'we replaced a 4-month casting tooling pipeline with a 2-week additive one and shipped real product.' The relevant pattern for anyone working physical-product timelines is that the 3D printing isn't replacing the final part; it's replacing the casting mold, which is where the slow money lived. That's a much higher-leverage insertion point than printing the part itself.

Verified across 1 source: Voxel Matters

PTC Onshape + Altium Ship Direct ECAD–MCAD Sync in the Cloud

PTC Onshape and Altium shipped a direct connector that brings PCB designs into Onshape with real-time automatic synchronization, eliminating the file-export-and-pray workflow that has plagued mechanical-electrical coordination for decades. The integration is cloud-native and aimed at hardware teams shipping complex assemblies where board fit and enclosure tolerances actually have to agree.

For anyone building physical product with non-trivial electronics, the ECAD/MCAD sync gap is the single most expensive recurring coordination tax: board changes that nobody propagates to the enclosure, mechanical revisions that break connector clearances, version-mismatch fires found at assembly. Real-time sync in a shared cloud workspace doesn't eliminate the discipline, but it kills the file-version class of failure. The bigger thread is that cloud-native CAD is finally arriving at the workflow integrations that on-prem tools never quite shipped.

Verified across 1 source: PRNewswire / PTC Inc.

Iran Conflict

Iran Quadruples Its Claimed Hormuz Zone to 200–300 Miles; Classified U.S. Intel Says 70% of Prewar Missile Stock Intact

IRGC Navy deputy political chief Mohammad Akbarzadeh announced May 12 that Iran's claimed Hormuz operational zone now stretches 200–300 miles, from Jask in the east to Siri Island in the west, versus the prior 20–30 mile band. Kuwait disclosed it foiled an IRGC infiltration attempt on Bubiyan Island on May 1. Separately, NYT-sourced classified U.S. intel says Iran has restored 30 of 33 missile sites along Hormuz and retains ~70% of prewar missile stockpiles plus ~90% of underground storage and launch facilities, directly contradicting administration claims of Iranian military decimation. The Pentagon is reportedly considering renaming the conflict 'Operation Sledgehammer' to restart the 60-day War Powers clock if the ceasefire collapses. Trump landed in Beijing for a Trump–Xi meeting with Iran on the agenda. Bunker fuel in Singapore has surged from ~$500/ton to >$800/ton; Brent is at $107.

Three threads braid here in ways the prior week's coverage treated separately. The 'managed access doctrine' framing from last week's geospatial analysis (Iran achieving ~95% Hormuz traffic collapse via GNSS spoofing and AIS suppression rather than conventional blockade) is now being codified in a formal unilateral zone claim that puts most Gulf commercial shipping inside Iranian declared jurisdiction. The classified intel picture of ~70% prewar missile stocks intact is a direct contradiction of the public administration narrative, and the War Powers clock reset maneuver signals the Pentagon is already war-gaming renewed kinetics. Watch the Trump–Xi readout for any signal on Chinese pressure over IRGC oil routing and Chang Guang Satellite Technology sanctions.

Verified across 8 sources: Institute for the Study of War · Gulf News · Philadelphia Inquirer / NYT · NBC News · The War Zone · Al Jazeera · AP News · U.S. Treasury

Spokane & North Idaho

Spokane Lights Up Eastern Washington's First Community Microgrid at the MLK Center

The Martin Luther King Jr. Family Outreach Center in East Central Spokane brought a $2M solar + battery + natural gas microgrid online, the first community-based microgrid in the Inland Northwest, engineered by Avista and funded through Avista's Named Communities Investment Fund plus a Washington Department of Commerce grant. The system can power up to 400 people during extended outages and reduces baseline operating cost. Same week: Spokane City Council passed a Public Spaces Activation Program (streamlined permitting for parklets, sidewalk cafés, festivals), County Commissioners enacted a two-week cap on private-property camping, and a regional Housing-First-to-Treatment-First MOU collapsed without sign-on from Spokane City, Spokane Valley, or the County, putting $6.3M in federal homelessness funding at risk.

The microgrid is the locally relevant infrastructure pattern worth pulling out: behind-the-meter generation plus storage plus gas backup, deployed at community-center scale, in a historically underserved neighborhood. It's the same architecture as the CalEthos/TerraVolt Southeast Idaho data-center campus from earlier this week, just three orders of magnitude smaller, and the policy framework (Avista's Named Communities fund) is interesting in its own right as a model for tying utility capex to equity outcomes. The homelessness-MOU collapse is the larger political story underneath.

Verified across 6 sources: Spokesman-Review · Globe Newswire / Avista · City of Spokane · Spokesman-Review (camping ordinance) · The Center Square · Spokesman-Review (viaduct parking)

Newport Beach

Newport Beach: Irvine Co. Trades an 842-Space Garage for 184 Units; Two New Challengers File Against Council Over Police Station and Surf Park

Irvine Company broke ground on an administratively-approved 184-unit, five-story residential building at 800 San Clemente Drive in Newport Center, replacing an 842-space parking garage and bringing Villas Fashion Island to 708 total units (completion 2028). The project moved through admin approval without Planning Commission or Council hearings, part of the streamlined infill push toward the state-mandated 4,845-unit RHNA target by 2029. Separately: Harvard Law grad Walter Stahr and retired physician Dr. Andy Gerken filed to run for City Council in response to community frustration over the Civic Center Park police station siting and a contested surf park development; a Blom-Curry debate is set for May 13. Across OC: Voice of OC reports only 17% of 25,000 housing units built 2021–2024 were affordable, with eight OC cities (including Aliso Viejo and Huntington Beach) building zero affordable units.

The Newport Center conversion is the local instantiation of a pattern playing out across coastal cities: state housing mandates are quietly forcing administrative approval pipelines that bypass the public-hearing veto points where infill projects historically died. Two well-credentialed challengers entering the Council race over land-use decisions is the predictable political backlash. Worth tracking as Huntington Beach's $50K/month mandate fines come down before May 15.

Verified across 4 sources: The Real Deal LA · Hoodline · Newport Beach Indy · Voice of OC

OSINT & Intelligence

Europol OSINT Hackathon Targets 19,500 Abducted Ukrainian Children; FAS Documents Satellite-Imagery Verification of AI Data Centers

Politico published details on a mid-April Europol two-day OSINT hackathon where teams from 18 countries, alongside ICC representatives, used photo analysis, social-media profiling, and metadata forensics to identify Ukrainian children believed abducted by Russia (Kyiv estimates 19,500+ systematically taken). Separately, the Federation of American Scientists published a methodology paper this week on using commercial electro-optical satellite imagery to independently verify hyperscale AI data-center buildouts, with case studies on Khazna's Ajman facility and xAI's Colossus showing measurable gaps between announced timelines and on-the-ground construction.

Two OSINT stories that on their own would be footnotes, but together mark a maturation point: institutional OSINT (Europol, ICC, FAS) is now routine, methodologically documented, and applied to both humanitarian accountability and infrastructure verification. The FAS satellite-imagery framework is particularly useful as a public-domain methodology for verifying corporate AI capex claims; pair it with this week's IRGC sanctions targeting Chang Guang Satellite Technology and you get the full picture of geospatial intelligence as a contested commons.

Verified across 3 sources: Politico · Federation of American Scientists · Follow the Money

Cross-Cutting

Google GTIG: First In-the-Wild AI-Generated Zero-Day Catches Industrial-Scale State Actor Activity

Follow-on reporting on Google GTIG's first in-the-wild AI-generated zero-day (a 2FA bypass on a widely deployed open-source sysadmin tool) now adds operational tempo data: PRC and DPRK actors using persona-based prompting and the Gemini API to run autonomous Android malware (PROMPTSPY), Russia-aligned groups using LLMs for polymorphic obfuscation, and attackers explicitly targeting open-source AI tooling. Sysdig separately clocked CVE-2026-44338 (PraisonAI auth bypass) exploited within 3 hours 44 minutes of disclosure by a scanner identifying itself as CVE-Detector/1.0.

Yesterday's briefing covered the GTIG disclosure and the DPRK/PRC LLM-for-offense framework. What's new today is the advisory-to-exploit latency data: the PraisonAI CVE went from disclosure to active exploitation in under four hours, and the targets cluster around open-source AI tooling (PraisonAI, Marimo, LMDeploy, Langflow), suggesting adversaries have correctly identified that the agentic-AI ecosystem ships insecure defaults. If you're shipping anything MCP-adjacent, your patch-to-exploit assumption needs to be single-digit hours, not days.

Verified across 3 sources: The Next Web · Open Source For U · Sysdig


The Big Picture

Autonomy is now ambient, not interactive

Anthropic's own Claude Code lead is running thousands of agents overnight; Cursor's Background Agent is GA; SAP's autonomous warehouse robots are in live production. The interaction model has shifted from 'prompt and wait' to 'dispatch and review', and the tooling (Agent View, /goal, system prompt compaction) is just now catching up to make the operational model legible.

Governance is the new product surface

Axiomstudio's VibeFlow, Red Hat's developer tools, GitHub Spec Kit, and the Maersk/MIT forum all converge on the same point: in 2026 the question isn't whether to use agentic coding, it's how to audit, gate, and account for it. Compliance-aware agentic infrastructure is moving from differentiator to table stakes.

Iran's Hormuz doctrine keeps expanding by decree

From May 12: the IRGC unilaterally redefined its operational zone from 20–30 miles to 200–300 miles, attempted a Kuwaiti island infiltration, and classified U.S. intel now says Iran retains ~70% of prewar missile stocks. The 'managed access' framing from last week's geospatial analysis is hardening into a permanent claim of maritime sovereignty.

AI-generated exploits are now operating at advisory-to-exploit latency of hours

Google GTIG's first confirmed in-the-wild AI zero-day, Sysdig clocking CVE-2026-44338 exploited in 3h44m, Microsoft's MDASH finding 16 Windows vulns, and Cisco open-sourcing Foundry Security Spec all landed in the same week. The defensive baseline is being rewritten in real time.

Local infrastructure stories quietly mirror the national AI capex story

Spokane's MLK microgrid (solar + gas + battery, first in Eastern WA) and Newport Beach's Irvine Company parking-to-housing conversion are both about taking underused physical infrastructure and re-densifying it. The pattern at the data-center scale (behind-the-meter generation, infill density) is showing up at the community-center scale too.

What to Expect

2026-05-13 Trump–Xi meeting in Beijing; Iran on the agenda alongside trade. Watch for any signal on Chinese pressure over IRGC oil flows.
2026-05-13 Newport Beach councilman Blom vs. former mayor Curry debate on police station siting in Civic Center Park.
2026-05-15 Huntington Beach housing-mandate ruling expected; up to $50K/month retroactive penalties on the table.
2026-05-19 Kootenai County GOP precinct election: 141 candidates across three factions for 74 committeeman seats; structural fight for North Idaho endorsements.
2026-06-01 GitHub Copilot transitions to credit-based billing with new Pro flex allotments and a $100/mo Max tier.

Every story, researched.

Every story verified across multiple sources before publication.

🔍 Scanned: 979 (across multiple search engines and news databases)

📖 Read in full: 160 (every article opened, read, and evaluated)

Published today: 15 (ranked by importance and verified across sources)

The Anvil
