Today on The Anvil: a fragile US-Iran deal framework emerges from 105 days of strikes and counter-strikes, AI coding tools complete their shift to metered billing, and the question of who controls the AI agent layer gets answered — expensively — by engineers at Google, Microsoft, and Xiaomi.
After 105 days of conflict—including the recent strike exchange and Iran's closure of the Strait of Hormuz we've been tracking—a detailed 14-point memorandum of understanding between the US and Iran has surfaced. It covers reopening the Strait without transit tolls, a 60-day ceasefire extension including Lebanon, and a commitment by Iran not to seek nuclear weapons, with the uranium stockpile resolution deferred to a second phase. Trump claims the deal is agreed 'at the highest level', while Iran's Foreign Ministry calls it 'mere speculation.' Trump canceled a planned Kharg Island attack, citing imminent diplomatic progress.
Why it matters
This directly addresses the major sticking points we've followed: the Strait closure (now poised to reopen toll-free) and Iran's uranium stockpile (which Trump previously demanded Iran permanently give up; its resolution is now deferred). The explicit exclusion of Iran's missile program and Hezbollah support means any signed agreement leaves major regional security issues for later. Khamenei's formal approval remains the ultimate, unpredictable gatekeeper.
Boris Cherny, Anthropic's Claude Code lead, whose overnight agent-fleet workflows we covered previously, revealed at Fortune Brainstorm Tech that he hasn't written a line of code by hand in eight months. He framed the progression as 'bottleneck migration' (once code-writing was automated, code review became the constraint) and published a five-rule operational framework for running 1,000+ autonomous agents, covering auto-permission modes, /goal directives, cloud-based Routines, and end-to-end self-verification.
Why it matters
We already knew Cherny was running thousands of agents overnight; the new value here is his formal framework and the 'bottleneck migration' concept. The five rules make Anthropic's internal practices actionable for teams already running Claude Code. The Gutenberg-moment framing raises harder questions about team structures and code ownership as the barrier to authorship collapses.
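The 'bottleneck migration' idea follows from how serial pipelines behave: the slowest stage caps end-to-end throughput, so speeding up one stage hands the constraint to the next. The toy model below illustrates that; the stage names and daily rates are hypothetical, not figures from Cherny's talk.

```python
# Illustrative only: stage rates are hypothetical, not figures from Cherny's talk.
def pipeline_throughput(rates: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and end-to-end throughput (changes/day).

    In a serial pipeline every change passes through every stage,
    so the slowest stage caps overall throughput.
    """
    stage = min(rates, key=rates.get)
    return stage, rates[stage]

# Before automation: hand-written code is the slowest stage.
manual = {"write": 5.0, "review": 20.0, "deploy": 100.0}
# After agents take over writing, its rate jumps and the constraint moves.
automated = {**manual, "write": 500.0}

print(pipeline_throughput(manual))     # ('write', 5.0)
print(pipeline_throughput(automated))  # ('review', 20.0)
```

Note that throughput only rose from 5 to 20 changes/day despite a 100x faster writing stage: past that point, gains have to come from review.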
Microsoft Research released SkillOpt under the MIT license: a framework that optimizes agent skill documents, stored as markdown files, through an iterative propose-and-test loop with learning rates and validation gates. Across 52 benchmark combinations using GPT-5.5, SkillOpt delivered a +23.5-point average improvement over no-skill baselines, with the largest gains in document extraction, AP automation, and claims processing. Skills trained in one harness (Claude Code, Codex CLI, plain chat) transfer to the others. The final skills have a median length of around 920 tokens, and training costs $1–5 per task on Claude Sonnet. The framework is harness-agnostic and requires no changes to model weights.
Why it matters
SkillOpt reframes a persistent pain point in agentic development: prompt engineering is unstable, brittle, and resists systematic improvement. By treating skill documents as trainable objects with mathematical discipline — learning rates, validation gates, regression testing — rather than trial-and-error text, Microsoft provides a framework for systematic skill evolution that integrates with any agent harness. The $1–5 training cost per task and portability across Claude Code, Codex, and chat make this immediately practical. For product builders designing multi-step agent workflows, the combination of SkillOpt and something like Cherny's operational framework represents a full stack for reliable agentic execution — from skill design through production orchestration.
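SkillOpt's own code isn't reproduced here, but the general shape of a propose-and-test loop with a validation gate can be sketched. Assumptions, all hypothetical: a skill is a list of markdown bullet lines, the "learning rate" is read as the maximum number of line edits per step, and a toy keyword scorer stands in for running an agent on held-out tasks.

```python
import random
from typing import Callable

# Hypothetical sketch, NOT SkillOpt's API. "lr" = max single-line edits per step.
Skill = list[str]  # a skill document, one markdown bullet per line

def propose(skill: Skill, pool: list[str], lr: int, rng: random.Random) -> Skill:
    """Return a candidate skill with at most `lr` single-line edits."""
    cand = list(skill)
    for _ in range(lr):
        if cand and rng.random() < 0.3:
            cand.pop(rng.randrange(len(cand)))   # delete a rule
        else:
            line = rng.choice(pool)
            if line not in cand:
                cand.append(line)                # add a candidate rule
    return cand

def optimize(skill: Skill, pool: list[str], score: Callable[[Skill, str], float],
             steps: int = 50, lr: int = 2, min_gain: float = 0.01, seed: int = 0) -> Skill:
    rng = random.Random(seed)
    best, best_val = skill, score(skill, "val")
    for _ in range(steps):
        cand = propose(best, pool, lr, rng)
        val = score(cand, "val")
        # Validation gate: accept only a real gain on the validation split
        # that does not regress the separate regression split.
        if val >= best_val + min_gain and score(cand, "regress") >= score(best, "regress"):
            best, best_val = cand, val
    return best

# Toy scorer standing in for "run the agent on held-out tasks".
POOL = ["- Normalize dates to ISO 8601.", "- Quote the source line for each field.",
        "- Never invent missing totals.", "- Add a friendly greeting."]
NEEDS = {"val": set(POOL[:3]), "regress": {POOL[0]}}

def toy_score(skill: Skill, split: str) -> float:
    hits = len(NEEDS[split] & set(skill))
    return hits / len(NEEDS[split]) - 0.01 * len(skill)  # small length penalty

trained = optimize([], POOL, toy_score)
print("\n".join(trained))
```

Because a candidate is only kept when it clears both gates, validation and regression scores can never go down across accepted steps; that monotonicity is what separates this from ad hoc prompt tweaking.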
Cohere released North Mini Code, a 30B-parameter mixture-of-experts coding model with 3B active parameters per token, optimized for agentic software engineering. The model runs on a single H100 at FP8, features a 256K context window, reports 33.4 on the Artificial Analysis Coding Index, and delivers up to 2.8x higher output throughput than comparable models. It ships under Apache 2.0 and supports interleaved thinking, native tool use, sub-agent orchestration, and systems architecture mapping. Available via Hugging Face, Cohere API, and OpenRouter.
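A quick back-of-envelope check shows why the single-H100 claim is plausible. Assumptions: 1 byte per parameter at FP8, the 80 GB H100 variant, and the common approximation of ~2 FLOPs per active parameter per token for a forward pass (attention and KV-cache costs ignored).

```python
# Back-of-envelope sizing for a 30B-total / 3B-active MoE model.
# Assumptions: 1 byte/param at FP8, 80 GB H100, ~2 FLOPs per active param per token.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9
BYTES_PER_PARAM_FP8 = 1
H100_HBM_GB = 80

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_FP8 / 1e9
headroom_gb = H100_HBM_GB - weights_gb        # left for KV cache and activations
flops_per_token = 2 * ACTIVE_PARAMS           # a dense 30B model would need 2 * TOTAL_PARAMS
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"weights: {weights_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
print(f"~{flops_per_token / 1e9:.0f} GFLOPs/token, {active_fraction:.0%} of a dense 30B model")
```

All 30B parameters must sit in memory, but each token only pays compute for about 3B of them; the remaining ~50 GB is what a 256K-token context has to fit its KV cache into.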
Why it matters
North Mini Code targets the 'sovereign AI' objective: a genuinely capable coding agent that enterprise teams can self-host without a multi-GPU cluster. The sparse MoE architecture (8 active out of 128 experts) is the key efficiency lever — it delivers coding capability competitive with larger dense models at a fraction of the inference cost and hardware requirement. Arriving the same week that GitHub Copilot, Cursor, and Claude Code all shifted to token-metered billing, a self-hostable Apache 2.0 alternative with flat infrastructure costs is a meaningful competitive differentiator for teams managing AI spend. The Apache 2.0 license also removes the legal friction that enterprise procurement teams face with more restrictive model licenses.
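The efficiency lever is top-k routing: a small router scores every expert for each token, but only the k best-scoring experts actually run. Below is a minimal NumPy sketch assuming the 128-expert, top-8 configuration cited above, a softmax over the selected logits, and toy linear "experts"; production MoE layers also add load-balancing losses, capacity limits, and batched dispatch, all omitted here. This is a generic illustration, not Cohere's implementation.

```python
import numpy as np

# Generic top-k MoE routing sketch (not Cohere's code). Toy linear experts.
NUM_EXPERTS, TOP_K, D = 128, 8, 16
rng = np.random.default_rng(0)
router_w = rng.normal(size=(D, NUM_EXPERTS))
experts = rng.normal(size=(NUM_EXPERTS, D, D)) / np.sqrt(D)

def moe_forward(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Route one token vector x (shape [D]) to its top-k experts."""
    logits = x @ router_w                      # one score per expert, shape [NUM_EXPERTS]
    top = np.argsort(logits)[-TOP_K:]          # indices of the k largest logits
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts only
    # Only TOP_K of NUM_EXPERTS weight matrices are touched for this token.
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out, top

token = rng.normal(size=D)
y, used = moe_forward(token)
print(y.shape, len(used))  # (16,) 8
```

The unselected 120 experts cost nothing for this token, which is why total parameter count (memory) and active parameter count (compute) diverge in MoE models.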
The AI coding tool pricing transition we saw begin with GitHub Copilot's shift to token credits is now complete across the industry. Cursor, Windsurf, and Anthropic have all followed suit with metered billing. Users report 10x–100x cost variance on agentic workloads, confirming the massive budget overruns that led Uber to cap its AI spend earlier this month. Claude Fable 5 access moves to API-only at $10/$50 per million tokens. Meanwhile, Microsoft is canceling internal Claude Code licenses at the June 30 fiscal year end to establish cost governance before its next AI rollout.
Why it matters
The billing shocks we noted with Copilot's credit rollout are now an industry-wide reality. The removal of free-model fallbacks means teams running agentic workflows hit hard walls rather than degraded fallbacks. Microsoft's internal pivot illustrates the enterprise response pattern: retrench to governed tooling, audit usage, then selectively re-enable. Teams need to audit model selection defaults and configure spending caps immediately.
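For teams setting up spending caps, the core mechanism is small: price each request from its token counts and refuse it if it would push cumulative spend past the cap. The sketch below assumes "$10/$50 per million tokens" means $10 per million input tokens and $50 per million output tokens; the `SpendGuard` class and its methods are hypothetical, not any vendor's API.

```python
from dataclasses import dataclass

# Hypothetical spend guard for token-metered usage (not a vendor API).
# Assumption: $10 per 1M input tokens, $50 per 1M output tokens.
INPUT_USD_PER_M = 10.0
OUTPUT_USD_PER_M = 50.0

class BudgetExceeded(RuntimeError):
    pass

@dataclass
class SpendGuard:
    cap_usd: float
    spent_usd: float = 0.0

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record a request's cost, refusing it if it would exceed the cap."""
        c = self.cost(input_tokens, output_tokens)
        if self.spent_usd + c > self.cap_usd:
            left = self.cap_usd - self.spent_usd
            raise BudgetExceeded(f"request costs ${c:.2f}; only ${left:.2f} left")
        self.spent_usd += c
        return c

guard = SpendGuard(cap_usd=50.0)
print(guard.charge(2_000_000, 200_000))  # 30.0 (2M tokens in, 200K out)
try:
    guard.charge(2_000_000, 200_000)     # a second identical run would reach $60
except BudgetExceeded as e:
    print("blocked:", e)
```

Checking before the request (rather than reconciling afterward) is what turns a billing surprise into a hard, visible stop; agentic loops that re-read large contexts can burn through input tokens far faster than chat use.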
OpenAI announced the acquisition of Ona (formerly Gitpod GmbH), a platform enabling AI agents to run in persistent cloud-based sandboxes that remain active when developer workstations shut down. The platform includes program-hashing obfuscation detection, file-system access controls, credential isolation, and outbound connection blocking — addressing both the interruption problem and enterprise governance requirements for long-running agents. The integration targets Codex, which has 5M+ weekly users.
Why it matters
Local agent execution has a fundamental architectural flaw: the workstation has to stay on. For multi-day agentic tasks — the kind Cherny describes running overnight — local execution is a non-starter. Ona's cloud-native sandboxes solve persistence at the infrastructure layer while the security architecture (hashing, credential isolation, connection controls) addresses the enterprise governance gap that has kept agentic workflows out of regulated environments. At 5M+ Codex weekly users, the distribution surface for persistent agents just expanded substantially. This is a horizontal infrastructure acquisition, not a capability play — OpenAI is buying the plumbing that makes long-running agentic tasks production-ready.
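Two of the controls named above, outbound connection blocking and credential isolation, reduce to simple default-deny policies. The sketch below is a hypothetical illustration, not Ona's implementation; the allowlisted hosts and secret-variable prefixes are made-up examples, and a real sandbox enforces egress at the network layer rather than in application code.

```python
from urllib.parse import urlparse

# Hypothetical sandbox policy sketch (not Ona's implementation).
# Hostnames and secret prefixes below are illustrative examples.
ALLOWED_HOSTS = {"github.com", "pypi.org"}
SECRET_PREFIXES = ("AWS_", "GITHUB_TOKEN", "OPENAI_API_KEY")

def egress_allowed(url: str) -> bool:
    """Default-deny: permit HTTPS only to exactly allowlisted hosts."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS

def agent_env(parent: dict[str, str]) -> dict[str, str]:
    """Copy the environment for the agent process, minus secret-looking variables."""
    return {k: v for k, v in parent.items() if not k.startswith(SECRET_PREFIXES)}

print(egress_allowed("https://pypi.org/simple/requests/"))  # True
print(egress_allowed("https://exfil.example.net/upload"))   # False
print(sorted(agent_env({"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "x"})))  # ['PATH']
```

Exact host matching is deliberately strict (subdomains are denied too); the point is that anything not explicitly permitted fails closed.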
Xiaomi released MiMo Code V0.1.0 on Thursday as a terminal-native agentic coding assistant claiming 62% accuracy on SWE-Bench Pro versus Claude Code's 57%, with persistent cross-session memory via SQLite and a checkpoint-writer subagent architecture maintaining context across 200+ consecutive steps. The tool ships under MIT license with multi-provider model support (including bring-your-own-model), priced at $0.40–$3.00/M tokens for Xiaomi's own models. A significant caveat: default telemetry routes data to Xiaomi tracking servers, creating a trust gap for regulated industry use.
Why it matters
MiMo Code makes two substantive claims worth evaluating separately. First, the harness architecture — SQLite-backed persistent memory and checkpoint subagents — is a legitimate engineering approach to the context-window amnesia problem that plagues long agentic sessions. Second, the benchmark claim (62% vs 57% on SWE-Bench Pro) is from internal testing and requires independent verification. The MIT license and model-agnostic design are genuine differentiators as proprietary tools shift to metered billing. The telemetry issue is a real disqualifier for enterprise and regulated contexts without explicit opt-out configuration. The interesting technical question is whether the memory architecture or the base model accounts for the benchmark delta — if it's architecture, the harness pattern is portable.
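The SQLite-backed memory pattern is easy to reproduce in any harness: a checkpoint writer periodically persists a summary row, and a new session seeds its context from the most recent rows. The table layout and function names below are assumptions for illustration, not MiMo Code's schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical cross-session agent memory (not MiMo Code's actual schema).
def open_memory(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS checkpoints (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session TEXT NOT NULL,
               step INTEGER NOT NULL,
               summary TEXT NOT NULL)"""
    )
    return conn

def write_checkpoint(conn: sqlite3.Connection, session: str, step: int, summary: str) -> None:
    """What a checkpoint-writer subagent might persist every N steps."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT INTO checkpoints (session, step, summary) VALUES (?, ?, ?)",
                     (session, step, summary))

def recall(conn: sqlite3.Connection, limit: int = 3) -> list[str]:
    """Load the most recent summaries to seed a new session's context."""
    rows = conn.execute("SELECT summary FROM checkpoints ORDER BY id DESC LIMIT ?", (limit,))
    return [r[0] for r in rows]

db = os.path.join(tempfile.mkdtemp(), "memory.db")
s1 = open_memory(db)
write_checkpoint(s1, "mon", 120, "Refactored auth module; tests green.")
write_checkpoint(s1, "mon", 200, "Next: migrate session store to Redis.")
s1.close()

s2 = open_memory(db)  # a later session reopens the same file
print(recall(s2))     # newest first
```

Because the summaries live on disk rather than in the model's context window, a fresh session can pick up where the last one left off; whether that, or the base model, explains MiMo's benchmark delta is exactly the open question.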
GitHub released Agentic Workflows to public preview, enabling developers to describe CI/CD tasks in natural language Markdown instead of YAML. A compilation model converts those descriptions into deterministic, reviewed lockfiles executed by GitHub Actions, with a layered security architecture — permissions scoping, audit logs, human review gates — designed for unattended production use.
Why it matters
The architecture here is the story: natural language input, deterministic execution, reviewed intermediate representation. This is how to give an AI agent write access to production systems without sacrificing auditability — the compilation model creates a human-reviewable artifact (the lockfile) between intent and execution. That pattern — agent generates plan, human or system reviews plan, deterministic executor runs plan — is the design template for safe agentic automation in any production context. The public preview timing, concurrent with GitHub Copilot's token billing shift, suggests GitHub is positioning Agentic Workflows as the higher-value tier that justifies metered pricing.
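The compile-review-execute pattern can be sketched in a few lines: serialize the generated plan canonically, record its hash when a reviewer approves it, and have the executor refuse any plan whose hash no longer matches. This is a hypothetical illustration of the pattern, not GitHub's lockfile format; every name is made up, and the LLM step that would turn Markdown into `steps` is replaced by a fixed list.

```python
import hashlib
import json

# Hypothetical plan/review/execute sketch (not GitHub's lockfile format).
def compile_plan(steps: list[dict]) -> str:
    """Serialize a plan canonically so identical plans always hash identically."""
    return json.dumps(steps, sort_keys=True, separators=(",", ":"))

def digest(lockfile: str) -> str:
    return hashlib.sha256(lockfile.encode()).hexdigest()

def execute(lockfile: str, approved_digest: str, runners: dict) -> list[str]:
    """Run only a plan whose hash matches the reviewed one; unknown actions raise KeyError."""
    if digest(lockfile) != approved_digest:
        raise PermissionError("plan changed after review; re-approval required")
    return [runners[s["action"]](**s["args"]) for s in json.loads(lockfile)]

# In a real system an LLM would produce `steps` from Markdown; here they are fixed.
steps = [{"action": "label", "args": {"issue": 42, "name": "triage"}}]
lock = compile_plan(steps)
approved = digest(lock)  # recorded when a human approves the lockfile
runners = {"label": lambda issue, name: f"labeled #{issue} as {name}"}
print(execute(lock, approved, runners))  # ['labeled #42 as triage']
```

The nondeterministic model only ever touches the input side; what runs is the reviewed artifact, byte for byte.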
Average return rates across apparel and home goods reached 22.4% industry-wide as of June 2026, with per-unit processing costs rising to $14.80. In response, 67% of brands now charge partial or full return shipping fees, up from 41% in early 2024. 3PLs are deploying AI-driven tools such as Loop Warehouse Intelligence, and Happy Returns is expanding its physical drop-off network. A VUB study of nearly 10,000 European shoppers found that 15% of consumers, so-called 'serial returners', account for almost 60% of e-commerce return carbon emissions, generating ~20 kg CO₂ annually versus ~3 kg for other consumers. BNPL adoption correlates with 31% higher return rates, and TikTok Shop impulse purchases are a documented accelerant.
Why it matters
The VUB serial-returner finding is the most actionable new data point: if 15% of customers drive 60% of return emissions and a disproportionate share of processing costs, targeted intervention on that segment — better sizing tools, friction in the return flow, personalized guidance — has higher ROI than broad return policy changes. The simultaneous shift to return shipping fees signals that the 'free returns as conversion driver' era is ending at the P&L level. For operators building reverse logistics infrastructure, the combination of rising costs, increasing automation investment, and shifting consumer policy creates an inflection point where the technology and economics of recommerce become compelling rather than aspirational.
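The reported figures also support two quick calculations: the return-processing cost spread across every unit shipped, and the emissions share implied by the per-shopper averages. Treating the ~20 kg and ~3 kg figures as exact is an assumption.

```python
# Back-of-envelope checks on the returns figures reported above.
# Assumption: the ~20 kg / ~3 kg per-shopper averages are treated as exact.
RETURN_RATE = 0.224
COST_PER_RETURN_USD = 14.80

def return_cost_per_unit_sold(rate: float = RETURN_RATE, cost: float = COST_PER_RETURN_USD) -> float:
    """Expected processing cost spread across every unit shipped."""
    return rate * cost

def serial_share(serial_frac: float = 0.15, serial_kg: float = 20.0, other_kg: float = 3.0) -> float:
    """Share of total return emissions attributable to serial returners."""
    serial = serial_frac * serial_kg
    other = (1 - serial_frac) * other_kg
    return serial / (serial + other)

print(f"${return_cost_per_unit_sold():.2f} per unit sold")  # $3.32
print(f"{serial_share():.0%} of emissions")                 # 54%
```

The per-shopper averages imply roughly 54%, somewhat below the study's "almost 60%"; the gap may come from rounding in those averages, but either way a small segment carries most of the footprint. The $3.32 figure is a useful yardstick for what a sizing tool or return-flow change has to save per unit to pay for itself.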
The body of 5-year-old Amada Mia Brown, swept into the ocean at Laguna Beach on Tuesday during the largest south swell in nearly two decades, was recovered Thursday morning after 30+ hours and 90+ square miles of search operations. Her mother and brother were rescued by bystanders; both have been discharged from the hospital. Newport Beach lifeguards made over 140 rescues and thousands of preventive interventions across two days, with waves reaching 20–25 feet at the Wedge. The National Weather Service had issued a beach hazard statement through Thursday.
Why it matters
The scale of the rescue operation — 140+ active rescues, thousands of preventive contacts, multi-agency search across 90 square miles — reveals the operational stress exceptional swell events place on coastal emergency infrastructure. Newport Beach's lifeguard capacity was effectively maxed during peak conditions. The tragedy also drew national social media attention, with the father publicly addressing online blame directed at the mother, illustrating how local coastal incidents now carry significant public communication dimensions for city and county officials. The swell conditions have since moderated, but the event is prompting review of public warning signage, beach closure protocols, and lifeguard resource allocation during extreme ocean events.
Democrats moved ahead in two competitive Orange County supervisor races after mail-in ballot processing reversed early Republican leads. In the Fifth District, incumbent Katrina Foley holds a narrow lead over Republican Diane Dixon. In the Fourth District, Buena Park Mayor Connor Traut leads Tim Shaw. The races will determine whether Democrats maintain their current Board of Supervisors majority. Both contests may head to November runoffs given ballot-counting uncertainty.
Why it matters
Board composition directly shapes OC policy on development, budget priorities, and infrastructure — particularly relevant given the county's simultaneous $75M structural deficit, the Airport Fire liability settlement, and the hotel development cap proposal affecting Newport's North End redevelopment pipeline. Foley's narrow lead is notable because she's the supervisor who introduced public notice requirements for pesticide/herbicide use in flood control channels — the issue that came before the board this week. A Republican majority could reverse that direction. Mail-in ballot processing has now twice in recent OC cycles reversed election-night results, a pattern that will shape how campaigns and media cover future OC races.
A nine-month Spokane County criminal justice task force released a 100+ page road map Thursday proposing unified coordination across jails, mental health services, addiction treatment, and housing — but without specific capacity measurements, bed counts, funding amounts, or a determination on whether a new jail is needed. The report represents a deliberate pivot from the failed 2023 Measure 1 ($1.7B, 30-year tax hike rejected by voters). Political consensus was achieved by deferring all specifics to future work groups. A sales tax ballot measure could still emerge by November 2026.
Why it matters
This is a politically successful document that solves a political problem (building consensus after a devastating ballot failure) without resolving any of the operational questions. The task force avoided the specifics that killed Measure 1 (a price tag voters rejected) by not generating specifics. What it does is reset the coalition and create a framework for future ballot measures to attach to. The November 2026 sales tax window is the real deadline: if work groups can't produce fundable specifics by fall, the opportunity for voter-approved revenue this cycle closes. Context: Spokane County is already navigating a $30M deficit and a controversial 0.1% public safety sales tax, imposed without a public vote, that the commission approved last week.
Agentic infrastructure is the new battleground
OpenAI acquiring Ona for persistent cloud sandboxes, Anthropic's Boris Cherny running 1,000+ agents, Microsoft open-sourcing SkillOpt, and Xiaomi releasing MiMo Code all point to the same shift: the competitive moat is no longer the base model but the orchestration harness, memory architecture, and operational infrastructure around it.
AI coding tools completed their metered-billing transition
GitHub Copilot, Cursor, Claude Code, and Windsurf all moved to token-metered billing within days of each other in early June. Microsoft canceling Claude Code licenses, Uber capping spend, and Google engineers posting 'slop' memes internally are early signals that enterprise AI coding governance, not model capability, is now the primary adoption constraint.
Diplomacy by threat: the Iran negotiation pattern
Trump canceling three consecutive nights of strikes after threatening to hit Kharg Island, a 14-point MoU surfacing publicly while Iran's foreign ministry calls it speculation, and a potential Geneva signing with JD Vance: the Iran endgame is being negotiated through alternating escalation threats and walk-backs, a pattern that makes any 'deal announced' headline unreliable until Khamenei signs.
Circularity and returns are becoming cost-center crises
Return rates at 22.4% industry-wide, per-unit processing costs at $14.80, 67% of brands now charging return shipping fees, and the VUB finding that 15% of shoppers drive 60% of return emissions: reverse logistics has moved from afterthought to P&L priority, accelerating both automation investment and policy shifts simultaneously.
Data center infrastructure is fracturing regional energy politics
Spokane's emergency moratorium, Kootenai County's prior 182-day ban, Avista's undisclosed 500MW customer, and community opposition citing the Spokane River's documented dry-out form a coherent regional pattern: hyperscale AI infrastructure demand is outpacing grid and water planning in ways that are now producing legislative responses, not just petitions.
What to Expect
2026-06-13/14—Potential US-Iran MoU signing ceremony in Europe, reportedly with VP JD Vance representing the US — Iranian Supreme Leader Khamenei's final approval remains the blocking factor as of Friday.
2026-06-22—Claude Fable 5 access via claude.ai subscription plans ends; users must migrate to API-only access at $10/$50 per million tokens.
2026-06-30—Microsoft discontinues internal Claude Code licenses across Windows, M365, Outlook, Teams, and Surface engineering teams; engineers redirected to GitHub Copilot CLI.
2026-07-31—Idaho Panhandle Resource Advisory Committee deadline for $1.1M in Title II project proposals across five North Idaho counties.
2026-Q4—Formlabs Fuse X1 production-grade SLS printer expected to ship; Walmart targeting 16 AI-coordinated next-gen distribution centers operational by year-end.
How We Built This Briefing
Every story, researched.
Every story verified across multiple sources before publication.
Scanned: 1,032 (across multiple search engines and news databases)
Read in full: 187 (every article opened, read, and evaluated)
Published today: 12 (ranked by importance and verified across sources)
— The Anvil