🔨 The Anvil

Sunday, May 17, 2026

14 stories · Standard format

Generated with AI from public sources. Verify before relying on for decisions.

Today on The Anvil: the AI coding tool market consolidates into desktop apps the same week Microsoft pulls Claude Code from its own engineers, Figure's humanoid sorts 100K packages in a second consecutive livestream that roboticists are again publicly skeptical of, and Iran moves its Hormuz toll system from deniable practice to parliamentary policy. Plus a Spokane newspaper becomes a nonprofit and a public MCP server quietly opens California's criminal-justice records to any AI agent.

AI Coding & Design Tools

OpenAI and GitHub Both Ship Desktop Coding Apps the Same Week

OpenAI released the Codex desktop app for macOS and Windows, with multi-agent workflows across terminal/IDE/web, cloud environments with worktrees, team-customizable Skills, Automations for background tasks, and PR review. The same week, GitHub put its Copilot desktop app into technical preview with a unified inbox for issues/PRs/agent sessions, side-by-side diffs, and multi-agent orchestration built on the Copilot CLI. Pro/Pro+ subscribers get the Copilot app first; the Business/Enterprise rollout comes this week.

Two of the three biggest coding-agent vendors shipped standalone desktop surfaces within days of each other, and the convergence is telling: the inline-suggestion-in-VS-Code era is over, and the new product shape is an orchestration cockpit that manages fleets of long-running agents across repos, PRs, and chat. For anyone evaluating tooling, the architectural decision is no longer 'which editor extension' but 'which control plane', and the control planes are starting to look very similar.

Verified across 2 sources: OpenAI · The New Stack

Codex vs Claude Code: Benchmarks Now Trade Wins, Token Costs Diverge 3-4x

Morph's post-April benchmark comparison shows Claude leading SWE-bench Pro (64.3% vs 58.6%) and Codex leading Verified (88.7% vs 87.6%), close enough that capability is no longer the deciding factor. The new number that matters: Claude burns 3–4x more tokens for equivalent work. Claude Code now accounts for ~10% of all public GitHub commits. The architectural split: Claude prioritizes coordinated multi-agent depth; Codex runs up to 8 parallel subagents optimized for isolated speed.

The 3–4x token cost gap is now the explanation for two events you've already tracked: Anthropic's April subscription cut and Microsoft's decision to cancel Claude Code licenses for its Experiences and Devices division by June 30. Benchmarks have converged; the next 12 months of enterprise procurement will be decided on cost-per-task and orchestration model. The June 15 metering restructure (interactive subscription vs. SDK credits) means the billing architecture is shifting at exactly the moment the cost comparison is sharpest.
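To make the cost-per-task framing concrete, here is a minimal sketch of the arithmetic. The token count and the per-million-token price are hypothetical placeholders, not figures from Morph or either vendor; only the 3–4x multiplier comes from the comparison above.

```python
def cost_per_task(tokens: int, usd_per_million_tokens: float) -> float:
    """Dollar cost of one task given its token usage and a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def break_even_price(baseline_price: float, token_multiplier: float) -> float:
    """Per-million-token price at which a tool using `token_multiplier` times
    the tokens costs the same per task as the baseline tool."""
    return baseline_price / token_multiplier


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
BASELINE_TOKENS = 200_000  # tokens the leaner tool spends on one task
BASELINE_PRICE = 10.0      # USD per million tokens, assumed equal for both tools

baseline_cost = cost_per_task(BASELINE_TOKENS, BASELINE_PRICE)
for multiplier in (3, 4):  # the 3-4x range quoted above
    heavier_cost = cost_per_task(BASELINE_TOKENS * multiplier, BASELINE_PRICE)
    print(
        f"{multiplier}x tokens: ${heavier_cost:.2f} vs ${baseline_cost:.2f} per task; "
        f"break-even price ${break_even_price(BASELINE_PRICE, multiplier):.2f}/M tokens"
    )
```

Under these assumptions, the heavier tool only matches the leaner one on cost per task if its effective per-token price is a third to a quarter of the baseline, which is why the billing structure matters as much as the benchmark scores.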

Verified across 1 source: MorphLLM

Anthropic's CFO: Claude Code Now Generates 90% of Internal Software Work

Anthropic CFO Krishna Rao disclosed Claude Code now generates over 90% of the company's software engineering output, with finance reports compressed from hours to 30 minutes. Headcount expanded rather than contracted, with employee focus shifting to oversight, architecture, and strategic decisions. This follows the 80x growth figure Cat Wu disclosed Friday (against an internal 10x plan), which itself explained the April subscription cut and the SpaceX compute deal. New alongside the CFO disclosure: a 'dreaming' feature lets agents analyze past performance and write notes to their future selves, paired with multi-agent orchestration and outcome grading in the same batch release as /goals.

The 90% figure lands one day after the 80x growth disclosure, and together they tell the same operational story from two angles: the product grew past Anthropic's own infrastructure, and Anthropic is now the clearest public data point on what the resulting workflow actually looks like at scale. The structural claim (throughput gains shift the bottleneck to requirements, design, and review) is consistent with what Boris Cherny called 'agentic engineering' at the May 8 Code with Claude conference. The 'dreaming' feature is the first Anthropic product mechanism explicitly addressing the evaluation-drift problem the /goals release quantified at 43%.

Verified across 2 sources: Business Honor · Techstrong AI

Open-Source Open Design Hits 40K Stars in Two Weeks as Free Claude Design Alternative

An open-source project (nexu-io/open-design) accumulated ~40,000 GitHub stars in roughly two weeks by offering a locally-runnable, bring-your-own-API-key alternative to Anthropic's subscription-gated Claude Design. The tool auto-detects 16 AI coding agents and orchestrates design generation with no usage limits, explicitly responding to friction with Claude Design's token caps on the $20/month Pro tier.

Star counts can mislead, but the velocity here is a signal about where the friction actually is: not capability, but pricing and lock-in. Claude Design launched April 17; an open-source competitor has 40K stars by mid-May. For product teams operating under data-handling constraints or skeptical of per-seat AI billing, the local-first pattern is becoming a legitimate alternative path, and the same pattern is showing up across the coding-agent stack.

Verified across 1 source: TechTimes

AI Developments

Vercel Labs Ships Zero: A Systems Language Whose Compiler Talks to Agents

Vercel Labs released Zero v0.1.1, an experimental systems language whose compiler emits structured JSON diagnostics with stable error codes and typed repair metadata, eliminating the need for agents to parse human-readable error messages. It also produces sub-10 KiB native binaries and uses capability-based I/O to make effects explicit.

Zero is the first language design that treats AI agents, rather than humans, as a first-class consumer of compiler output. The bet is structural: if agents are going to read, repair, and ship code, parsing English error messages is wasted entropy. Whether Zero itself takes off or not, the design pattern (machine-readable diagnostics as a language feature) is going to spread. Worth watching as a leading indicator for how other toolchains restructure for agent consumption.
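As a rough illustration of why stable codes and typed repair metadata help, the sketch below shows an agent-side loop branching on structured diagnostics instead of parsing prose. The JSON field names (`code`, `message`, `repair`, `kind`) and the error codes are assumptions made up for this example; they are not Zero's documented schema.

```python
import json

# Hypothetical diagnostic payload: the field names and codes below are
# assumptions for illustration, not Zero's documented output format.
RAW = """
[
  {"code": "E0101", "message": "unused binding tmp",
   "repair": {"kind": "delete_range", "start": 42, "end": 55}},
  {"code": "E0207", "message": "missing capability for file read",
   "repair": null}
]
"""


def plan_repairs(raw: str) -> list[dict]:
    """Turn diagnostics into a repair plan keyed on the stable error code.

    Diagnostics that carry repair metadata get that repair kind as the
    action; the rest are marked for escalation. The prose message is never
    inspected.
    """
    plan = []
    for diag in json.loads(raw):
        repair = diag.get("repair")
        plan.append({
            "code": diag["code"],
            "action": repair["kind"] if repair else "escalate",
        })
    return plan


print(plan_repairs(RAW))
```

The point of the design is visible in what the function ignores: the `message` text can change between compiler releases without breaking the agent, as long as the codes and repair types stay stable.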

Verified across 1 source: Marktechpost

NVIDIA Releases SANA-WM: Minute-Scale 720p Video Generation on a Single GPU

NVIDIA released SANA-WM, a 2.6B-parameter open-source Diffusion Transformer that generates 60-second 720p video sequences with metric-scale 6-DoF camera control on a single GPU. Hybrid Gated DeltaNet + softmax attention keeps memory footprint constant regardless of sequence length; dual-branch camera control and a two-stage refinement pipeline handle temporal consistency. Fits on a single RTX 5090.

World models were previously stuck behind multi-GPU clusters or forced to trade resolution for length. SANA-WM compresses that into a single consumer-tier GPU, which makes on-device simulation viable for robotics planning, embodied AI, and any product loop that wants to generate-and-evaluate visual sequences without round-tripping to a datacenter. Open-source plus single-GPU is a meaningful inflection for builders doing physical-product simulation.

Verified across 1 source: Marktechpost

Newport Beach & Orange County

Newport Beach Rents Rise 3.7% Even as Two-Thirds of SoCal Cools

April 2026 ApartmentList data shows rents fell year-over-year in 63% of Southern California's 54 tracked cities, but Aliso Viejo, Newport Beach, and Mission Viejo are three of the five cities with the largest rent increases. Newport Beach one-bedroom rents rose 3.7% YoY to $2,851. Separately, the Orange County Community Foundation released its inaugural 2026 Economic Opportunity Report projecting healthcare (49,771 new jobs by 2035), tourism (29,736), and aerospace/defense/tech (8,000) as growth sectors.

Coastal OC is diverging hard from the rest of Southern California's rental market: demand pressure isn't easing where Clark spends time, even as inland cities soften. Paired with the OCCF report's bet on tourism and aerospace as job-growth pillars, the regional thesis is hardening: Newport-adjacent affordability isn't getting better, and the workforce inflows feeding the local economy are concentrated in sectors that need people on-site.

Verified across 2 sources: San Diego Union-Tribune · Los Angeles Times

AI Supply Chain & Logistics

Figure AI's Humanoid Sorts 100K+ Packages in 81-Hour Livestream; Roboticists Push Back

Figure AI ran its second high-visibility livestream in a week: humanoid robot Jim sorted 101,391 packages over 81 continuous hours on the Helix-02 stack, at ~3–4 seconds per package with no teleoperation. The run follows last Friday's 47K/38-hour run and again drew public criticism from roboticist Ayanna Howard over precision placement, exception handling, and barcode-orientation errors. Mujin published a Trusco Nakayama case study (500 cases/hour, <0.05% error rate, six-week deployment) explicitly positioning against humanoid-demo marketing; Symbotic's 2.23B cases and Brightpick's 50-robot 3,500-picks/hour deployment from earlier this week set the same contrast.

Two consecutive high-visibility humanoid demos and two consecutive public pushbacks from people who deploy this stuff for a living. The signal isn't that humanoids don't work; it's that the gap between livestream throughput and warehouse-grade reliability remains the unsolved problem, and incumbents in fixed-position automation (Mujin, Symbotic, Brightpick) are now actively competing on 'boring works.' For anyone building physical-digital product systems, watch which framing wins the next 12 months of enterprise procurement.

Verified across 2 sources: SE Daily · TipRanks

Maersk Launches AI-Driven Multi-Carrier Parcel Platform; Penske Ships Unified Logistics Dashboard

Maersk launched Maersk Parcel, combining ocean shipping with last-mile delivery via national/regional carrier partners, using historical forecast data and AI agents to anticipate volume spikes and dynamically reroute when carriers fail, with no shipper action required. Penske Logistics released Supply Chain Insight, a unified dashboard pulling transportation, warehousing, and partner data into one view with a natural-language AI assistant and 100+ customizable metrics.

Both are working deployments of the pattern that's quietly becoming the supply-chain default: AI as the orchestration layer abstracting away multi-system fragmentation, with humans interacting through natural-language queries rather than 6 different operational consoles. The interesting design question is no longer 'can AI do logistics' but 'what does the operator-facing UI look like when the underlying system is making thousands of routing decisions per hour.'

Verified across 2 sources: EuropeSays (via FreightWaves) · DC Velocity

Design Engineering

Lockheed Scales Metal Additive Manufacturing with Generative Design for Hypersonic Thermal Components

Lockheed Martin is scaling laser powder bed fusion at its 16,000 sq ft Texas additive facility (opened 2024) to produce thermal management components for next-gen aircraft, hypersonic systems, and electric propulsion. A partnership with nTop's generative design tools is reportedly delivering 15-20% weight reduction and 10-15% improvement in heat dissipation efficiency. This is live production, not a pilot.

This is exactly the digital-physical bridge worth tracking: parametric/generative software (nTop) feeding directly into metal AM as a primary production pathway, not a prototyping shortcut. The aerospace adoption curve is the leading indicator: once the regulated, conservative end of manufacturing commits to layer-by-layer fabrication for mission-critical components, the pattern reaches consumer hardware on a predictable lag. Pairs naturally with this week's Caracol monolithic carbon-fiber tool and Siemens-Xometry CAD integration.

Verified across 1 source: Machinery Market

Spokane & North Idaho

The Spokesman-Review Clears $2M Bar to Become Community-Owned Nonprofit

The Comma Community Journalism Lab nonprofit cleared its fundraising target ($1M cash plus $1M+ in committed donations), triggering a $2M matching grant from the Cowles family. A 90-day transition begins to convert The Spokesman-Review into one of the country's first community-owned daily newspapers, funded through subscriptions, philanthropy, and advertising.

Spokane is about to run a live experiment that other regional-newspaper cities are watching closely. The hybrid funding model (community ownership with philanthropic backing and retained commercial revenue) is the most ambitious structural answer yet to the regional-paper collapse problem. If it sticks past the 90-day transition, expect it to become the reference architecture other Cowles-scale family-owned dailies use.

Verified across 1 source: The Spokesman-Review

Spokane Airport Begins Central Hall Construction; Three Residential Projects File Plans

Spokane International Airport is closing hourly parking and redirecting traffic to begin building a Central Hall Facility connecting all three concourses post-security for the first time, consolidating baggage claim and security checkpoints. Disruptions expected through early 2027. Separately, developers filed plans for a 47-unit four-story apartment complex (Threshold) near Liberty Park ($3.5M), a 95,000 sq ft McKinstry fabrication facility expansion in Airway Heights ($7.2M), and 10-unit Buth Townhomes near Esmeralda Golf Course ($1.3M).

The airport project is a meaningful structural upgrade to regional connectivity, and McKinstry's second prefab facility in Airway Heights signals a continued bet on the Inland Northwest as an advanced-manufacturing hub. Read alongside Old Dominion's $10.5M Pasco freight hub from earlier this week, the Eastern Washington logistics and industrial footprint is visibly thickening even as Spokane wrestles with budget pressure elsewhere.

Verified across 2 sources: The Spokesman-Review (airport) · The Spokesman-Review (The Dirt)

Iran Conflict

Iran Formalizes Hormuz Toll System; Drone Strike Hits UAE Barakah Nuclear Plant

Iran's Parliament National Security Committee formalized what last week's China-vessel transit made visible: a stated 'maritime insurance policy' requirement for Hormuz passage, with the strait closed to US and Israeli vessels. ISW documents parallel overland/rail route buildout via China, Pakistan, and Iraq; the alternative-corridor strategy previewed last week is now confirmed infrastructure investment. New escalation: a drone strike caused a fire at UAE's Barakah Nuclear Power Plant on May 17 amid the nominal Israel-Lebanon ceasefire extension (45 days, security talks May 29). US negotiators placed five conditions on Iran including transfer of 400kg enriched uranium and reduction to one operational nuclear facility; Iran's foreign minister said Tehran 'cannot trust the Americans at all.' Trump shifted to a 20-year moratorium demand, abandoning the permanent-ban position.

The managed-access doctrine has now cleared three milestones in eight days: operational practice (China vessels), ISW confirmation of 90% underground storage and 70% mobile launcher restoration, and today's parliamentary codification. The Barakah strike is the first attack on a Gulf nuclear facility in this conflict, a meaningful escalation threshold rather than a continuation of the tanker-seizure pattern. The Trump moratorium concession contradicts CENTCOM's 90%-degraded Senate testimony from Thursday; the gap between the public degradation narrative and the classified 70%-intact assessment is now driving the negotiating position, not just the intelligence picture.

Verified across 4 sources: Institute for the Study of War · Al Jazeera · NDTV · Missile Strikes

OSINT & Intelligence

California Justice Watch Publishes MCP Server for Criminal-Justice Accountability Data

California Justice Watch released a public, read-only MCP server exposing 15 public-record databases: district attorneys, public defenders, judges, officer misconduct (Brady/Giglio), and Commission on Judicial Performance discipline records. Over 6,000 records, each with source_url pointing to the canonical public document. Drops into Claude, ChatGPT, Cursor, and other MCP-aware clients.

This is what a deliberately AI-consumable public-records pipeline looks like: structured, cited, and hallucination-resistant by design. The pattern matters beyond California: every state has equivalent accountability data sitting behind manual search portals, and the MCP+OpenAPI combination is the cleanest model anyone has shipped for exposing it to investigative agents without scraping. For OSINT and journalism workflows, this is a reference architecture; for AI builders, it's a template for trustworthy public-data integration.
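One way a consuming agent might lean on the per-record citation is to refuse to quote anything that lacks a usable link. The sketch below assumes a simple record shape: only the `source_url` field name comes from the description above, while the other fields, the IDs, and the example.org URLs are illustrative placeholders rather than the server's actual response format.

```python
from urllib.parse import urlparse

# Hypothetical records: only the `source_url` field name comes from the
# description above; every other field and value is illustrative.
RECORDS = [
    {"id": "rec-1", "summary": "Discipline record",
     "source_url": "https://example.org/doc/1"},
    {"id": "rec-2", "summary": "Misconduct entry", "source_url": ""},
]


def has_usable_source(record: dict) -> bool:
    """True when the record carries an http(s) URL with a host component."""
    parsed = urlparse(record.get("source_url") or "")
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def citable(records: list[dict]) -> list[dict]:
    """Keep only records an agent can cite back to a public document."""
    return [r for r in records if has_usable_source(r)]


for rec in citable(RECORDS):
    print(f"{rec['summary']} (source: {rec['source_url']})")
```

Filtering on the citation before generation, rather than checking after the fact, keeps uncited material out of the agent's context in the first place.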

Verified across 1 source: GitHub / California Justice Watch


The Big Picture

Desktop apps are the new battleground for coding agents

OpenAI's Codex app and GitHub's Copilot desktop both shipped this week, joining Cursor and Claude Code. The center of gravity is moving from inline-suggestions-in-editor to standalone orchestration surfaces with multi-agent fanout, session history, and PR review. The IDE is becoming one window among many.

Benchmarks are now too close to decide on; token economics will

Claude Opus 4.7 and GPT-5.5 trade wins on SWE-bench Pro vs Verified by single-digit points. The real differentiator surfacing in practitioner writeups: Claude burns 3-4x more tokens for equivalent work. At scale, that's the decision.

Humanoid demos vs. roboticist reality

Figure's 81-hour livestream is the second high-visibility humanoid-sorting run in a week, and the second to get publicly questioned by working roboticists on precision and exception handling. The gap between marketing throughput claims and deployment-grade reliability is becoming a recurring beat; see also Mujin explicitly positioning against hype.

Hormuz is becoming a formalized toll regime, not a chokepoint crisis

Iran's 'maritime insurance' scheme moves from rumor to parliamentary announcement. The doctrine documented last week (managed access for compliant nations) is hardening into stated policy while alternative overland routes through China, Pakistan, and Iraq get built out in parallel.

Public data is being re-plumbed for AI agents

California Justice Watch's MCP server, Vercel's Zero language with machine-readable diagnostics, and Angular's WebMCP (from last week) all point the same direction: infrastructure is being redesigned so agents can consume it natively, with source citations, instead of scraping HTML or parsing error text.

What to Expect

2026-05-18 Spokane City Council Urban Experience Committee meets to vote on delaying HEART behavioral-health funding to spring 2027.
2026-05-19 Kootenai County GOP precinct committeeman primary: 74 seats decide control of the North Idaho Republican apparatus.
2026-05-28 Huntington Beach deadline to remedy housing-element violations before $50,000/month penalties begin June 1.
2026-05-29 Israel-Lebanon security talks on Hezbollah disarmament begin; political talks reconvene June 2-3.
2026-06-15 Anthropic's restructured Claude subscription credits take effect; agentic-tool usage moves into separate metering.

Every story, researched.

Every story verified across multiple sources before publication.

🔍 Scanned: 563 (across multiple search engines and news databases)
📖 Read in full: 123 (every article opened, read, and evaluated)
Published today: 14 (ranked by importance and verified across sources)

- The Anvil

🎙 Listen as a podcast

Subscribe in your favorite podcast app to get each new briefing delivered automatically as audio.

Apple Podcasts: Library tab → ••• menu → Follow a Show by URL → paste
Overcast: + button → Add URL → paste
Pocket Casts: Search bar → paste URL
Castro, AntennaPod, Podcast Addict, Castbox, Podverse, Fountain: look for Add by URL or paste into search

Spotify isn't supported yet; it only lists shows from its own directory. Let us know if you need it there.