🧯 The Staff Safety Desk

Monday, May 18, 2026

6 stories

Generated with AI from public sources. Verify before relying on for decisions.

🎧 Listen to this briefing or subscribe as a podcast →

Today on The Staff Safety Desk: the recurring shape of code that looks right and isn't. Agents that pass tests without using the argument they added, Django transactions that fire emails before commit, webhooks that report success while the worker silently fails, and an NGINX CVE being exploited in the wild to keep the abstract problems honest.

Cross-Cutting

Three months of vibe-coding produces complexity-58 Django code: quality gates have to exist before the agent runs

Max Krivich spent three months building a Django side project with an AI agent and looked up to find 3,000 lines, cyclomatic complexity over 58, duplicated helper functions, and tangled control flow. Root cause: the agent optimized locally per task, copied code instead of extracting shared logic, and appended new branches rather than refactoring. The fix is a tiered set of machine-readable gates (ruff, mypy, complexipy, vulture, pytest with coverage) configured *before* the first generated line, not bolted on after.

Confirms what Tschimev's 1inch writeup and the Lightrun 43%-fail-in-production data already implied: human review does not scale to agent throughput, and the only durable fix is deterministic gates the agent has to pass on every iteration.
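
A minimal sketch of what "gates the agent has to pass on every iteration" could look like as a single script. The tool list comes from the story; the exact invocations, the `run_gates` helper, and the assumption that thresholds live in each tool's own configuration are illustrative, not Krivich's setup. `pytest --cov` assumes the pytest-cov plugin is installed.

```python
import subprocess

# Assumed invocations; thresholds (max complexity, coverage floor) are
# expected to live in each tool's own configuration, not in this script.
GATES = [
    ["ruff", "check", "."],
    ["mypy", "."],
    ["complexipy", "."],
    ["vulture", "."],
    ["pytest", "--cov"],  # requires the pytest-cov plugin
]


def run_gates(gates, run=subprocess.run):
    """Run every gate and return the names of the ones that failed.

    All gates run even after a failure, so the agent sees every problem
    in one pass instead of fixing them one at a time.
    """
    failed = []
    for command in gates:
        try:
            result = run(command)
        except FileNotFoundError:
            # A gate that is not installed counts as a failed gate.
            failed.append(command[0])
            continue
        if result.returncode != 0:
            failed.append(command[0])
    return failed
```

Calling `run_gates(GATES)` from a pre-commit hook or CI step and exiting non-zero when the returned list is non-empty gives the agent a deterministic pass/fail signal. The `run` parameter exists only so the loop can be exercised without the tools installed.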

Verified across 2 sources: dev.to / Max Krivich · dev.to / Mitko Tschimev (1inch)

AI Slop & Review Patterns

Coding agent adds an argument, writes tests, never uses it: mocks matched on anything

A developer asked an agent to thread a new argument through method signatures and call sites. The agent updated signatures, wrote tests, and updated mocks, but the linter caught that the argument was never actually used inside the function body. Tests passed because the mocks were configured to match on any value, not the specific argument the agent claimed to have threaded. This is a concrete instance of the 'plausible diff vs. correct diff' failure the five-pass review loop's invariant-check pass is designed to surface, specifically the pass that checks whether external contracts actually hold, not just whether tests return green.

This is the canonical 'plausible diff vs. correct diff' failure in one paragraph: keep linters and 'does this symbol have any real readers?' checks as non-negotiable gates on agent output.
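
The failure can be reproduced in a few lines with the standard library's `unittest.mock`. This is an illustrative reconstruction, not the code from the lmika.org post: `charge`, `charge_fixed`, the `gateway` object, and the `currency` argument are all hypothetical names.

```python
from unittest.mock import Mock


def charge(gateway, amount, currency="USD"):
    # The shipped bug: `currency` is accepted but never forwarded.
    return gateway.charge(amount)


def charge_fixed(gateway, amount, currency="USD"):
    return gateway.charge(amount, currency=currency)


def loose_test(fn):
    """Passes as long as the mock was called once, with any arguments."""
    gateway = Mock()
    fn(gateway, 100, currency="EUR")
    gateway.charge.assert_called_once()


def strict_test(fn):
    """Passes only if the new argument actually reaches the collaborator."""
    gateway = Mock()
    fn(gateway, 100, currency="EUR")
    gateway.charge.assert_called_once_with(100, currency="EUR")
```

`loose_test(charge)` passes even though `currency` is dead, while `strict_test(charge)` raises `AssertionError` and `strict_test(charge_fixed)` passes. An unused-argument lint rule flags the same bug without running anything, which is why the two checks complement each other.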

Verified across 1 source: lmika.org

Django & Python Ecosystem

Django transaction.atomic() ships the email before the row commits: five ordering traps reviewed

A walkthrough of five concrete traps in Django's atomic context manager, opening with the canonical failure: a confirmation email or Celery task fires inside the atomic block, the transaction then rolls back, and the customer hits a 404 on the order they just 'bought' because the row never committed. The fix is transaction.on_commit() for every external side effect (emails, webhook dispatch, cache invalidation, Stripe calls) and treating the atomic block as 'no I/O until COMMIT lands.'

This is the exact ordering bug AI-generated diffs ship constantly: the happy path looks fine, the rollback path lies to the user, and tests that only assert 200 will never catch it.
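
The defer-until-commit idea can be shown without a Django project. The sketch below is a toy stand-in built on the standard library's `sqlite3`, not Django's implementation: the `atomic` helper, `place_order`, and the `orders` table are hypothetical. In a real Django view the equivalent move is registering the side effect with `transaction.on_commit()` inside the `transaction.atomic()` block.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def atomic(conn):
    """Toy transaction block that hands the body an on_commit registrar."""
    callbacks = []
    try:
        yield callbacks.append
        conn.commit()
    except Exception:
        conn.rollback()  # registered callbacks are simply dropped
        raise
    # Reached only after COMMIT succeeded: now side effects may run.
    for callback in callbacks:
        callback()


def place_order(conn, outbox, fail=False):
    with atomic(conn) as on_commit:
        conn.execute("INSERT INTO orders (item) VALUES ('book')")
        # Deferred: nothing is sent while the transaction is still open.
        on_commit(lambda: outbox.append("confirmation email"))
        if fail:
            raise RuntimeError("payment declined")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT)")
conn.commit()

outbox = []
try:
    place_order(conn, outbox, fail=True)
except RuntimeError:
    pass
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
emails_after_rollback = list(outbox)

place_order(conn, outbox)
rows_after_commit = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the failed attempt there is no row and no email; after the successful one there is exactly one of each. A test that asserts on both paths, not only on the success response, is what catches the ordering bug.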

Verified across 1 source: Medium / Anas Issath

Web App Security Literacy

NGINX Rift (CVE-2026-42945, CVSS 9.2) under active exploitation: DoS is trivial, RCE needs ASLR off

A heap buffer overflow in ngx_http_rewrite_module affecting NGINX 0.6.27–1.30.0 and Plus R32–R36 is being exploited in honeypots as of May 16, three days after PoC release. Trigger is a specific config pattern: a rewrite directive with an unnamed PCRE capture ($1, $2) and a ? in the replacement string, followed by another rewrite/if/set. An escaping flag persists across a length calculation, causing a buffer sized for raw bytes to overflow when written with escaped ones. RCE requires ASLR disabled (non-default on modern distros); DoS via worker crash is reliable on default configs. Patches: 1.31.0 / 1.30.1 OSS, R36 P4 / R32 P6 Plus.

Audit your rewrite directives for unnamed captures plus '?' in the replacement today, verify ASLR is enabled, and check Kubernetes ingress controller image versions separately: the host NGINX version is irrelevant if the controller image ships a vulnerable build.
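
A rough first-pass scanner for the config pattern described above, offered as a heuristic and not as an official detection rule. It assumes one directive per line and unquoted, space-free arguments, so it will miss multi-line or quoted rewrites; `find_risky_rewrites` and the sample config are hypothetical. Anything it flags still needs a human to read the directive.

```python
import re

# Matches "rewrite <regex> <replacement>" with unquoted, space-free arguments.
REWRITE = re.compile(r"^\s*rewrite\s+(\S+)\s+(\S+)")
FOLLOW_UP = re.compile(r"^\s*(rewrite|if|set)\b")


def find_risky_rewrites(config_text):
    """Return 1-based line numbers of rewrite directives worth a manual look."""
    directives = [
        (number, text)
        for number, text in enumerate(config_text.splitlines(), start=1)
        if text.strip() and not text.strip().startswith("#")
    ]
    hits = []
    for (number, text), (_, next_text) in zip(directives, directives[1:]):
        match = REWRITE.match(text)
        if not match:
            continue
        replacement = match.group(2)
        uses_unnamed_capture = re.search(r"\$\d", replacement) is not None
        if uses_unnamed_capture and "?" in replacement and FOLLOW_UP.match(next_text):
            hits.append(number)
    return hits


SAMPLE = "\n".join([
    "server {",
    "    rewrite ^/old/(.*)$ /new/$1?from=old;",
    "    set $flag 1;",
    "    rewrite ^/docs/(?<page>.*)$ /d/$page?x=1;",
    "    rewrite ^/x/(.*)$ /y/$1;",
    "}",
])
```

On `SAMPLE` only line 2 is flagged: it combines an unnamed capture and a `?` in the replacement and is followed by `set`. The named-capture rewrite and the rewrite without `?` are left alone. Upgrading to a patched release remains the actual fix; the scan only helps prioritize.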

Verified across 4 sources: The Hacker News · SecurityAffairs · HelpNetSecurity · Vulert

Webhooks & Payments Integrations

Supabase publishes webhook debugging guide for the failure mode where the UI says 'sent' and pg_net silently timed out

Supabase's new troubleshooting guide walks through detecting pg_net background worker failures, timeout regressions introduced in 0.10.0+, and how to inspect actual HTTP request/response logs versus what the dispatcher UI reports. The named failure mode: the sending application shows the webhook as 'sent' while the receiving endpoint never got the payload, which means silent data loss in payment reconciliation, signature flows, and async pipelines.

For any DocuSeal or Coinbase Commerce integration the same shape applies: never trust a 'sent' or 'paid' toast that isn't backed by a stored upstream response and a reconciler that checks status independently of the dispatch path.
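
One way to picture a reconciler that does not trust the dispatch path, as a sketch under stated assumptions: the `Delivery` record, its field names, and the `receiver_seen` set (event IDs the receiving side independently confirms) are hypothetical and do not mirror any Supabase, DocuSeal, or Coinbase Commerce schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delivery:
    event_id: str
    ui_status: str                   # what the dispatcher claims, e.g. "sent"
    response_status: Optional[int]   # stored upstream HTTP status, None if missing


def unconfirmed(deliveries, receiver_seen):
    """Return event IDs marked 'sent' that lack independent proof of delivery."""
    suspect = []
    for delivery in deliveries:
        if delivery.ui_status != "sent":
            continue  # failures the UI already admits are handled elsewhere
        status = delivery.response_status
        got_2xx = status is not None and 200 <= status < 300
        if not got_2xx or delivery.event_id not in receiver_seen:
            suspect.append(delivery.event_id)
    return suspect


DELIVERIES = [
    Delivery("evt_1", "sent", 200),
    Delivery("evt_2", "sent", None),   # UI says sent, no response ever stored
    Delivery("evt_3", "sent", 200),    # response stored, receiver never saw it
    Delivery("evt_4", "failed", 500),
]
SUSPECT = unconfirmed(DELIVERIES, receiver_seen={"evt_1"})
```

Here `SUSPECT` contains `evt_2` and `evt_3`: one has no stored response and the other was never acknowledged by the receiver. Running a check like this on a schedule, separate from the code that sends webhooks, is what turns a silent timeout into a visible retry queue.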

Verified across 1 source: Supabase Docs

GitHub Actions & Supply Chain

Shai-Hulud source is public: four npm typosquats deployed within 24 hours, Renovate ships Poetry age-gating for transitive deps

TeamPCP open-sourced the Shai-Hulud worm after the May 13–14 wave hit 170+ packages including TanStack and mistralai. Within 24 hours four copycat npm packages (chalk-template, axios-utils, and two variants) deployed unobfuscated Shai-Hulud clones plus SSH-key and cloud-credential stealers, reaching 2,678 weekly downloads before takedown. Renovate PR #43429 now adds POETRY_SOLVER_MIN_RELEASE_AGE so Poetry's solver enforces age constraints on transitive deps during lock regeneration, closing the gap where age-gated direct deps still pulled in brand-new transitives. The source-to-copycat cycle ran in under 24 hours, which is the new baseline for how fast the offensive side operates once tooling is public.

The TanStack/Mini Shai-Hulud posture this reader has been tracking (release-age cooldowns, dependency pinning, CI cache isolation) now needs to extend to transitive deps explicitly. Renovate's PR closes that gap for Poetry users. The 24-hour copycat window also sets a concrete SLA for how fast a minimum-release-age gate needs to propagate across your lockfiles after a campaign goes public.
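
The logic of a minimum-release-age gate is small enough to sketch directly. This is an illustration of the idea, not Renovate's or Poetry's implementation: the seven-day cooldown, the `too_new` helper, and the package names and publish dates are all assumptions. The point is that the check runs over every locked package, transitive ones included.

```python
from datetime import datetime, timedelta, timezone

MIN_RELEASE_AGE = timedelta(days=7)  # assumed cooldown, tune to your risk appetite


def too_new(published_at, now, min_age=MIN_RELEASE_AGE):
    """Return the sorted names of packages published more recently than min_age.

    `published_at` maps every locked package (direct and transitive)
    to its release timestamp.
    """
    return sorted(
        name
        for name, published in published_at.items()
        if now - published < min_age
    )


NOW = datetime(2026, 5, 18, tzinfo=timezone.utc)
LOCKED = {
    "direct-dep": datetime(2026, 4, 1, tzinfo=timezone.utc),
    "fresh-transitive": datetime(2026, 5, 17, tzinfo=timezone.utc),
}
BLOCKED = too_new(LOCKED, NOW)
```

With these sample dates only `fresh-transitive` is blocked, which is exactly the case an age gate on direct dependencies alone would let through.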

Verified across 4 sources: Cryptika · BankInfoSecurity · GitHub / renovatebot PR #43429 · Synrese


The Big Picture

The success signal is lying again, this time in three different layers

Today's stories share one shape: an upstream operation failed, but the layer above reports success. The agent threaded an argument that was never used but tests passed because the mock accepted any value. Django's atomic block fired the confirmation email before the row committed, so the customer saw a 404 on the order they just 'bought.' Supabase's pg_net webhook UI reports 'sent' while the background worker times out. Three different stacks, same architectural lie: a callable returned without raising, therefore the caller believes the contract held.

Quality gates have to exist before the first AI-generated line, not after

Krivich's three-month vibe-coding postmortem (3,000 lines, cyclomatic complexity 58, duplicated helpers everywhere) and Tschimev's 1inch review writeup converge on the same conclusion the Lightrun and SWR-Bench data implied: agents optimize locally per task, never refactor, and copy-paste branches rather than extract. Ruff, mypy, complexipy, and a real test runner configured before the agent runs are the only feedback loop that prevents structural debt from compounding invisibly. Reviewer attention does not scale to agent throughput.

Supply-chain risk is now agent-shaped

Shai-Hulud's source code is public as of this week, four typosquats appeared within 24 hours, and AI coding agents that auto-install dependencies into environments holding PyPI tokens, GitHub PATs, and cloud credentials are the new acceleration vector. Renovate just shipped Poetry minimum-release-age enforcement on transitive deps; Dependabot is proposing community dependency groups. The defensive layer being built right now is age-gating and human approval for agent-driven installs. Both are direct responses to how fast the offensive side now operates.

What to Expect

2026-05-25 Suggested ASLR audit + rewrite-directive grep window for any NGINX 0.6.27–1.30.0 still in production β€” exploitation in honeypots has been running since May 16.
2026-06-12 macOS deadline to update ChatGPT Desktop, Codex, and Atlas after OpenAI's code-signing certificate rotation following the TanStack Shai-Hulud compromise.
2026-10-31 Python 3.10 reaches end of life (3.11 is scheduled to follow in October 2027). Plan the 3.12 migration before Q4 freezes.
2026-Q3 Watch for follow-on Shai-Hulud variants now that the source is public β€” copycat tooling is already shipping typosquats within 24h of code release.

Every story, researched.

Every story verified across multiple sources before publication.

🔍 Scanned: 437
Across multiple search engines and news databases

📖 Read in full: 117
Every article opened, read, and evaluated

Published today: 6
Ranked by importance and verified across sources

– The Staff Safety Desk
