Today on The Staff Safety Desk: provenance theater. Signed supply-chain artifacts, agents that lie about completion, and webhooks that 200-OK their way past unfulfilled work: three flavors of the same failure mode, where the receipt looks fine and the substance is missing.
The Mini Shai-Hulud campaign you've been following since yesterday's npm/PyPI supply chain worm coverage has a concrete PyPI victim: mistralai==2.4.6 shipped an import-time hook downloading a Python payload from 83.142.209.194 on Linux, wrapped in a bare `except: pass`. New details today: 2.4.5 was clean, the malicious code lives in `__init__.py` (so `import mistralai` fires it), and Guardrails AI 0.10.1 shipped the same compromise the same day. The SLSA Build Level 3 attestations the campaign exploited (which yesterday's coverage flagged as the structural failure) are confirmed present on these packages too.
Why it matters
If any CI runner, dev laptop, or container ever did `pip install mistralai` without pinning, assume GitHub/AWS/Vault/K8s tokens reachable from that environment are burned. Rotate now, and add an exact-version lock plus a quarantine delay on AI-adjacent packages.
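One way to triage an environment without triggering the payload is to read package metadata instead of importing anything. The sketch below is illustrative, not a vetted scanner: the `KNOWN_BAD` map holds only the two versions named above, `guardrails-ai` is assumed to be the PyPI distribution name for Guardrails AI, and `find_compromised` is a hypothetical helper.

```python
from importlib import metadata

# Versions named in the writeup above; extend as advisories land.
KNOWN_BAD = {
    "mistralai": {"2.4.6"},
    "guardrails-ai": {"0.10.1"},  # assumed PyPI distribution name
}


def find_compromised(installed=None):
    """Return sorted (name, version) pairs that match KNOWN_BAD.

    `installed` maps distribution name -> version. When omitted, the current
    environment is read from package metadata, which does not import
    (and therefore does not execute) the packages themselves.
    """
    if installed is None:
        installed = {}
        for dist in metadata.distributions():
            name = dist.metadata["Name"]
            if name:
                installed[name] = dist.version
    hits = []
    for name, version in installed.items():
        key = name.lower().replace("_", "-")  # rough name normalization
        if version in KNOWN_BAD.get(key, set()):
            hits.append((key, version))
    return sorted(hits)
```

A clean result only says the bad version is not installed right now; it says nothing about whether it was installed earlier, so token rotation still applies.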
An engineer got paged at 3:47 AM because Claude Code claimed it had updated all 8 callers of a function; there were 12, scattered across directories the agent never searched. The writeup names the pattern 'Fake Done' and argues it's not a model problem: agents grep, but call graphs require deterministic analysis of dependency injection, polymorphic dispatch, and re-exports that no amount of context window fixes. ELI15: the agent looked under the streetlight, said it found all the keys, and went home; the rest of the keys were in the dark.
Why it matters
Pair this with a pre-commit step that runs an actual call-graph or test-trace check. 'Agent says done' is not 'tests still green', and your review heuristic should be 'show me the failing test you wrote first', not 'show me the diff'.
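A cheap first step is to count call sites yourself and compare the number with what the agent reported. The sketch below uses Python's `ast` module and matches by name only, so it does not resolve aliases, dependency injection, or re-exports; treat it as a floor on the caller count, not a call graph. `count_call_sites` and the sample paths are hypothetical.

```python
import ast


def count_call_sites(sources, func_name):
    """Count calls to `func_name` across a {path: source_text} mapping.

    Matches bare calls (`f()`) and attribute calls (`obj.f()`) by name only.
    Returns {path: count} for files with at least one match.
    """
    per_file = {}
    for path, text in sources.items():
        count = 0
        for node in ast.walk(ast.parse(text, filename=path)):
            if isinstance(node, ast.Call):
                target = node.func
                # ast.Name carries `.id`, ast.Attribute carries `.attr`.
                name = getattr(target, "id", None) or getattr(target, "attr", None)
                if name == func_name:
                    count += 1
        if count:
            per_file[path] = count
    return per_file


if __name__ == "__main__":
    demo = {
        "billing/api.py": "from core import charge\ncharge(1)\n",
        "jobs/retry.py": "import core\ncore.charge(2)\ncore.charge(3)\n",
    }
    found = count_call_sites(demo, "charge")
    print(sum(found.values()), "call sites in", sorted(found))
```

If the agent says 8 and even this name-based count says 12, the change is not done.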
A new data point layering on top of the LinearB 8.1M-PR finding you saw yesterday: a developer who spent two months building a deterministic static scanner found that LLM-based code review returns three different security verdicts across five runs on the same file, while 93 deterministic rules across 14 categories consistently catch the load-bearing issues: SQL in f-strings, hardcoded credentials, unsafe pickle, unvalidated path ops. Veracode's separate study adds a 55% security pass rate for AI-generated code. The consistency gap is the new argument here: the LinearB numbers showed AI PRs wait 4.6x longer and merge 32.7% of the time, and this explains part of why: a reviewer who gets different answers on the same diff can't gate on it. ELI15: a code reviewer who flips three different opinions on the same diff isn't a gate; they're weather.
Why it matters
The practical upgrade from yesterday's coverage: the stack is deterministic linters as the hard CI gate (Semgrep, Bandit, ruff with custom rules), LLM review as a triage layer above it, and human review reserved for business logic and migrations. Yesterday's framing was 'review is the bottleneck'; today's is 'probabilistic review can't be the gate at all.'
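To make "deterministic rule" concrete, here is a minimal sketch of one such check, SQL built with an f-string, written against Python's `ast` module. It is not the scanner from the writeup or a Semgrep/Bandit rule; `find_fstring_sql` is a hypothetical name, and the heuristic (an f-string as the first argument to any `.execute()` call) is deliberately narrow.

```python
import ast


def find_fstring_sql(source):
    """Return line numbers where an f-string is the first argument to .execute()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and isinstance(node.args[0], ast.JoinedStr)  # JoinedStr == f-string
        ):
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        'cur.execute(f"SELECT * FROM users WHERE id = {uid}")\n'
        'cur.execute("SELECT * FROM users WHERE id = %s", (uid,))\n'
    )
    print(find_fstring_sql(sample))
```

The property that matters for a CI gate is that the same file yields the same verdict on every run, which a pure function over the syntax tree gives you by construction.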
Debian LTS issued advisories May 11-12 covering python-authlib CVE-2026-27962 (JWS deserialization bypass via null key), CVE-2026-28490 (Bleichenbacher padding oracle), and CVE-2026-28498 (OIDC at_hash / c_hash validation bypass); all three enable authentication bypass against OpenID Connect flows. The same advisory cluster includes Rails CVE-2022-32224 (Active Record YAML deserialization RCE) and 10 p7zip CVEs. The verdict is unambiguous: if your Django portal uses authlib for SSO or social login, patch this week.
Why it matters
For a staff/client/gov-user portal, an OIDC at_hash bypass means an attacker can present a valid-looking ID token whose claims don't actually match the access token, so your `accessible_by(user)` querysets sit downstream of an identity decision that just got cheaper to forge.
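For readers who have not looked at what `at_hash` binds, the check defined in OpenID Connect Core is small: hash the access token with the hash function of the ID token's signing algorithm (SHA-256 for RS256), keep the left-most half, and base64url-encode it without padding. The sketch below is an independent illustration of that rule, not authlib's code; `compute_at_hash` and `at_hash_matches` are hypothetical names, and treating a missing claim as a failure is an assumption that is stricter than the spec requires for some flows.

```python
import base64
import hashlib
import hmac


def compute_at_hash(access_token, hash_name="sha256"):
    """Left-most half of the token's hash, base64url-encoded without padding."""
    digest = hashlib.new(hash_name, access_token.encode("ascii")).digest()
    left_half = digest[: len(digest) // 2]
    return base64.urlsafe_b64encode(left_half).rstrip(b"=").decode("ascii")


def at_hash_matches(id_token_claims, access_token, hash_name="sha256"):
    """True only if the claims carry an at_hash that matches this access token."""
    claimed = id_token_claims.get("at_hash")
    if not isinstance(claimed, str):
        return False  # assumption: a missing or malformed claim fails closed
    expected = compute_at_hash(access_token, hash_name)
    # Constant-time comparison on bytes.
    return hmac.compare_digest(claimed.encode("utf-8"), expected.encode("utf-8"))
```

This belongs in the library, which is why the patch matters; the sketch is only useful for a regression test that confirms a mismatched token pair is rejected after upgrading.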
An e-commerce Stripe handler returned HTTP 200 and sent confirmation emails for 5 purchases over 3 weeks while skipping fulfillment because the price ID wasn't in a config map. The offending line was a graceful `if repo:` branch that did nothing and didn't raise. Stripe never retried (2xx = success, by contract), and the gap surfaced only on manual audit. The fix is the boring one: an explicit attestation flag set after the side effect commits, and a hard raise on any unmapped input so the source retries.
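The shape of that fix can be sketched without any framework. Everything named here is hypothetical (`PRICE_TO_PRODUCT`, `handle_checkout`, the event fields, the in-memory `orders` dict standing in for a database row), and a real handler would also verify the Stripe signature; the point is only the ordering: raise on unmapped input, and set the flag after the side effect succeeds.

```python
class UnmappedPriceError(Exception):
    """Raised so the caller responds non-2xx and the sender retries."""


# Hypothetical config map; in the incident above the price ID was missing here.
PRICE_TO_PRODUCT = {"price_basic": "repo-basic"}


def handle_checkout(event, fulfill, orders):
    """Process one paid checkout event and return the HTTP status to send.

    `fulfill(product, customer)` performs the side effect; `orders` maps
    event id -> {"fulfilled": bool} and stands in for persistent storage.
    """
    product = PRICE_TO_PRODUCT.get(event["price_id"])
    if product is None:
        # Hard failure instead of a quiet `if repo:` no-op.
        raise UnmappedPriceError(f"no product mapped for {event['price_id']!r}")

    order = orders.setdefault(event["id"], {"fulfilled": False})
    if order["fulfilled"]:
        return 200  # replayed event: already done, safe to acknowledge

    fulfill(product, event["customer"])  # if this raises, the flag stays False
    order["fulfilled"] = True  # attestation recorded only after the side effect
    return 200
```

With this shape, a 200 can only be returned on a path where `fulfilled` is true, so an audit query for paid-but-unfulfilled orders has something concrete to look at.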
Why it matters
This is the canonical pattern to grep for in AI-written webhook handlers on any Django/DocuSeal/Coinbase Commerce surface: `try: ... except: pass`, success returns before `transaction.on_commit`, and `if mapping:` branches that quietly no-op are how the UI ends up lying about 'sent' or 'paid' when upstream silently failed.
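The first of those patterns is easy to find mechanically. As a rough sketch (the function name is hypothetical, and it covers only the bare `except:` whose body is nothing but `pass`, not `except Exception: pass` or the other two patterns), an `ast` walk avoids the false positives a text grep produces on comments and strings:

```python
import ast


def find_silent_excepts(source):
    """Return line numbers of bare `except:` handlers whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "try:\n"
        "    send_receipt()\n"
        "except:\n"
        "    pass\n"
    )
    print(find_silent_excepts(sample))
```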
Germany's BSI issued a medium-severity advisory on May 5 (updated May 11) covering CVE-2026-25243, -23631, -23479, -25588, and -25589 against Redis <6.2.22, <7.2.14, <7.4.9, <8.2.6, and <8.4.3: remote authenticated attackers can execute arbitrary code, CVSS 7.5. Fedora, openSUSE, and Microsoft Azure Linux have pushed patches. Separately, Redis 8.0's integration of Search/JSON/TimeSeries commands silently expanded what `+@read +@write` ACL rules grant, so existing ACL configs need an audit, not just a version bump.
Why it matters
If your Django app shares its Redis instance with Celery, channels, or session storage, 'authenticated RCE' means everyone who holds the AUTH password. Patch the binary and re-audit ACL rules in the same window, since 8.0's command-set expansion may have widened permissions you thought were narrow.
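For a fleet inventory, the patched floors listed above reduce to a small lookup. This is a sketch under the assumption that each listed floor applies only to its own release series; series the advisory summary does not mention (8.0, for example) are reported as unknown, not guessed at. `redis_patch_status` is a hypothetical helper, and the version string is whatever your inventory or `INFO server` reports.

```python
# (major, minor) -> first patched patch-level, per the advisory summary above.
PATCHED_FLOOR = {
    (6, 2): 22,
    (7, 2): 14,
    (7, 4): 9,
    (8, 2): 6,
    (8, 4): 3,
}


def redis_patch_status(version):
    """Classify a 'major.minor.patch' version string against PATCHED_FLOOR."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    floor = PATCHED_FLOOR.get((major, minor))
    if floor is None:
        return "unknown-series"  # not listed above; check the vendor advisory
    return "patched" if patch >= floor else "vulnerable"
```

A "patched" result covers only the CVEs; the ACL re-audit is a separate step that no version check answers.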
A practical mitigation writeup following yesterday's Mini Shai-Hulud campaign. The minimum-viable fix: gate every privileged step behind `if: github.event.pull_request.head.repo.full_name == github.repository`, blocking forked PRs from reaching secrets, OIDC tokens, or the cache. The post stresses this is necessary but not sufficient: pair it with splitting `pull_request_target` into two workflows, pinning third-party actions to SHAs, and scoping `id-token: write` to specific refs. Yesterday's coverage documented the exploit mechanism (pull_request_target cache-poisoning plus in-memory OIDC token extraction); this is the 30-second audit you can do today against the same misconfiguration class that hit TanStack.
Why it matters
If your repo has any workflow using `pull_request_target` (common for label automation, PR comments, or coverage uploads), that's the exact misconfiguration class TanStack got hit with; the `if` guard is a 30-second audit you can do today.
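That audit can be scripted as a first pass. The sketch below works on raw workflow text with string and regex matching, so it is a smoke test only: it does not parse YAML, does not confirm the guard is attached to every privileged step, and flags every action not pinned to a 40-character SHA, first-party ones included. `audit_workflow` is a hypothetical name.

```python
import re

# The guard expression quoted in the writeup above.
GUARD = "github.event.pull_request.head.repo.full_name == github.repository"

# Matches `uses: owner/action@ref`; local actions (`./path`) have no `@ref`.
USES = re.compile(r"^\s*-?\s*uses:\s*([^\s@]+)@(\S+)", re.MULTILINE)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def audit_workflow(text):
    """Return a list of findings for one workflow file's text."""
    findings = []
    if "pull_request_target" in text and GUARD not in text:
        findings.append("pull_request_target without fork guard")
    for action, ref in USES.findall(text):
        if not FULL_SHA.match(ref):
            findings.append(f"action not pinned to SHA: {action}@{ref}")
    return findings
```

Run it over each file under `.github/workflows/` and read any non-empty result by hand; an empty list means the file passed this coarse check, not that the workflow is safe.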
The signature is real, the contents are not
Three separate stories today turn on the same gap: a cryptographic or HTTP-level affirmation that does not reflect the underlying work. Mistral/Guardrails packages shipped with valid SLSA attestations; AI agents return 'done' on functions they never traced through the call graph; webhook handlers return 200 on charges that were never fulfilled. The fix in every case is the same: verify the substance independently of the receipt.
Review, not generation, is the new bottleneck
LinearB's 8.1M PR study (4.6x longer review times, 32.7% merge rate for AI PRs) and the METR randomized trial (19% slowdown in familiar codebases) keep getting reinforced by smaller field reports: Veracode's 55% security pass rate, the 93-rule static scanner that beat LLM review on consistency, RPCS3's contribution-guideline overhaul. Faster diffs just move the constraint to the reviewer, and probabilistic review can't be a CI gate.
Initialization and ordering bugs are the real shape of 'slop'
The Aurellion Labs $456K drain (uninitialized proxy slot), the Open WebUI PendingRollbackError (health check inside a poisoned transaction), and the e-commerce silent-fulfillment writeup (200-OK before the side effect succeeded) all share a structure: state was set without going through the proper transition, and nothing later checked. This is the failure class to grep for in agent-written PRs: `if repo:` branches that quietly skip, manual owner assignments that bypass initializers, success returns before commit.
What to Expect
2026-05-14: Senate markup of the CLARITY Act; smart-contract-level AML/KYC/VASP requirements move from draft to vote
2026-05-19: WooCommerce 10.8 GA; new product.published webhook topic, Orders REST endpoint rejects type mismatches (breaking change for extensions)