<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Engineering Heresy]]></title><description><![CDATA[Challenging conventional wisdom in AI and software engineering.
Deep explanations, mental models, and practical heresies for building better systems with less bullshit.]]></description><link>https://geggleto.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!DPPP!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9deaf250-314e-4473-819e-7f3fa9c0b218_256x256.png</url><title>Engineering Heresy</title><link>https://geggleto.substack.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 16 Jun 2026 21:58:16 GMT</lastBuildDate><atom:link href="https://geggleto.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Glenn Eggleton]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[geggleto@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[geggleto@substack.com]]></itunes:email><itunes:name><![CDATA[Glenn Eggleton]]></itunes:name></itunes:owner><itunes:author><![CDATA[Glenn Eggleton]]></itunes:author><googleplay:owner><![CDATA[geggleto@substack.com]]></googleplay:owner><googleplay:email><![CDATA[geggleto@substack.com]]></googleplay:email><googleplay:author><![CDATA[Glenn Eggleton]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Day a Frontier Model Got Switched Off. Security Is Now Metric #1.]]></title><description><![CDATA[A government just switched off a frontier AI model overnight. 
Why security &#8212; not velocity &#8212; is now your top engineering metric.]]></description><link>https://geggleto.substack.com/p/the-day-a-frontier-model-got-switched</link><guid isPermaLink="false">https://geggleto.substack.com/p/the-day-a-frontier-model-got-switched</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Tue, 16 Jun 2026 14:31:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/61c5cbfe-46db-4fa5-b329-87f25673a4a0_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Your most important dependency has a kill switch, and you don't hold it. Last week, a government pressed it.</em></p><div><hr></div><p>On June 12, a US export-control directive ordered Anthropic to cut off all access to Fable 5 and Mythos 5 for any foreign national &#8212; not just abroad, but inside the United States, and including Anthropic's own foreign-national employees. The company couldn't segment its users by nationality fast enough to comply, so it did the only thing that satisfied the order on the clock it was given: it turned the models off. Globally. For everyone. Within hours.</p><p>That's the event. Here's the claim of this whole post: the shutdown proves that every model you build on is two things at once &#8212; an attack surface and a geopolitical kill switch &#8212; and that means security, not velocity, is now the metric that decides whether your system survives the year. Most teams wiring frontier models into their critical path have optimized hard for speed and treated resilience as a problem for later. Later just arrived, and it arrived as a press release nobody in engineering got to veto.</p><p>I'm not writing this to dunk on Anthropic. I build on these models every day and I'll keep doing it. 
I'm writing it because the failure mode it exposed is one almost every engineering org currently shares, and most of them don't know it yet.</p><div><hr></div><h2>Unplugged overnight</h2><p>A hosted frontier model feels like a utility. You call an endpoint, you get tokens back, the bill shows up monthly. Power, water, compute &#8212; same mental category. That mental category is wrong, and the directive is the proof.</p><p>Utilities don't get switched off by name on a Friday afternoon because of something that happened in a threat briefing you weren't in. This one did. The decision wasn't yours, wasn't your vendor's &#8212; Anthropic complied under protest &#8212; and wasn't subject to any SLA you signed. If any line in your commercial agreement covers an "act of government," it is a force majeure clause, and it excuses the vendor, not you. The capability was there in the morning and gone by dinner, and the only input that mattered came from a party you have no contract with at all.</p><p>If your critical path runs through a single hosted model, you have a dependency whose off-switch is held by someone you can't call.</p><p>This is the part engineers are trained to see in every other layer of the stack and somehow stopped seeing here. We wouldn't run a payments system on one provider with no fallback. We wouldn't put our whole business on one availability zone and call it resilient. But we'll route every agent, every classifier, every code-gen pipeline through one model from one vendor, hard-coded, and never ask what happens the day it returns a 403 for reasons that have nothing to do with us. We asked that question about databases twenty years ago. We haven't asked it about models.</p><div><hr></div><h2>The genie is already out of the bottle</h2><p>The official logic of the shutdown is containment: a dangerous capability was discovered, so access to it gets restricted. That logic only works if the capability lives in one place.
It doesn't.</p><p>By the current measures &#8212; an analysis by H&#229;vard Tveit Ihle and colleagues on LessWrong, built on Epoch AI's benchmarking data, is the clearest &#8212; open-weight models trail the closed frontier by roughly six months. Six months. That's the lead. That's the entire moat the containment argument depends on.</p><p>You cannot switch off a weight file that has already been downloaded ten thousand times. You cannot issue an export-control directive to a model running on someone else's hardware in a jurisdiction that has no reason to honor it. A control that only works inside the borders of the country issuing it, against capability that exists in every other country within months, isn't a defense. It's a structural weakness with a press release attached &#8212; it constrains the defenders who comply and does nothing to the adversaries who don't.</p><p><strong>Cutting off frontier capability doesn't remove it. It just decides who gets to keep using it, and the answer is rarely the people you'd pick.</strong></p><p>Europe read it exactly this way. France's Bruno Retailleau put it bluntly: "a nation that depends on others for its technology is a nation that can be unplugged overnight." The political response there wasn't "let's get access back." It was "let's stop being dependent" &#8212; a hard pivot toward Mistral and homegrown capability. Whatever you think of the geopolitics, the engineering instinct underneath it is correct: a dependency you can be denied at someone else's discretion is a liability you have to design around, not a convenience you get to assume.</p><div><hr></div><h2>"The government had a real reason."</h2><p>Here's the strongest version of the case against everything I've just written, and it deserves a real answer, not a strawman.</p><p>The trigger wasn't paranoia. The government believed it had found a way to jailbreak Fable 5, and the capability in question had cyber-offensive implications. 
If a frontier model can be reliably turned into an exploit-generation engine, that is a legitimate national-security concern, and a regulator acting on it is not being hysterical. Anthropic complied for a reason. Steelmanned all the way: the concern was real, the stakes were real, and reasonable people staffed that decision.</p><p>Grant all of it. The lesson doesn't move.</p><p>Because look at what the triggering capability actually was. By the reporting &#8212; Snyk's security write-up is the clearest &#8212; it was a "narrow jailbreak" around getting the model to read code and fix its vulnerabilities. That's not an exotic weapon. That's automated code review. That's the single most useful defensive thing these models do, the thing your security team wants them doing all day long. You cannot ban "read this codebase and find the flaws" without banning the exact workflow defenders depend on, because attackers and defenders run the identical query &#8212; the only difference is what they do with the answer.</p><p>So the concern can be entirely valid and the response still indicts the architecture. A safety control whose only available implementation was "make the model go dark for everyone, including every defender and the vendor's own staff" is not a precise instrument. It's a blast radius. And the thing about a blast radius is that the people standing closest to the explosion are usually the ones who were doing legitimate work.</p><p><strong>A control that can only protect you by turning the lights off for everyone is not a control. It's a single point of failure wearing a safety vest.</strong></p><div><hr></div><h2>Security is the metric that survives both</h2><p>Put the two failure modes side by side. The model can be jailbroken &#8212; that's the attack surface. The model can be revoked by policy &#8212; that's the kill switch. Same asset, two ways to lose it, and neither one is in your roadmap. 
The only discipline that addresses both is the one most teams have been treating as a phase-four nice-to-have: security, designed in from the start.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ObQA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ObQA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 424w, https://substackcdn.com/image/fetch/$s_!ObQA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 848w, https://substackcdn.com/image/fetch/$s_!ObQA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!ObQA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ObQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ObQA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 424w, https://substackcdn.com/image/fetch/$s_!ObQA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 848w, https://substackcdn.com/image/fetch/$s_!ObQA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!ObQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94aa73c8-8bf6-4ca8-8097-65109c114895_1600x1040.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Concretely, treating it as metric #1 changes three things.</p><p>You build for model redundancy the way you already build for region redundancy. More than one provider, an abstraction layer over the model call, the ability to fail over to an open-weight model you host yourself &#8212; not because the open model is as good, but because a degraded model you control beats a better model you can be denied. The parity gap above is exactly why this is feasible now: your fallback is about six months behind, not two or three years.</p><p>You treat AI as part of your existing attack surface instead of a magic box bolted to the side of it. Every model output that reaches a sensitive sink gets the same scrutiny as any other untrusted input, because a jailbroken model is an untrusted input. The defensive disciplines for this already exist &#8212; it's the same resilience thinking we apply to any distributed system.
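</p><p>The model redundancy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's SDK: the provider names, the <code>ModelUnavailable</code> exception, and the two stub functions are hypothetical stand-ins for real clients.</p><pre><code>class ModelUnavailable(Exception):
    """Raised by a provider that is down, revoked, or refusing requests."""


def complete_with_failover(prompt, providers):
    """Try each (name, call) pair in order; return (name, text) from the first that answers.

    `providers` is an ordered list: hosted models first, the self-hosted
    open-weight fallback last.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ModelUnavailable as exc:
            # Record why this provider failed and move on to the next one.
            errors.append(f"{name}: {exc}")
    raise ModelUnavailable("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real clients, so the sketch runs on its own.
def hosted_frontier(prompt):
    raise ModelUnavailable("403: access revoked by policy")


def self_hosted_open_weights(prompt):
    return f"[degraded local model] {prompt}"


PROVIDERS = [
    ("hosted-frontier", hosted_frontier),
    ("self-hosted-open-weights", self_hosted_open_weights),
]

if __name__ == "__main__":
    used, text = complete_with_failover("Summarize this diff.", PROVIDERS)
    print(used, text)</code></pre><p>The ordering is the design choice: hosted models first, the self-hosted open-weight model last, so a revoked provider costs you quality instead of availability. It only counts as failover if the last entry in that list has actually served traffic in a test.</p><p>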
A model provider is just another dependency that can fail, partition, or lie, and we already know how to engineer around dependencies that do.</p><p>And you stop shipping AI-generated code you never threat-modeled. The velocity these models give you is borrowed against a security debt that comes due the first time generated code hits production with a flaw nobody reviewed. Speed that you can't secure isn't an asset. It's leverage pointed at your own foot.</p><p>None of this is exotic. It's the resilience engineering we already do for every other critical dependency, finally applied to the one we've been pretending is a utility.</p><div><hr></div><h2>What this costs you if you ignore it</h2><p>The teams most exposed right now are the ones for whom last week was a non-event &#8212; the ones who felt the headlines, noted that their region still had access, and moved on. Their dependency didn't get switched off this time. The architecture that left them one directive away from an outage is completely intact, and the next trigger doesn't have to be a jailbreak. It can be an export rule, a sanctions list, a licensing dispute, a model deprecation, a provider that simply decides your use case isn't worth the liability.</p><p>The cost of the conventional wisdom &#8212; one model, no fallback, security as a checklist you get to at the end &#8212; is an outage you cannot engineer your way out of after it lands, because the time to build the fallback was before you needed it. The shutdown didn't create that risk. It just sent everyone an invoice for it, and most teams are going to file it under "interesting" and pay it later at a much worse exchange rate.</p><p>Velocity got us here. It won't get us through what's next. The metric that does is the one we've been deferring.</p><p>Am I wrong about this? 
If you've already built real model redundancy &#8212; actual failover to a second provider or a self-hosted open-weight model, not a config flag you've never tested &#8212; I want to hear how it's holding up, and what it cost you to build before you needed it. And if you think I'm over-rotating on one directive, tell me why. I'm reading every comment.</p><p>If this named something you've been feeling but hadn't put words to, subscribe below &#8212; I write up what building resilient agentic systems teaches me, usually by going wrong in production first.</p><p><em>Subscribe on Substack</em></p><p>&#8212; Glenn Eggleton builds agentic engineering systems and writes about what survives contact with production.</p>]]></content:encoded></item><item><title><![CDATA[I Got Tired of AI Code Review Noise, So I Built a Ratchet]]></title><description><![CDATA[My four LLM reviewer agents kept re-discovering the same findings, run after run.
The fix: tier the checks, ledger the noise, and promote what&#8230;]]></description><link>https://geggleto.substack.com/p/i-got-tired-of-ai-code-review-noise</link><guid isPermaLink="false">https://geggleto.substack.com/p/i-got-tired-of-ai-code-review-noise</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Thu, 11 Jun 2026 13:47:49 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/101ececb-95c0-407f-aa80-dcaf284f8375_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>LLM reviewers can flag anything. Only deterministic checks get to block.</em></p><div><hr></div><p>For months I ran four LLM reviewers on every diff that mattered: code, security, library, and an adversarial claims reviewer. Four cold-context specialists, every change. It felt rigorous, so I didn't look too hard at what they were actually producing.</p><p>When I finally did, the rigor fell apart. The reviewers were flagging things faster than I could work through them, and most of it was not new. It was the same finding from last week, worded a little differently, raised again by an agent that had no idea it had raised it before. I was paying for the same review over and over.</p><p>If your review process never turns its recurring findings into deterministic checks, it is not a process. It is noise on a loop. Let an LLM reviewer flag anything it wants. Only a deterministic check should be allowed to block a merge. Most teams wiring up AI code review right now have not separated those two things. I hadn't either.</p><div><hr></div><h2>The same finding, forever</h2><p>A human reviewer who flags the same problem three times eventually does something about it. Writes a lint rule. Updates the style guide. Says something in standup that becomes team lore. The third time costs less than the first because people remember, and remembering carries consequences.</p><p>An LLM reviewer remembers nothing. 
Every run starts <strong>cold</strong>. The agent that flagged a missing validation check on Tuesday flags it again on Thursday with the same confidence and no idea it is repeating itself. My cost per finding stayed flat while the value of each finding fell toward zero. That is not review. It is a subscription to my own backlog.</p><p>Volume made it worse. Four reviewers, ten-ish findings each, every day. I became the bottleneck, reading output I had stopped trusting and re-deciding things I was fairly sure I had already decided but could not prove. The false positives came back just as reliably as the real findings, with nothing to tell them apart.</p><p>The obvious fix is memory. Save the findings, feed them back to the reviewer next time, let it skip what it already said. I thought about it and dropped it, for two reasons.</p><p>The reviewers run cold on purpose. The value of a second opinion is that it has not seen the first one. Prime a reviewer with its own history and it starts agreeing with its past self instead of reading the diff.</p><p>And memory solves the wrong problem anyway. A finding that keeps coming back does not need to be remembered. It needs to be dealt with. An agent that remembers flagging the same thing three times is just an agent that remembers being ignored. The missing piece was never memory. It was consequence.</p><div><hr></div><h2>Vibes with veto power, or noise nobody reads</h2><p>There are two standard ways to handle this, and both make it worse.</p><p>The first is to make the reviewer a gate. If the LLM says stop, the merge stops. Now non-deterministic judgment has veto power over your pipeline. The same diff passes Monday and fails Tuesday because the model worried about something different that time. Engineers figure this out fast, and a gate they cannot predict is a gate they stop respecting. The distrust then spreads to everything else the agents touch. 
That is vibes with veto power.</p><p>The second is to make everything advisory. The reviewer comments, nobody is blocked, work continues. It feels safer and rots just as fast, because a finding with no consequence teaches everyone to scroll past it. Give it a few weeks and the advisory output is wallpaper. You are paying for tokens nobody reads.</p><blockquote><p>A finding that recurs without consequence trains you to ignore the reviewer.</p></blockquote><p>Neither failure is the model's fault. The models review fine. The problem is that neither setup gets the relationship between flagging a problem and blocking on it right. The gate fuses them, so every flag becomes a verdict. The advisory setup cuts the wire entirely, so nothing a flag says is ever enforced. What you want is the two held apart, with a deliberate path from one to the other.</p><div><hr></div><h2>Tier the checks: flag vs. block</h2><p>So I built that path. It shipped today as <a href="https://github.com/LazyIsEfficient/agentic-os/releases/tag/v1.1.0">agentic-os v1.1.0</a>. The core is three tiers, ordered by how reproducible each one is.</p><p><strong>Tier 0 is deterministic validators.</strong> Scripts, linters, schema checks, grep rules. Anything that returns the same answer every time. This is the only tier allowed to block a merge. If a check can flake, it does not get to gate.</p><p><strong>Tier 1 is LLM judgment with evidence attached.</strong> A reviewer can push a finding up to this tier only by bringing a deterministic artifact: a failing script, a counterexample, something that exits non-zero on its own. The judgment finds the problem. The artifact is what actually gates. The argument around it does not.</p><p><strong>Tier 2 is everything else the reviewer thinks.</strong> Style, unease, "this feels wrong." Advisory, never blocking.
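</p><p>The three tiers can be sketched as a single merge decision. This is a minimal illustration, not the agentic-os implementation: the <code>Finding</code> shape, the <code>artifact_fails</code> field, and both function names are assumptions made for the example.</p><pre><code>from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    claim: str
    tier: int                              # 0 = validator, 1 = judgment + evidence, 2 = advisory
    artifact_fails: Optional[bool] = None  # outcome of the deterministic artifact, if one exists


def blocks_merge(finding):
    """Only a deterministic failure may block; judgment alone never does."""
    if finding.tier in (0, 1):
        # Tier 0 is the validator itself; Tier 1 gates only through its attached artifact.
        return finding.artifact_fails is True
    return False  # Tier 2 is recorded, never enforced


def review_verdict(findings):
    """Split findings into those that gate the merge and those that are only flagged."""
    blocking = [f.claim for f in findings if blocks_merge(f)]
    advisory = [f.claim for f in findings if not blocks_merge(f)]
    return {"merge_allowed": not blocking, "blocking": blocking, "advisory": advisory}</code></pre><p>Note what happens to a Tier 1 finding that arrives without an artifact: it cannot block, so in practice it is treated like a Tier 2 flag until someone attaches evidence.</p><p>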
The part that keeps Tier 2 from being pure noise is that every finding here gets recorded, fingerprinted, and counted.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CWV7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CWV7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 424w, https://substackcdn.com/image/fetch/$s_!CWV7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 848w, https://substackcdn.com/image/fetch/$s_!CWV7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!CWV7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CWV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CWV7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 424w, https://substackcdn.com/image/fetch/$s_!CWV7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 848w, https://substackcdn.com/image/fetch/$s_!CWV7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!CWV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437a30b-63ac-401c-ab30-55ee3551a8a4_1600x1010.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>All four reviewers were rewired to this in the same release. A stop verdict standing on Tier 2 alone is no longer a verdict. It is a flag, and it goes to a ledger instead of to the merge button.</p><div><hr></div><h2>The ratchet: recurrence earns promotion</h2><p>The ledger is where it starts to compound. It is also where the memory lives, outside the reviewers where it cannot pollute a cold read, and attached to a consequence.</p><p>It is boring on purpose. An append-only JSONL file, one line per event, driven by a small Python script with five commands: <code>add</code>, <code>tally</code>, <code>triage</code>, <code>promote</code>, <code>retire</code>.
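</p><p>A ledger of that shape fits in a few lines of Python. The sketch below is an illustration under stated assumptions, not the agentic-os script: the normalization rule (lowercase, collapsed whitespace), the 16-character truncation, and the function signatures are all hypothetical. It covers only <code>add</code> and <code>tally</code>, and it uses the fingerprinting and distinct-run counting that the next paragraphs describe.</p><pre><code>import hashlib
import json
import re
from collections import defaultdict


def fingerprint(path, claim):
    """Hash the file path plus a normalized claim so rewordings of case or spacing collide."""
    normalized = re.sub(r"\s+", " ", claim.strip().lower())  # assumed normalization rule
    digest = hashlib.sha256(f"{path}\n{normalized}".encode("utf-8")).hexdigest()
    return digest[:16]  # assumed truncation, matching the length of the example entry


def add(ledger_path, path, claim, source, run_id):
    """Append one event as one JSON line; existing lines are never rewritten."""
    event = {"fingerprint": fingerprint(path, claim), "file": path, "claim": claim,
             "tier": 2, "source": source, "run_id": run_id, "evidence": None}
    with open(ledger_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event["fingerprint"]


def tally(ledger_path):
    """Count distinct runs per fingerprint, so repeats inside one run count once."""
    runs = defaultdict(set)
    with open(ledger_path, encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            runs[event["fingerprint"]].add(event["run_id"])
    return {fp: len(ids) for fp, ids in runs.items()}</code></pre><p>Keeping the file append-only means <code>tally</code> can always be recomputed from the raw events, so the counts never drift from the history.</p><p>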
Every unevidenced finding gets a SHA-256 fingerprint built from the file path and the normalized claim, so the same defect written two different ways across two runs lands on one entry instead of looking like two discoveries.</p><pre><code>{"fingerprint":"a3f29c41d7b08e55","file":"src/api/sessions.ts","claim":"session token compared without constant-time check","tier":2,"source":"security-reviewer","run_id":"r-0611","date":"2026-06-11","evidence":null,"status":"RECURRING"}</code></pre><p>Recurrence counts distinct runs, not raw sightings, so a reviewer repeating itself five times in one run cannot fake a trend. When a fingerprint crosses the threshold, <code>triage</code> surfaces it for a human to look at. If it is real, you encode it as a Tier 0 validator or a Tier 1 evidence script, and <code>promote</code> records it. The <code>promote</code> command refuses to mark anything done unless the encoded check is attached. The promotion is the check. Findings nobody ever repeats age out through <code>retire</code>.</p><p>That is the ratchet: finding, ledger, tally, promote, check. Once a defect class is promoted, no LLM argues about it again. It is out of the stochastic layer for good, and the reviewers go back to looking at what is actually new in the diff.</p><blockquote><p>Once a defect class is promoted, no LLM re-litigates it. The ratchet only turns one way.</p></blockquote><p>Is any of this load-bearing yet? It caught its own first bug before release. The fingerprint normalizer was treating apostrophes in contractions as quote characters, which made unrelated findings collide into one entry. That fix shipped with a regression test in the same release. So did fixes for 33 routing collisions across the library, each one a recurring finding that got investigated and encoded instead of re-flagged forever. The system chewing on its own output is the whole idea.</p><div><hr></div><h2>"You just rebuilt lint with extra steps"</h2><p>Fair objection.
If every good finding ends up as a deterministic check, have I just rebuilt my lint config the long way and thrown out the LLM judgment that was the point?</p><p>No, because of the pipeline. Lint rules show up when a human gets annoyed enough to write one. There has never been a standing path from "the reviewer keeps mentioning this" to "the machine checks this now." The ratchet is that path. It does not get rid of judgment. It retires the judgments you have already settled, so the expensive stochastic layer stays aimed at the things you have not. The reviewer stops re-finding what you already know and starts finding the next thing worth promoting. Lint never had that. That is the new part.</p><div><hr></div><h2>What skipping it costs you</h2><p>Stand up AI code review without this and you get one of the two decays: a gate your team learns to distrust, or comments your team learns to skip. Either way your spend grows in a straight line and compounds nothing. Every dollar buys the same findings the last dollar did.</p><p>With the ratchet, the curve bends. Every promoted finding is a review you never pay for again. Quality stops resetting to zero each run and starts accumulating, the way the rest of your tooling already does.</p><p>The whole thing, tiers and ledger and rewired reviewers, is open source in <a href="https://github.com/LazyIsEfficient/agentic-os">agentic-os</a>, one command to install. If the flag-versus-block split named something you have been feeling but had not put words to, star the repo. 
And subscribe below, where I write up what this system teaches me, usually by going wrong first.</p><p><a href="https://geggleto.substack.com">Subscribe on Substack</a></p><p>&#8212; Glenn Eggleton builds agentic engineering systems and writes about what survives contact with production.</p>]]></content:encoded></item><item><title><![CDATA[AI Made Engineers Faster. It Also Made Teams Slower to Integrate.]]></title><description><![CDATA[AI made every engineer faster and your team slower to ship. The velocity is on the dashboard. The collaboration tax that pays for it isn't, and I&#8230;]]></description><link>https://geggleto.substack.com/p/ai-made-engineers-faster-it-also</link><guid isPermaLink="false">https://geggleto.substack.com/p/ai-made-engineers-faster-it-also</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Wed, 10 Jun 2026 15:12:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1e6beeb9-7f4d-4cd6-8da9-7f4c9c818b84_1600x836.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>One engineer can now build the whole feature alone. That's the win. 
It might also be the problem.</em></p><div><hr></div><p>An engineer on a team I know shipped an entire feature last month (three services, a new queue, a schema change, the lot) in about four days. Alone. No design meeting, no integration huddle, no "can you walk me through how the payment service expects this." Just one person and a pile of agents, moving at a speed that would have taken a small group two weeks in the before times.</p><p>Then deploy day came, and everything stopped.</p><p>Not because the code was wrong. The code was fine. It stopped because the moment that feature had to leave that one engineer's head &#8212; to a devops person who owns the pipeline, to a reviewer who has to actually understand it, to the on-call rotation that will get paged when it breaks &#8212; nobody else had the map. The whole cross-service model existed in exactly one brain, and there was no cheap way to get it into a second one.</p><p>Here's the thing I keep circling back to, and the claim of this whole post: <strong>the velocity AI gives one engineer is borrowed against a collaboration tax the entire team repays at integration and deploy. And most of us haven't noticed the debt yet.</strong> The speed is real. It shows up on every dashboard we have. The tax is just as real, and it shows up nowhere, until the bill arrives at the worst possible moment.</p><p>I want to be honest up front: I don't have this solved. This is me thinking out loud about a pattern I keep seeing, in my own work and in teams I talk to. I'm more sure the problem is real than I am about anything we should do about it.</p><div><hr></div><h2>The solo end-to-end build is real, and it's genuinely fast</h2><p>Let's start with the part that isn't a complaint, because it's important not to wave this away as hype.</p><p>The thing that's changed is the <em>scope</em> one person can hold. 
It used to be that a feature crossing three services crossed at least three people: somebody who knew the auth service, somebody who owned the data layer, somebody who lived in the front end. The boundaries between systems were also the boundaries between humans. You coordinated across services because you had to coordinate across people, and the coordination was the work.</p><p>Agents collapse that. One engineer can now open all three services at once, hold the full call path in working memory, and let the harness do the typing across every boundary at the same time. The auth change, the queue consumer, the migration, the client update, all built together in one session, by one person who never had to schedule a conversation to make it happen.</p><p>And it's fast. Not "feels fast." Measurably fast. The work that used to be gated on three calendars is now gated on one engineer's afternoon. If your only instrument is throughput, this looks like an unqualified win, and I understand why every engineering leader in the industry is leaning into it. I'm leaning into it. The output is real.</p><p>That's exactly what makes the rest of this hard to see.</p><h2>The whole model now lives in one brain</h2><p>When one person builds across three services in an afternoon, something quiet happens: the complete mental model of how those pieces fit together &#8212; why the queue retries the way it does, which failure the migration is guarding against, what the client assumes about the auth response &#8212; now exists in precisely one place. One head.</p><p>In the old world, that model was distributed whether you liked it or not. Three people built it, so three people held pieces of it, and the act of integrating forced them to reconcile their pieces out loud. The knowledge was spread across the team as a side effect of the work being spread across the team. Nobody designed it that way; it was just how building together worked.</p><p>The solo end-to-end build removes the side effect. 
The feature gets built, and the understanding of it doesn't spread, because spreading it was never required to ship it. You end up with a new kind of silo: not an organizational one, where a team hoards what it knows, but a structural one, where the knowledge was simply never externalized in the first place. It's invisible precisely because the building phase feels so good. One person, fully loaded with context, is the most productive unit in software. The problem is that productivity and resilience are pulling in opposite directions, and only one of them is on the screen.</p><p>Now, the obvious objection (and it's a good one) is <em>that's what documentation is for.</em> Write it down. Have the agent generate the design doc. Drop a markdown file next to the feature explaining every decision. We have better tooling for this than we've ever had; the model that built the thing can also describe it.</p><p>I used to find that answer fully convincing. I find it less convincing now, for two reasons.</p><p>The first is that a document encodes facts, not judgment. It can tell you the queue retries three times with exponential backoff. It struggles to tell you <em>why three and not five</em>, what got tried and rejected, which production incident from two years ago is the reason that number exists at all. The tacit reasoning &#8212; the part that's actually expensive to rebuild &#8212; is the part that's hardest to write down and easiest to leave out. And the model writing the doc doesn't know it either, unless the engineer thought to say it.</p><p>The second is that reading isn't free. A document doesn't transfer understanding; it transfers the <em>opportunity</em> to rebuild understanding, and the reader still has to pay for that with their own time and attention. A 4,000-word design doc that took an agent ninety seconds to produce can cost a reviewer an hour to genuinely absorb, and they still can't interrogate it the way they could interrogate a colleague. 
You can't ask a markdown file "wait, what happens if the migration runs while the old consumer is still up?" and watch its face change as it realizes it hadn't thought about that.</p><p>Docs help. I'm not anti-doc. But they move the cost; they don't remove it. And critically, they move it <em>downstream</em>: from the fast, cheap building phase to the slow, expensive handoff phase. Which is exactly where the bill is waiting.</p><h2>The bill comes due at integration and deploy</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hCAE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hCAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!hCAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!hCAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!hCAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hCAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hCAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!hCAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!hCAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!hCAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1478106d-7aec-4a66-8ba1-6589a09c3879_1600x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Watch where the friction actually lands now. It's not in the building. Building is the part that got fast. The friction migrated to the seams: the places where the work has to pass from the one person who holds the whole picture to everyone else who needs a piece of it.</p><p>Deploy is the sharpest seam. The engineer who built the feature understands its rollout implicitly: this migration runs before that service restarts, this queue drains before the old consumer dies, this flag flips last. None of that is in the code in a form the pipeline owner can read off. So now there's a conversation. But not the cheap, in-flight kind that used to happen while two people built a thing together. 
It's an after-the-fact download: four days of dense, agent-assisted context, handed to someone who has to receive it cold, all at once, under deploy-day pressure.</p><p>Code review has the same shape. A reviewer faced with a three-service change built in one pass isn't reviewing a diff anymore: they're reverse-engineering an entire mental model from its artifacts, fast, so the thing can move. Multiply that by every solo-built feature in the queue and you get a review process that's quietly become the bottleneck the building used to be. We didn't remove the constraint. We moved it from "writing the code" to "transferring the context," and the second one is harder to parallelize because it lives in people, not in machines.</p><p>This is the tax: slower deploys, longer reviews, more coordination overhead exactly when you can least afford it. It's close to invisible on the instruments, because nobody books "the hour it took to explain the feature to devops" as a cost of the feature. It just shows up as deploys dragging a little, for reasons nobody quite names.</p><p>And there's a sharper edge to it: bus factor. When the complete model of a critical feature lives in one head and was never forced out into the team, that head going on vacation (or leaving) isn't a staffing inconvenience. It's a genuine hole in the system's operability that you discover at 2 a.m. when the thing breaks and the one person who understands it is unreachable.</p><h2>"We just throw PRs at each other and point our agents at them"</h2><p>Here's the part that worries me most, because it's not about deploy speed. It's about what we're becoming as teams.</p><p>I was talking to a developer about how his team works now, and he described their collaboration like this, with no irony at all: "Oh, we just pass each other PRs to review and send our agents at them." 
And I've been chewing on that sentence ever since, because it was offered as a description of <em>collaboration</em> and it describes something that isn't collaboration at all.</p><p>Two engineers each building in isolation, each generating a change neither fully holds, each pointing an agent at the other's output to review it. That's not two people solving a problem together. It's two factories running in parallel, shipping parts to each other across a wall. There's throughput. There's no shared understanding being built, no one teaching anyone anything, no junior watching a senior reason through a hard call and absorbing how the senior thinks. The thing that used to happen <em>for free</em> inside collaboration &#8212; the learning, the transfer of taste and judgment from one person to another &#8212; has been engineered out, because the friction it rode on is the same friction we just removed.</p><p>That friction was load-bearing. The annoying parts of working together &#8212; having to explain your thinking, having to reconcile your model with someone else's, having to slow down enough to be understood &#8212; were also the parts that spread knowledge through a team and turned a group of individuals into something that knew more collectively than any one of them did. We treated that friction as pure overhead and optimized it away, and I'm not sure we noticed that it was doing a second job the whole time.</p><p>As an engineering leader, this is the part I can't shrug off. I want my team to collaborate. I want people solving each other's problems, learning from each other, getting sharper because they're surrounded by people who reason differently than they do. "We send our agents at each other's PRs" is the opposite of that. 
It's efficient and it's lonely and it doesn't compound the way a team that actually learns together compounds.</p><h2>I genuinely don't know what we do about this</h2><p>This is the part of the post where the formula says I'm supposed to give you three practices and a tidy framework. I'm not going to, because I don't have them, and I'd rather be honest than tidy.</p><p>I have half-formed instincts. Maybe deploy-readiness has to become an explicit team artifact instead of one person's implicit knowledge. Maybe we need to deliberately re-introduce some of the friction we removed: pairing on the hard features even when one person <em>could</em> solo them, precisely because the solo path skips the part where the team learns. Maybe the unit of work shouldn't be "a feature one person owns end to end" but something that's harder to hold alone on purpose. I don't trust any of these enough to tell you to go do them.</p><p>If I had to name the category all three point at, it's this: we may need to treat <em>knowledge externalization</em> as a first-class, scheduled part of the work. Not something we hope happens as a happy side effect of coordination, but a deliverable in its own right, planned and resourced like any other. And we may have to accept slower individual throughput on certain classes of change (the load-bearing, cross-service, wakes-you-at-2 a.m. ones) in exchange for lower organizational bus factor and faster <em>future</em> changes. That's a real trade, not a free lunch. It only pays off if the second-order cost is real, which is the whole question I can't yet answer.</p><p>What I'm fairly sure of is the shape of the trap. Every individual incentive points at the solo end-to-end build: it's faster, it's satisfying, it makes you look productive. Every individual decision to work that way is locally rational. 
And the cost lands somewhere that no individual feels and no dashboard shows: on the team's collective understanding, paid back slowly, at the seams, in deploys that drag and knowledge that doesn't spread and a kind of working-together that's quietly stopped being together at all.</p><p>Here's where I have to check my own framing, though, because I build multi-agent systems for a living and it would be too easy to write "agents bad" and walk away. The honest version is narrower: <em>this generation</em> of agent usage is pointed at individual velocity, which is exactly the force pulling context into one head. There's no law that says the next generation has to be. You could build agents whose whole job is the opposite. Agents that force externalization and cross-model reconciliation instead of skipping it:</p><ul><li><p>agents that interrogate a build the way a teammate without the context would, asking "what happens if the migration runs while the old consumer is still up?", and making you answer before the work can ship;</p></li><li><p>handoff artifacts generated not as prose docs but as queryable models of the decision space, something the next engineer can actually ask questions of instead of reading cold;</p></li><li><p>multi-agent setups where separate agents role-play platform, security, and on-call during the build phase, so the reconciliation that used to happen between people happens before the work ever leaves one person.</p></li></ul><p>None of that exists in a mature form yet, and I'm not claiming I've built it. But it shifts how I read the problem. It isn't "AI killed collaboration," full stop. It's that the first thing we aimed these tools at was individual speed, and the layer that pays the cost back (the one that rebuilds the reconciliation step inside the machine) is a layer we mostly haven't built. 
That's a more interesting problem than a complaint.</p><p>So I'll ask the people actually living this, because I think the answer is out there in your teams and not in my head:</p><p>Is this even a problem where you work? Or am I mistaking a transition for a loss? I'm especially interested in teams that have been working this way for six to twelve months: are you seeing the second-order effects yet (the bus-factor holes, the learning that quietly stopped), or am I over-weighting the transition period and this all settles out? And if it is a problem, what are you actually doing about it? Have you found a way to keep the velocity without hollowing out how your team learns from each other? I want the real answers, including "you're wrong, here's why." I'm working this out, and I'd rather work it out with you than pretend I've already figured it out.</p><p>Tell me in the comments. I'm reading all of them.</p>]]></content:encoded></item><item><title><![CDATA[Modern Claude Code: The Complete `.claude/` Anatomy]]></title><description><![CDATA[You're using maybe a third of what `.claude/` can do. Here's the complete modern anatomy &#8212; organized by directory and scope precedence &#8212; and the&#8230;]]></description><link>https://geggleto.substack.com/p/modern-claude-code-the-complete-claude</link><guid isPermaLink="false">https://geggleto.substack.com/p/modern-claude-code-the-complete-claude</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 08 Jun 2026 17:57:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3afab1da-ac34-46ff-a7c6-125ee40a447a_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>A field guide to the directory, organized by where files live and which one wins.</em></p><div><hr></div><p>Open your <code>.claude/</code> directory right now. Go ahead &#8212; I'll wait.</p><p>Most of it is probably empty. Maybe a stray <code>CLAUDE.md</code> you wrote once and forgot. 
Maybe nothing at all, because you've been running Claude Code straight out of the box, re-explaining the same context every session, and quietly assuming that's just how it works.</p><p>It isn't. You're using maybe a third of what <code>.claude/</code> can do. There is a complete modern anatomy here: a handful of directories and config keys, each with a defined job and a defined loading order. The highest-leverage features are exactly the ones that don't show up unless you go looking. This post is the map of the whole surface &#8212; every directory, what lives in it, and the one rule that ties it all together: <strong>which file wins when two of them disagree.</strong></p><p>Not build order. Not philosophy. Inventory and precedence. By the end you'll be able to look at any <code>.claude/</code> tree and know what's there, what's missing, and why one rule overrides another.</p><div><hr></div><h2>The Two Trees: Project and Global</h2><p>There isn't one <code>.claude/</code>. There are two, and the distinction is the foundation for everything else.</p><p>The <strong>project tree</strong> lives at <code>&lt;repo-root&gt;/.claude/</code>. It's committed to git, shared with the team, scoped to this repository. Anything that should travel with the code &#8212; the conventions for <em>this</em> codebase, the skills <em>this</em> project needs &#8212; lives here.</p><p>The <strong>global tree</strong> lives at <code>~/.claude/</code> in your home directory. It's personal, machine-local, applies to every project you touch. Your own habits, your own slash commands, the rules you want everywhere regardless of which repo you opened &#8212; those live here.</p><p>Here's the project tree, annotated:</p><pre><code>&lt;repo-root&gt;/
&#9500;&#9472;&#9472; CLAUDE.md                  # project memory &#8212; conventions, ground truth
&#9500;&#9472;&#9472; .mcp.json                  # team-shared MCP servers (committed)
&#9492;&#9472;&#9472; .claude/
    &#9500;&#9472;&#9472; settings.json          # permissions, hooks, outputStyle (committed)
    &#9500;&#9472;&#9472; settings.local.json    # personal overrides (gitignored)
    &#9500;&#9472;&#9472; skills/
    &#9474;   &#9492;&#9472;&#9472; &lt;name&gt;/SKILL.md     # auto-invoked capabilities
    &#9500;&#9472;&#9472; agents/
    &#9474;   &#9492;&#9472;&#9472; &lt;name&gt;.md           # specialist subagents
    &#9500;&#9472;&#9472; commands/
    &#9474;   &#9492;&#9472;&#9472; &lt;name&gt;.md           # custom slash commands
    &#9492;&#9472;&#9472; rules/
        &#9492;&#9472;&#9472; &lt;name&gt;.md           # path-scoped behavior rules</code></pre><p>And the global tree, which mirrors it:</p><pre><code>~/
&#9500;&#9472;&#9472; .claude.json               # local + user MCP scopes, app state
&#9492;&#9472;&#9472; .claude/
    &#9500;&#9472;&#9472; CLAUDE.md              # user memory &#8212; applies to every project
    &#9500;&#9472;&#9472; settings.json         # global permissions, hooks, defaults
    &#9500;&#9472;&#9472; skills/               # your personal skills, everywhere
    &#9500;&#9472;&#9472; agents/               # your personal specialists
    &#9492;&#9472;&#9472; commands/             # your personal slash commands</code></pre><p>The shapes are almost identical on purpose. Nearly everything that can exist at the project level can also exist at the global level. Which raises the obvious question: if you have a skill named <code>code-review</code> in both trees, or a permission rule in both <code>settings.json</code> files, which one runs?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GJz6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GJz6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 424w, https://substackcdn.com/image/fetch/$s_!GJz6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 848w, https://substackcdn.com/image/fetch/$s_!GJz6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!GJz6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GJz6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GJz6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 424w, https://substackcdn.com/image/fetch/$s_!GJz6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 848w, https://substackcdn.com/image/fetch/$s_!GJz6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!GJz6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c6254ed-cfeb-48f6-90fa-eda3803821b5_1600x1040.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>Scope Precedence: The Rule That Ties It Together</h2><p>The mental model is a stack, from broadest to most specific:</p><ul><li><p><strong>Enterprise / managed policy</strong> (set by an admin, if present) &#8212; the outermost layer.</p></li><li><p><strong>User / global</strong> (<code>~/.claude/</code>) &#8212; your personal defaults, every project.</p></li><li><p><strong>Project</strong> (<code>&lt;repo-root&gt;/.claude/</code>) &#8212; shared, committed, this repo.</p></li><li><p><strong>Local project override</strong> (<code>settings.local.json</code>) &#8212; your personal tweaks for this repo, gitignored.</p></li></ul><p>The principle is <strong>most-specific scope wins.</strong> A project setting overrides a global one. 
A local override beats the committed project setting. The one exception is managed policy: where an admin has set it, it is enforced, and the narrower scopes can't override it. Your personal <code>~/.claude/CLAUDE.md</code> sets the baseline; the repo's <code>CLAUDE.md</code> layers on top of it for that repo.</p><p>This is not a clever feature you opt into. It's the load order that's running right now, whether you've configured it or not. The reason most people's setup feels unpredictable is that they have rules in one tree, expectations from the other, and no model of which one the agent actually reads.</p><p>Once you internalize the stack, configuration stops being guesswork. Want a habit that follows you into every project on your machine? Global. Want a convention that travels with the codebase to every teammate? Project, committed. Want to bend the project's rules just for yourself without touching the shared file? <code>settings.local.json</code>. The directory you put the file in <em>is</em> the scope decision.</p><p>The seven-component conceptual model &#8212; and the build-in-order playbook for assembling all of this from scratch &#8212; is its own thing; I cover that in the free <a href="agentcos-map.md">AgenticOS Map</a>. This post is the physical anatomy. Same surface, different cut.</p><div><hr></div><h2><code>settings.json</code>: Permissions, Hooks, and the Keys You Haven't Touched</h2><p><code>settings.json</code> is the control panel. Three parts of it matter most, and almost nobody configures the last two.</p><p><strong>Permissions.</strong> You can pre-authorize or block tool calls with allow and deny lists, so Claude stops prompting you for the same <code>npm test</code> every session &#8212; and <em>can't</em> run the commands you never want it to. An <code>allow</code> entry for the commands you trust removes a class of interruptions; a <code>deny</code> entry is a guardrail that survives across sessions.</p><p><strong>Hooks.</strong> Hooks wire automated behavior to session events: <code>PreToolUse</code>, <code>PostToolUse</code>, <code>Stop</code>. 
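</p><p>Here is a minimal sketch of how an allow/deny list and a single hook might sit together in a project's <code>settings.json</code>. The specific rule strings and the <code>npm run lint</code> script are illustrative assumptions rather than anything from a real repo, so treat the shape as a starting point and confirm the exact syntax against the current Claude Code settings reference:</p><pre><code>{
  "permissions": {
    "allow": ["Bash(npm test)"],
    "deny": ["Read(./.env)"]
  },
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          { "type": "command", "command": "npm run lint" }
        ]
      }
    ]
  }
}</code></pre><p>In this sketch the <code>deny</code> rule is meant to keep a local <code>.env</code> file unreadable, and the hook entry is meant to run the lint script after each <code>Write</code> tool call.</p><p>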
The canonical one is a <code>PostToolUse</code> hook on <code>Write</code> that runs your linter the moment a file changes. The point is that <em>the harness</em> executes these, not the model. If you've ever asked Claude to "always run the formatter after editing" and watched it forget three turns later, that's because you put it in prose instead of in a hook. Prose is a suggestion; a hook is a guarantee. (The hooks <em>philosophy</em> &#8212; when automation earns its place &#8212; is its own deep dive in the paid series.)</p><p><strong><code>outputStyle</code>.</strong> This one corrects a common misconception. There is no <code>.claude/output-styles/</code> directory. Output style is a single setting in <code>settings.json</code>:</p><pre><code>{
  "outputStyle": "Explanatory"
}</code></pre><p>It changes how Claude communicates &#8212; for example, an <code>Explanatory</code> style that narrates its reasoning as it works. You can also flip it interactively with <code>/config</code>. If you went looking for a directory, that's why you didn't find one. It's a key, not a folder.</p><div><hr></div><h2><code>CLAUDE.md</code>: The Memory File, As Mechanics</h2><p><code>CLAUDE.md</code> is the highest-priority instruction file in the system &#8212; the project's ground truth, read at session start. There's a build-order argument about <em>when</em> to write it (short version: last, once the rest of the system exists; long version is in the paid series). Set that aside. Here are the mechanics.</p><p><strong>The under-200-lines discipline.</strong> <code>CLAUDE.md</code> competes for the same context window as your actual code. A bloated constitution crowds out the thing you're trying to work on, and past a certain length the agent starts skimming it the way you skim a terms-of-service page. Keep it tight. If a rule is derivable from the repo, it doesn't belong here.</p><p><strong><code>/init</code>.</strong> Run it once in a new repo and Claude bootstraps a starter <code>CLAUDE.md</code> by reading the codebase &#8212; package manager, test command, structure. It's the fastest way from empty to useful.</p><p><strong><code>/memory</code>.</strong> Opens the memory files for direct editing so you can curate them deliberately instead of letting them accrete.</p><p><strong>Path-scoped rules.</strong> This is the feature most people don't know exists, so it gets its own section.</p><div><hr></div><h2><code>rules/</code>: Behavior That Loads Only When It's Relevant</h2><p><code>.claude/rules/</code> is real, current, and badly underused. It solves the <code>CLAUDE.md</code> bloat problem directly.</p><p>Every rule you cram into <code>CLAUDE.md</code> loads on every session, whether or not it's relevant. 
A rule about your API error-handling convention is dead weight when you're editing CSS. Path-scoped rules fix that. A rule file is a markdown file under <code>.claude/rules/</code> with optional YAML frontmatter, and the key field is <code>paths:</code>:</p><pre><code>---
paths: ["src/api/**/*.{ts,tsx}"]
---

# API conventions

- Every endpoint returns the standard `{ data, error }` envelope.
- Errors use the shared `AppError` class, never raw throws.
- Validate input at the boundary with the zod schemas in `src/api/schemas/`.</code></pre><p>The <code>paths:</code> field accepts glob patterns, including brace expansion like <code>{ts,tsx}</code>. The behavior is the leverage: <strong>a rule with</strong> <code>paths:</code> <strong>loads only when Claude reads a file matching one of those globs.</strong> Edit something under <code>src/api/</code>, the API conventions load. Edit a stylesheet, they stay out of context entirely.</p><p>A rule file <em>without</em> a <code>paths:</code> field loads unconditionally &#8212; use that for genuinely global conventions. But the path-scoped variant is how you keep a large, opinionated codebase governed without paying the context cost on every unrelated edit. It's <code>CLAUDE.md</code> discipline, automated by relevance.</p><p>(Reference: the path-specific rules documentation at code.claude.com.)</p><div><hr></div><h2>Commands and Skills: Two Ways to Reach Behavior</h2><p>Custom slash commands and skills both live as markdown files, and they overlap enough to confuse people. The distinction is <em>how they fire.</em></p><p>A <strong>custom command</strong> lives at <code>.claude/commands/&lt;name&gt;.md</code> and you invoke it explicitly: <code>/my-command</code>. It's a saved prompt you trigger on demand.</p><p>A <strong>skill</strong> lives at <code>.claude/skills/&lt;name&gt;/SKILL.md</code> and is <em>auto-invoked</em> when the conversation context matches its <code>description</code>. You don't have to remember it exists; Claude reaches for it when the work calls for it. That auto-invocation is the whole point &#8212; a skill is behavior the system applies for you, a command is behavior you summon.</p><p>Both support dynamic content, and these two pieces of syntax are where a lot of power lives:</p><p><code>$ARGUMENTS</code> <strong>for user input.</strong> All-caps. 
You can take everything the user passed (<code>$ARGUMENTS</code>), a positional slice (<code>$1</code>, <code>$ARGUMENTS[2]</code>), or a named field (<code>$name</code>). This is what turns a static prompt into a parameterized one &#8212; <code>/review src/auth</code> flows <code>src/auth</code> straight into the command body.</p><p><strong><code>!`shell command`</code> for live shell output.</strong> Backtick-wrapped, bang-prefixed. It runs the shell command as <em>preprocessing</em> &#8212; before Claude ever sees the prompt &#8212; and substitutes the output inline. So a command can open with <code>!`git diff --staged`</code> and Claude starts the turn already looking at your real, current diff. No copy-paste, no stale context.</p><p>Skill frontmatter is richer than most people realize. The fields you'll actually use:</p><ul><li><p><code>name</code> and <code>description</code> &#8212; the latter drives auto-invocation, so write it for matching, not for marketing.</p></li><li><p><code>allowed-tools</code> &#8212; <strong>hyphenated, not camelCase.</strong> This trips people up constantly. 
Restrict a skill to exactly the tools it needs.</p></li><li><p><code>disallowed-tools</code> &#8212; the inverse, when a denylist is cleaner than an allowlist.</p></li><li><p><code>disable-model-invocation</code> &#8212; make a skill manual-only (no auto-invoke).</p></li><li><p><code>user-invocable</code> &#8212; control whether a user can trigger it directly.</p></li><li><p><code>paths</code> &#8212; scope a skill to relevant files, same idea as rules.</p></li><li><p><code>arguments</code> &#8212; declare the inputs the skill expects.</p></li><li><p><code>context: fork</code> &#8212; run the skill in a forked context so it doesn't pollute the main conversation.</p></li><li><p><code>agent</code>, <code>model</code>, <code>effort</code> &#8212; route the skill to a specific subagent, model, or effort level.</p></li><li><p><code>hooks</code> &#8212; wire skill-local event behavior.</p></li></ul><p>You don't need all of these on day one. You do need to know they exist, because the difference between a skill that works and one that quietly grabs the wrong tool is usually one frontmatter line.</p><div><hr></div><h2><code>agents/</code>: Specialists, Not Generalists</h2><p><code>.claude/agents/</code> holds named subagents with declared scope and a declared tool allowlist. A <code>code-reviewer</code> that reviews but never writes. A <code>security-reviewer</code> that flags but never fixes. You dispatch them from an orchestrator, they do their narrow job, they report back.</p><p>The mechanic worth knowing here is the restriction: a subagent can be locked to a specific model and a specific set of tools, which is what makes fan-out safe. You can launch several at once over isolated worktrees without them stepping on each other or reaching for tools they shouldn't have. The <em>why</em> &#8212; separation of concerns at the agent level, the briefing discipline, the review gate &#8212; is the subject of the paid series' agents post. 
The <em>where</em> is <code>.claude/agents/</code>, and you manage the whole roster with <code>/agents</code>.</p><div><hr></div><h2>MCP: Wiring in External Tools</h2><p>MCP (Model Context Protocol) is how Claude Code reaches systems outside the repo &#8212; a database, an issue tracker, a docs server. Where you put the config decides who gets it, and that maps cleanly onto the scope stack:</p><ul><li><p><code>.mcp.json</code> at the project root is <strong>team-shared.</strong> Commit it, and every teammate who clones the repo gets the same servers wired up. This is the one you want for project infrastructure.</p></li><li><p><code>~/.claude.json</code> holds your <strong>local</strong> scope (just the current project, just you) and your <strong>user</strong> scope (every project, just you).</p></li></ul><p>You don't hand-edit these in practice. You run:</p><pre><code>claude mcp add --scope project &lt;name&gt; &lt;command...&gt;
claude mcp add --scope user &lt;name&gt; &lt;command...&gt;
claude mcp add --scope local &lt;name&gt; &lt;command...&gt;</code></pre><p>The <code>--scope</code> flag is the same precedence decision in command form. <code>project</code> writes to the committed <code>.mcp.json</code>; <code>user</code> and <code>local</code> write to <code>~/.claude.json</code>. Pick the scope, and you've decided who inherits the server.</p><div><hr></div><h2>The Features You're Probably Not Using</h2><p>Here's the part of the inventory most people never reach &#8212; the built-in slash commands that ship with Claude Code and never announce themselves. These aren't files you write. They're already there.</p><p><strong>Context and session management:</strong></p><ul><li><p><code>/compact</code> &#8212; compress the conversation history when it gets long, keeping the thread alive without the bloat.</p></li><li><p><code>/clear</code> &#8212; wipe the context and start fresh.</p></li><li><p><code>/context</code> &#8212; show what's currently loaded into the context window. The single best way to see <em>why</em> the agent is behaving the way it is.</p></li><li><p><code>/effort</code> &#8212; tune how much reasoning effort the model spends.</p></li></ul><p><strong>Working and recovering:</strong></p><ul><li><p><code>/plan</code> &#8212; plan mode: the agent thinks through the approach before touching anything.</p></li><li><p><code>/rewind</code> &#8212; roll back to an earlier checkpoint. 
Claude Code keeps checkpoints; <code>/rewind</code> is the undo you didn't know you had when a session goes sideways.</p></li><li><p><code>/loop</code> &#8212; run a prompt or command on a repeating interval.</p></li><li><p><code>/copy</code> &#8212; copy output to the clipboard.</p></li></ul><p><strong>The high-leverage ones worth featuring:</strong></p><ul><li><p><code>/agents</code> &#8212; manage your subagents (the roster from the <code>agents/</code> section).</p></li><li><p><code>/code-review</code> &#8212; multi-axis review, with <code>--fix</code> to apply findings and <code>--comment</code> to post them inline.</p></li><li><p><code>/batch</code> &#8212; make parallel changes across worktrees.</p></li><li><p><code>/diff</code> &#8212; inspect changes directly.</p></li><li><p><code>/goal</code> &#8212; set and track the session's objective.</p></li></ul><p>If you take one action from this whole post, run <code>/context</code> in your next session and look at what's actually loaded. Then run <code>/agents</code> and <code>/code-review</code> once each. Most people discover in five minutes that the tool they've been using is a fraction of the tool that shipped.</p><div><hr></div><h2>The Community Surface</h2><p>The official docs at code.claude.com are the ground truth for every claim in this post &#8212; when something here disagrees with a folk pattern you read somewhere, trust the docs. Beyond them, the community has built a substantial catalog of skills, commands, and agent definitions worth borrowing from: <a href="https://github.com/hesreallyhim/awesome-claude-code">awesome-claude-code</a>. Read a few well-built skill files before you write your own; the frontmatter patterns alone will save you an afternoon.</p><div><hr></div><h2>What You Now Know</h2><p>You came in using maybe a third of <code>.claude/</code>. 
You now have the complete modern anatomy: two trees, project and global; the precedence stack that decides which file wins; <code>settings.json</code> for permissions, hooks, and the <code>outputStyle</code> key; <code>CLAUDE.md</code> mechanics and the under-200-line discipline; path-scoped <code>rules/</code> that load only when relevant; commands versus auto-invoked skills with <code>$ARGUMENTS</code> and live shell substitution; specialist subagents in <code>agents/</code>; MCP wiring by scope; and the built-in slash commands hiding in plain sight.</p><p>The gap was never the tool. It was the inventory. You can't configure a directory you've never mapped.</p><p>Two paths from here, depending on what you want next:</p><ul><li><p>If you want the <strong>mental model for building all of this in order</strong> &#8212; what to write first, what to write last, and why &#8212; start with the free <a href="agentcos-map.md">AgenticOS Map</a>. It's the build-in-order companion to this anatomy.</p></li><li><p>If you're an engineer <strong>ready to build now</strong>, the paid AgenticOS series walks the construction end to end, starting with <strong>P1: Skills &#8212; the atomic unit of agent behavior.</strong> That's where the anatomy becomes a system.</p></li></ul><p>Both paths start the same way: subscribe, and the free AgenticOS Map lands in your inbox next.</p><p><a href="https://geggleto.substack.com">Subscribe to AgenticOS on Substack</a></p>]]></content:encoded></item><item><title><![CDATA[I've Been Gatekeeping the Magic. Here's Everything.]]></title><description><![CDATA[Senior engineers have been quietly building AI scaffolding that juniors never get access to. I just open-sourced mine.]]></description><link>https://geggleto.substack.com/p/ive-been-gatekeeping-the-magic-heres</link><guid isPermaLink="false">https://geggleto.substack.com/p/ive-been-gatekeeping-the-magic-heres</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Fri, 05 Jun 2026 14:05:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2334435b-e69d-4c86-90a6-df46a283b610_1600x836.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Senior engineers have been quietly building AI scaffolding that juniors never get access to. I just open-sourced mine.</em></p><div><hr></div><p>For months I told myself I'd publish this when it was cleaner. More polished. Better documented.</p><p>That was not the truth. That was gatekeeping, and today it ends.</p><p>I've been running an AI operating system for months now. It made me faster across every repeatable work type I tracked. Everyone who got access to it shipped at a senior level and super quick. I kept refining it, kept not publishing it, kept reaping the leverage while the rest of the field used raw API access and hoped for the best.</p><p>The OS is now open-source: <strong><a href="https://github.com/LazyIsEfficient/agentic-os">https://github.com/LazyIsEfficient/agentic-os</a></strong></p><p>One curl command. It installs immediately. 
I'm done sitting on it.</p><div><hr></div><h2>The gap is not what you think</h2><p>The story we tell ourselves about AI tooling is that access is equal. Get an API key, set up Copilot or whatever tool you want to pay for, now everyone is in the game. Junior and senior on the same playing field.</p><p>That story is wrong.</p><p>Senior engineers are not "better at prompting." They have spent months accumulating scaffolding: reusable skill files that tell agents exactly how to handle a class of work, specialist agents scoped to one task with the right tools, intake patterns called shapers that turn vague requests into scoped briefs before a word of code is written, memory systems that survive session boundaries and eliminate re-explaining context every morning, a constitution file (CLAUDE.md) that sets the rules everything else runs inside.</p><p>That scaffolding took months to build. Most seniors have not published it. Most have not even articulated it. It lives in their workflow as a private compound interest machine.</p><p>Juniors and interns get the raw API. They get Copilot/Cursor autocomplete. They get "figure it out." The gap between what a senior produces with their AI OS and what a junior produces with a chat window is not a skill gap. It is a scaffolding gap.</p><blockquote><p>Scaffolding is transferable. That is the part that changes everything.</p></blockquote><div><hr></div><h2>What actually happens when a junior runs the OS</h2><p>I saw this firsthand. Junior engineers who got access to the same skills, shapers, and agents I was running stopped producing uncertain drafts and started producing outputs I reviewed and shipped. Interns who previously needed heavy guidance ran the OS against real tasks and came back with architecturally sound code.</p><p>The pushback I always get: "Won't this just produce vibe-coded slop at scale?"</p><p>No. The OS is specifically why not.</p><p>The OS ships with a code-reviewer agent that runs on every non-trivial diff. 
A security-reviewer for anything touching auth or user data. A TDD skill that drives implementation from tests first. A quality gate built into every content pipeline. These are not guardrails bolted on after the fact. They are first-class components of the system.</p><p>The OS does not produce raw output and hope for the best. It produces output and reviews it. A junior running the OS produces more than a senior without one, and the output passes through review before it ships. That is not slop. That is a supervised pipeline that anyone can run.</p><div><hr></div><h2>What's in the OS</h2><p>The library ships with 80+ skills and 18 specialist agents. The categories:</p><p><strong>Engineering</strong>: TDD, code review, security review, API design, debugging, frontend, TypeScript, Rust, cloud infrastructure, CI/CD, SRE, release management</p><p><strong>Content</strong>: blog post shaping and authoring, course design, social growth, SEO ops, podcast ops</p><p><strong>Product</strong>: technical product management, system architecture, documentation, ADRs</p><p><strong>Games</strong>: Godot, Phaser, game design, balancing, monetization</p><p>Install on macOS/Linux:</p><pre><code>curl -fsSL https://raw.githubusercontent.com/LazyIsEfficient/agentic-os/main/install.sh | bash</code></pre><p>Files go to <code>~/.claude/skills/</code> and <code>~/.claude/agents/</code>. Available immediately in any Claude Code session.</p><p>Then add this to your <code>~/.claude/CLAUDE.md</code> to make Claude reach for skills by default instead of treating them as opt-in:</p><pre><code>## Skills
You have a library of skills installed at `~/.claude/skills/`. Before responding to any task,
check whether a skill applies and invoke it with the Skill tool if so.
If there is even a 1% chance a skill might apply, invoke it first.</code></pre><p>That single block is the highest-leverage configuration step in the entire setup.</p><div><hr></div><h2>Why I kept sitting on it</h2><p>I told myself the OS needed to be complete before I shared it. Every week I added something, fixed something, and moved the goalposts on what "ready" meant.</p><p>The real reason: leverage feels better when it's yours alone.</p><p>That was the wrong call. The engineers who wait for polished tooling before they start are the ones falling behind. The ones who pull this now, run it rough, and calibrate through use are the ones compounding their leverage every week.</p><p>Juniors and interns do not need to wait for their senior to build the OS for them. They can pull it now, today, and start running skills that took months to refine. Leads who want their team to ship faster without more senior bandwidth can hand this to their juniors right now.</p><p>The gatekeeping was never about protecting the tool. It was hesitation. And hesitation is just compounding disadvantage for everyone who is not you.</p><div><hr></div><h2>What comes next</h2><p>This is the launch. The build playbook comes after.</p><p>I have been writing a Substack series on how to build this OS from scratch: why to start with skills and not CLAUDE.md, how to write shapers that actually reduce intake noise, how memory compounds over weeks, and how to wire hooks that catch problems before CI does.</p><p>The map post is free and published. The build playbook is paid.</p><p><a href="https://geggleto.substack.com/p/build-your-own-agenticos-the-complete">Read the complete AgenticOS map on Substack</a></p><p>If you want the full series, subscribe. The first paid post walks through writing your first skill file, the single step that changes how your team uses AI. 
That is where the build-out starts.</p>]]></content:encoded></item><item><title><![CDATA[Build Your AgenticOS: Watch It Run]]></title><description><![CDATA[Seven posts explaining the system. This one shows it &#8212; a real engineering task, start to finish.]]></description><link>https://geggleto.substack.com/p/build-your-agenticos-watch-it-run</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-agenticos-watch-it-run</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 21:20:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9546d5ce-c61b-4ad2-908f-7c410d1084dc_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Seven posts explaining the system. This one shows it.</em></p><div><hr></div><p>Seven posts describing how a system works is not the same as watching it work. You can read every post in this series, follow every step, and still have an open question: "But what does it actually look like when it runs?" That question deserves a direct answer.</p><p>This post is that answer. Not more explanation. A real session, recorded, unedited.</p><p>The claim behind this entire series is that a well-built AgenticOS doesn't make AI magic. It makes AI predictable. The session you're about to watch isn't impressive because of what the AI does. It's impressive because of the system around it. The same session, on the same task, produces the same shape of output every time. That's the point.</p><div><hr></div><h2>What You're Watching</h2><p></p><p>This is a real task from the actual codebase. Not a demo task built to look clean. Not a simplified example. The codebase is this system's own library of skills and agents. The task is something that needed doing.</p><p>Here is what you'll watch happen, in order:</p><p><strong>Session start: the context loads.</strong> When Claude Code opens, ~/<code>CLAUDE.md</code> loads automatically. 
That file is the constitution. It tells the agent where memory lives, how to dispatch work, and what the rules are. Before a single instruction is typed, the agent knows the system it's operating inside. The memory index loads next. That's where prior session context lives. The agent reads it and starts hot instead of cold.</p><p><strong>The task goes through the prompt-shaper.</strong> Rather than typing a vague instruction and hoping the agent figures out what's wanted, the task runs through <code>prompt-shaper</code> first. The shaper asks a focused set of questions. It turns a rough idea into a scoped brief: what the output is, what files it touches, what done looks like. This takes a few minutes. It prevents thirty minutes of correction later.</p><p><strong>Specialist agents are dispatched in parallel.</strong> Once there's a brief, agents execute against it. Not one agent doing everything sequentially. Multiple agents running simultaneously in a single message, each with a narrow scope. You'll see this in the terminal output: multiple task outputs arriving in a short window.</p><p><strong>The review gate runs.</strong> After the implementation agents finish, <code>code-reviewer</code> and <code>library-reviewer</code> are dispatched in parallel. They're read-only. Their job is to catch what the implementation agents can't see. The reviews come back with a verdict: ship, ship-with-fixes, or hold.</p><p><strong>Reading the diff.</strong> Before merging, the diff is read directly. Not trusting the agent's summary. Checking the actual changes: what changed, in which files, does it match the brief.</p><p><strong>Done: the merged result.</strong> The task is merged. The session ends. The memory layer gets a new entry if anything non-obvious was learned.</p><p>That's the full loop. 
Now watch it.</p><div><hr></div><h2>The Session</h2><p>The real session took 22 minutes; the recording is fast-forwarded.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;8ad7e08d-ab06-4694-b346-01ae1d0cac2a&quot;,&quot;duration&quot;:null}"></div><div><hr></div><h2>What to Notice</h2><p>Readers of the series will recognize the patterns as they appear. A few things worth watching for specifically:</p><p><strong>The shaper isn&#8217;t run, because the PRD was already excellent.</strong> That saves the agent time, and you your sanity. Ask in chat if you want the <code>PRD.md</code>.</p><p><strong>The review gate runs in parallel.</strong> Two agents, one message, both results arrive before a merge decision is made. The gate is not a formality. It's structurally separate from the implementation agents, which means it has no stake in defending what they wrote. An independent second pass by construction.</p><p><strong>The memory loaded at the start.</strong> One of the memory entries that loads is the one that explains the <code>&#127798;&#65039; Take</code> prefix for social posts. That's why the agent doesn't ask about it. The system knows. That's what memory is for: non-obvious facts that would otherwise have to be re-explained in every session.</p><p><strong>The agents don't improvise scope.</strong> At no point does an agent decide the task needs something extra. The brief is the contract. The agents execute the brief. Scope decisions happen at intake, not during implementation.</p><p><strong>The diff is checked, not assumed.</strong> The agent's final message is a description of what it intended to do. The diff is what it actually did. Those two things are checked against each other before merging. This is the habit that prevents a whole category of invisible errors. A subagent that says "I updated the routing in CLAUDE.md" and a diff that shows it also touched three other files is a signal, not a rubber stamp. 
Read the diff.</p><p><strong>The session is reproducible.</strong> There is no moment in the recording where the output depends on a lucky prompt or a particularly cooperative AI response. The structure of the session is the same every time this class of task runs. Shaper runs first. Agents dispatch from the brief. Gate runs after implementation. Diff before merge. Any engineer on any team can follow the same structure and get the same shape of result.</p><div><hr></div><h2>The System Is Not Magic. It's Consistent.</h2><p>The session you just watched isn't impressive because of the AI. The underlying model is the same one you have access to. What's different is the system it's operating inside.</p><p><code>CLAUDE.md</code> loads on session start. Memory starts the agent warm. The shaper turns vague requests into tight briefs. Specialist agents execute with narrow scope. The review gate runs independently. The diff is read before merging.</p><p>None of these steps are clever. Each one is just a habit encoded into a file. The aggregate of those habits is a system that produces predictable output from the same starting materials every time. That's not magic. That's engineering.</p><p>If you've read this series and want to build your own version, start with the map. Every component is explained there, with the build order, the reason each layer exists, and what it gives you. You don't have to build all seven layers at once. The map tells you which layer to start with and what you get from it.</p><p><a href="https://geggleto.substack.com/p/build-your-own-agenticos-the-complete">The Complete Map: Build Your Own AgenticOS</a></p><h2>The proof</h2><p><a href="https://github.com/geggleto/test-agentic-os">See the code output here</a></p>]]></content:encoded></item><item><title><![CDATA[Build Your AgenticOS: Hooks Automate the Invisible]]></title><description><![CDATA[Hooks wire your AgenticOS to session events. 
They're the automation layer that removes the work you'd otherwise have to remember to ask for.]]></description><link>https://geggleto.substack.com/p/build-your-agenticos-hooks-automate</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-agenticos-hooks-automate</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 16:59:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/08213780-0814-4502-9339-6dde0c8ad5df_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hooks wire your AgenticOS to session events. They're the automation layer that removes the work you'd otherwise have to remember to ask for.</em></p><div><hr></div><p>Memory tells agents what to remember. Hooks make the system act without being asked.</p><p>Every session, there is overhead you pay without thinking about it: describing the current git state, establishing which project is active, noting which deployment is live. None of it is hard. All of it is unnecessary. You know the system should have that context automatically. You keep meaning to set it up. Instead, you type it again.</p><p>Hooks are the automation layer that fixes this. They are shell commands wired to session events (SessionStart, PreToolUse, PostToolUse, Stop) and their output flows directly into the agent's context. No prompt required. The agent sees the hook output the same way it sees anything else you write. It just happened without you.</p><p>This post covers what hooks are, where they live, the event model, Glenn's real examples from <code>.claude/settings.json</code>, how to design a hook that works without surprising you, and the clean distinction between hooks, skills, and CLAUDE.md.</p>
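<p>As a rough illustration of the shape this takes (a hypothetical sketch, not one of the examples from the full post), a <code>SessionStart</code> hook in <code>.claude/settings.json</code> could print the current git state so it is already in context when the session opens. The choice of <code>git status</code> as the command is an assumption made for illustration:</p><pre><code>{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "git status --short --branch"
          }
        ]
      }
    ]
  }
}</code></pre><p>Whatever the command prints is what the agent sees, so it pays to keep the output short and stable.</p>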
      <p>
          <a href="https://geggleto.substack.com/p/build-your-agenticos-hooks-automate">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your AgenticOS: Worktrees for Parallel Agents]]></title><description><![CDATA[Parallel agents stomping files is an isolation problem, not a parallelism problem. Here's the worktree pattern that fixes it.]]></description><link>https://geggleto.substack.com/p/build-your-agenticos-worktrees-for</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-agenticos-worktrees-for</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 16:58:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e85ca2a8-80eb-4012-81e3-c3fcf371d993_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Git worktrees are how you get the wall-clock benefits of parallelism without the merge conflicts. Use them whenever two agents may touch the same files.</em></p><div><hr></div><p>You run four agents in parallel. All four try to edit the same source file. The last one to write wins. The other three changes are gone. You don't get an error. You don't get a merge conflict. You get silent data loss dressed up as a successful run.</p><p>That is not a parallelism problem. It is an isolation problem. Parallelism is the right call; sequential dispatch of independent work is a bug. The mistake is dispatching parallel agents into a shared working tree and expecting them not to stomp on each other.</p><p>Git worktrees fix this. Each agent gets its own branch, its own working directory, zero shared file state. When the agents are done, you read the diffs and merge. The wall-clock cost of parallelism, without the silent data loss.</p><p>Here is what worktrees are, the exact rule for when to use them, the cost you're accepting, the wave pattern that keeps dispatch manageable, the verification discipline that catches agent failures, and the anti-pattern that makes the whole thing pointless.</p>
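<p>A minimal sketch of that isolation, assuming a throwaway repository and two hypothetical agent branches named <code>agent-a</code> and <code>agent-b</code> (the paths and names are illustrative, not the layout from the full post):</p><pre><code>set -eu

# Throwaway repo for the demo; the directory layout is hypothetical.
work="$(mktemp -d)"
git init -q "$work/repo"
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "initial commit"

# One worktree per agent: its own directory, its own branch.
git -C "$work/repo" worktree add -b agent-a "$work/wt-a"
git -C "$work/repo" worktree add -b agent-b "$work/wt-b"

# A file created in one worktree does not exist in the other.
touch "$work/wt-a/only-in-a.txt"
ls "$work/wt-a" "$work/wt-b"

# Show every worktree and the branch it has checked out.
git -C "$work/repo" worktree list</code></pre><p>Point each agent at its own directory and the last-writer-wins problem has nowhere to happen; reconciling the branches becomes an explicit merge you get to read.</p>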
      <p>
          <a href="https://geggleto.substack.com/p/build-your-agenticos-worktrees-for">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your AgenticOS: The CLAUDE.md Constitution]]></title><description><![CDATA[CLAUDE.md is where your AgenticOS rules live. It governs every session, every agent, every project.]]></description><link>https://geggleto.substack.com/p/build-your-agenticos-the-claudemd</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-agenticos-the-claudemd</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 15:51:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ca1ba193-f1fa-45a5-81f3-65d13fe0a742_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>CLAUDE.md is where your AgenticOS rules live. It governs every session, every agent, every project.</em></p><div><hr></div><p>Every layer in the AgenticOS stack is optional except one. You can skip memory if you're comfortable re-establishing context each session. You can skip shapers if you want to write briefs by hand. You can run without a structured agent library and just prompt directly. None of that is fatal.</p><p>Skip CLAUDE.md and you have no system at all.</p><p>Without it, every session starts from scratch. You re-explain the same anti-patterns. You remind the agent not to skip code review. You re-establish that parallel fan-out is required for independent tasks. You correct the same mistake twice, then three times, because nothing was written down and the agent's behavior resets on the next cold start.</p><p>CLAUDE.md is the instruction layer that governs every session. It is where you encode the rules you are tired of repeating.</p><p>Here is what it is, where it sits in the priority stack, what belongs inside it versus what doesn't, and how to write a rule that actually holds.</p>
      <p>
          <a href="https://geggleto.substack.com/p/build-your-agenticos-the-claudemd">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your AgenticOS: Memory That Survives Sessions]]></title><description><![CDATA[The layer that makes your AgenticOS learn, one non-obvious fact at a time.]]></description><link>https://geggleto.substack.com/p/build-your-agenticos-memory-that</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-agenticos-memory-that</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 15:51:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/04a1f19a-f683-47ac-a125-d509bf64927c_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The layer that makes your AgenticOS learn, one non-obvious fact at a time.</em></p><div><hr></div><p>Every session that starts cold is a session that re-learns what you already know. The agent you corrected yesterday will make the same mistake tomorrow. The decision you made two weeks ago will get relitigated next Tuesday. The quirk of your codebase that took forty-five minutes to explain will take another forty-five minutes. Again.</p><p>Memory is the layer that fixes this. Not memory in the fuzzy "AI remembers things" sense, but a concrete, version-controlled directory of short markdown files that captures the non-obvious facts your agent should start with on every session. Write it correctly and your AgenticOS accumulates context over time. Skip it and every session starts with a blank slate, which means every session is subtly slower than it should be.</p><p>Here is the format, the four types, the mechanics, and, critically, what not to save, because the most common mistake with memory is filling it with things the codebase already contains.</p>
      <p>
          <a href="https://geggleto.substack.com/p/build-your-agenticos-memory-that">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your AgenticOS: Specialist Agents]]></title><description><![CDATA[The constraint is the feature. One agent definition file is more reliable than any prompt you've ever written.]]></description><link>https://geggleto.substack.com/p/build-your-agenticos-specialist-agents</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-agenticos-specialist-agents</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 15:50:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a5abb291-0301-438c-8238-2f35811cdc05_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The constraint is the feature. One agent definition file is more reliable than any prompt you've ever written.</em></p><div><hr></div><p>One generalist agent doing everything is how you get mediocre output at every layer.</p><p>Not because the model is weak. Because a single agent context trying to hold intake shaping, implementation, code review, security audit, and social copywriting simultaneously is optimizing for nothing in particular. You get plausible output on every front and excellent output on none. The quality ceiling on a generalist agent is determined by the widest surface it has to cover.</p><p>Specialist agents with declared tool allowlists and routing triggers beat a single do-everything agent every time. The constraint is the feature. A <code>code-reviewer</code> that cannot write code will not quietly sneak a "helpful" fix into the diff while reviewing it. A <code>security-reviewer</code> that can only read files will not accidentally delete one. A <code>blog-post-shaper</code> that has no access to <code>Write</code> cannot produce a draft before the brief is agreed. 
The narrower the scope, the more predictable the output.</p><p>This post covers what an agent definition file actually is, the mandatory build+review gate that sits downstream of every agent dispatch, fan-out vs sequential dispatch, briefing discipline, and a real example from <code>.claude/agents/</code>. At the end, a starter template for writing your own specialist Claude Code agents. The briefing section alone is worth the read: bad briefs are the single biggest source of wasted cycles in any agent-driven workflow.</p>
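<p>To make "the constraint is the feature" concrete, here is a minimal Python sketch of an allowlist check. The agent names come from the examples above; the specific tool sets and the <code>check_tool_call</code> function are assumptions for illustration and are not how Claude Code enforces allowlists internally.</p>
<pre><code># Illustrative only: the agent names come from the post, but the tool sets
# and this enforcement function are assumptions, not Claude Code internals.
ALLOWLISTS = {
    "code-reviewer": {"Read", "Grep", "Glob"},
    "security-reviewer": {"Read", "Grep", "Glob"},
    "blog-post-shaper": {"Read", "AskUserQuestion"},
}


def check_tool_call(agent, tool):
    """Raise PermissionError unless `tool` is on the agent's allowlist."""
    allowed = ALLOWLISTS.get(agent, set())  # unknown agents get no tools
    if tool not in allowed:
        raise PermissionError(f"{agent} may not use {tool}")
    return True
</code></pre>
<p>Under these assumptions, <code>check_tool_call("code-reviewer", "Write")</code> raises, so a reviewer cannot slip a fix into the diff no matter what its prompt says. The guarantee comes from the missing tool, not from the wording of the instructions.</p>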
      <p>
          <a href="https://geggleto.substack.com/p/build-your-agenticos-specialist-agents">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your Own AgenticOS, Part 2: Shapers, The Intake Layer]]></title><description><![CDATA[Intake quality is the ceiling on execution quality. Here's how to build a shaper that scopes briefs before any agent touches a file.]]></description><link>https://geggleto.substack.com/p/build-your-own-agenticos-part-2-shapers</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-own-agenticos-part-2-shapers</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 15:49:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/db2640ec-e384-470f-988d-265be7c127b9_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every vague request produces a vague output. The agent doesn't know who the reader is, what "done" looks like, or which of the five interpretations of your request you actually meant. It makes a reasonable guess and starts. Ten minutes later you have something that's technically responsive and completely wrong, and now you're editing instead of approving.</p><p>The shaper is the fix. A shaper is an intake agent whose only job is to turn a half-formed idea into a scoped brief before any execution agent touches a file. Not "clarify the request." Produce a structured document that removes all ambiguity for everything downstream. Intake quality is the ceiling on execution quality. A shaper enforces that ceiling before execution starts. Without one, every agent downstream simply inherits your ambiguity and confidently acts on it.</p><p>This is Part 2 of the Build Your Own AgenticOS series. 
If Part 1 covered the atomic unit (the skill file), Part 2 is about the layer that protects every skill from getting fired at the wrong target.</p><p>Here's what this post delivers:</p><ul><li><p>What a shaper does, and why it's distinct from the skill it feeds</p></li><li><p>Why shapers come first in every request lifecycle</p></li><li><p>The real routing logic from <code>CLAUDE.md</code> in this repo (which requests route to which shaper and how)</p></li><li><p>How to write a shaper using the <code>AskUserQuestion</code> pattern</p></li><li><p>When to skip the shaper entirely (the answer is narrower than you think)</p></li><li><p>A starter shaper template you can copy today</p></li></ul>
      <p>
          <a href="https://geggleto.substack.com/p/build-your-own-agenticos-part-2-shapers">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your AgenticOS: Start with Skill Files]]></title><description><![CDATA[Skill files are the atomic unit of a Claude Code AgenticOS. Get this layer right and every agent you build downstream becomes composable.]]></description><link>https://geggleto.substack.com/p/build-your-agenticos-start-with-skill</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-agenticos-start-with-skill</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 15:48:57 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/16440994-ca17-4c81-a450-da98a4a23e37_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Without skills, every agent response is a coin flip. The agent knows your stack, knows your tests pass, knows you prefer short methods. Then it improvises the rest. Ask it to write a blog post and it writes a blog post. Ask it again next week and you get a different blog post, structured differently, hitting different beats, because you never told it what a blog post is in this shop. The work looks fine. It just doesn't look the same twice.</p><p>Skills are the fix. A skill file is a markdown instruction set that any agent can load on demand. It encodes how your team does one class of work: the steps in order, the references to consult, the done criteria to verify before declaring complete. One file, committed to the repo, loaded whenever the work matches.</p><p>A skill file is the atomic unit of your AgenticOS. Get this layer right and every agent you build downstream becomes composable. Skip it and you are back to prompting from memory, which is just a slower version of improvising.</p><p>This is Part 1 of Build Your Own AgenticOS. You will learn what a skill file is, how to structure one, when to reach for a template instead of a skill, and how to tell whether a skill is load-bearing before you ship it.</p>
      <p>
          <a href="https://geggleto.substack.com/p/build-your-agenticos-start-with-skill">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your Own AgenticOS: The Complete Map]]></title><description><![CDATA[An AgenticOS is a composable, version-controlled layer that makes AI behaviour consistent, reviewable, and delegatable &#8212; and you can build it in a&#8230;]]></description><link>https://geggleto.substack.com/p/build-your-own-agenticos-the-complete</link><guid isPermaLink="false">https://geggleto.substack.com/p/build-your-own-agenticos-the-complete</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Mon, 01 Jun 2026 15:20:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/58f3fb16-9dc0-4fe5-8026-f01723aee0e6_1600x836.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The system layer that makes AI behaviour consistent, reviewable, and delegatable.</em></p><div><hr></div><p>You're not using AI wrong. You're using it without a system. Every engineer on your team has their own approach: different prompts, different habits, different mental models of what agents can and can't do. The output is inconsistent. The knowledge is non-transferable. When the person who "gets AI" goes on leave, the AI capability walks out the door with them.</p><p>An AgenticOS is the system layer that fixes this. It is a composable, version-controlled set of files that sit inside your repo and tell agents how to behave, what to do, and what the rules are. You can build it in a day. 
It survives session boundaries, git clones, and team rotations.</p><p>Here is the complete map.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z6R4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z6R4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Z6R4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Z6R4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Z6R4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z6R4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z6R4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Z6R4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Z6R4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Z6R4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a7e2575-b98a-47c3-898d-b833256e50d2_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<div><hr></div><h2>What an AgenticOS Actually Is</h2><p>Before the components, the model: an AgenticOS is not a product, a vendor, or a framework you install. It is a directory structure you commit. The files are plain text. They define behaviour the same way a well-written README defines conventions. The difference is that agents can read them at runtime, and Claude Code (and similar tools) have predictable rules for which files get loaded, in what order, at what priority.</p><p>The full system has seven components. Each one solves a distinct problem. You can adopt them incrementally, starting with the highest-leverage layer and building down. The components are not interchangeable: Skills are the atomic unit. Everything else either produces Skills, consumes them, or governs how they're invoked.</p><div><hr></div><h2>The Seven Components</h2><h3>Skills</h3><p>A skill is a markdown file that tells an agent how to perform a class of work. Not a specific task. A class. 
<code>blog-post-author</code> handles every blog post. <code>code-reviewer</code> handles every review. <code>prompt-shaper</code> handles every time someone says "I have a vague idea and need it scoped."</p><p>Skills are stored under <code>.claude/skills/&lt;skill-name&gt;/SKILL.md</code>. They can include references, examples, and sub-files. The agent loads the skill when it's invoked and treats it as a first-class instruction set. A skill is not a system prompt and not a mega-prompt. It is a narrow, composable instruction set for one class of work. It can reference other files in its own directory: a <code>references/</code> folder for structural guides, an <code>assets/</code> folder for templates and examples.</p><p>The key property: skills are reusable. Write one once; every agent invocation that hits that skill class gets the same behavior. That is the beginning of consistency. It is also the beginning of reviewability: when the output is wrong, you fix the skill file and every future invocation benefits. You are not fixing a conversation. You are fixing a system.</p><h3>Templates</h3><p>Templates are structured starting points for recurring patterns. Where skills define behavior, templates define structure. A PR description template defines the sections a PR description always has. A post brief template defines the sections a post brief always has. A meeting-notes template defines the sections a meeting note always has.</p><p>Templates live alongside skills or in their own directory. They are most powerful when wired to a shaper (see below) that fills them in based on intake. A blank template is just a document. A filled template produced by an agent is a repeatable output.</p><p>The value of templates compounds. The first time you use a template, you save five minutes. By the tenth time, you have a body of consistently structured outputs that agents can read, compare, and build on top of. 
The inconsistency that comes from free-form generation accumulates debt. Templates prevent that debt from accruing.</p><h3>Shapers</h3><p>A shaper is an intake agent. Its job is to take a vague request and return a scoped brief. The scope includes: what the output is, who it's for, what the single takeaway or deliverable is, what assets are needed, and what quality criteria apply.</p><p>The reason shapers exist is that most agent failures start at intake. A vague request produces a vague output. The agent makes assumptions you didn't intend. You spend five minutes correcting a twenty-minute draft. Shapers front-load that conversation into a structured moment so the author (or builder, or engineer) knows exactly what to produce.</p><p>Shapers interact with you. They ask a focused set of questions and stop. The brief they produce is the contract everything downstream runs against.</p><h3>Specialist Agents</h3><p>A specialist agent is a named, purpose-built subagent with a declared scope and declared tools. <code>code-reviewer</code> only reviews code; it doesn't write it. <code>security-reviewer</code> only flags security issues; it doesn't fix them. <code>content-ops</code> runs an expert-panel scoring pass; it doesn't redraft the content.</p><p>The principle is separation of concerns at the agent level. Generalist agents are fine for exploration. Production-grade agent systems use specialists because narrow scope means fewer errors, cleaner output, and reviewable decisions.</p><p>Specialist agents live under <code>.claude/agents/</code>. Each one has a system prompt, a tool allowlist, and a declared output contract. You call them from orchestrator agents; they do the work and report back.</p><h3>Memory</h3><p>Memory is persistent context that survives session boundaries. By default, an AI conversation forgets everything when the session ends. 
Memory is the fix: a directory of short markdown files, each capturing one non-obvious fact that would otherwise have to be rediscovered.</p><p>Memory files live under <code>.claude/memory/</code>. An index file (<code>.claude/memory/MEMORY.md</code>) lists every entry with a one-line hook. At the start of each session, the agent reads the index, scans for relevant entries, and starts with context that would otherwise take fifteen minutes of re-explanation to reconstruct.</p><p>What belongs in memory: decisions, preferences, rules that were learned through correction, in-flight initiatives, people and their roles. What does not belong: things derivable from the repo itself. Memory is for facts the code doesn't contain.</p><p>The discipline is the index. If the index grows past 200 lines, it gets truncated in context and stops being useful. Every new entry should earn its place by answering the question: is this something a future session would otherwise have to painfully relearn? If yes, write it. If the code already shows it, skip it.</p><h3>Hooks</h3><p>Hooks are automated behaviors wired to session events. They live in <code>settings.json</code> under the <code>hooks</code> key. You can fire a hook on <code>PreToolUse</code> (before an agent takes an action), <code>PostToolUse</code> (after), and <code>Stop</code> (when an agent session ends).</p><p>The canonical use case: a <code>PostToolUse</code> hook on <code>Write</code> that auto-runs your linter. A <code>PreToolUse</code> hook on <code>Bash</code> that logs the command for audit. A <code>Stop</code> hook that posts a summary to your team Slack channel.</p><p>Hooks are where the AgenticOS connects to your existing toolchain. They are lightweight event handlers. They do not need to be complex to be valuable. A five-line hook that validates every file write pays for itself the first time it catches a malformed JSON write before it reaches CI.</p><h3>CLAUDE.md</h3><p>CLAUDE.md is the constitution. 
It is the highest-priority instruction file in the system. Claude Code reads it at session start and treats it as ground truth. Every other instruction source (skill files, agent prompts, in-conversation instructions) operates within the bounds CLAUDE.md sets.</p><p>CLAUDE.md can live at two levels:</p><ul><li><p><code>~/.claude/CLAUDE.md</code>: global rules that apply across every project on the machine</p></li><li><p><code>&lt;repo-root&gt;/CLAUDE.md</code>: project-specific rules that override or extend the global rules</p></li></ul><p>The global file sets universal norms: memory path, subagent dispatch patterns, anti-patterns, communication style. The project file sets repo-specific norms: which shapers apply to which work types, how to run tests, what the branching strategy is, who owns what.</p><p>The common mistake is starting with CLAUDE.md. Don't. Start with skills.</p><div><hr></div><h2>Build Order</h2><p>Build skills first, not CLAUDE.md.</p><p>The reason is that skills are atomic. Each skill file does one thing. It has no dependencies on other skills, on memory, or on hooks. You can write a single skill, invoke it, and immediately see whether it works. Skills give you fast feedback with zero risk of circular dependency.</p><p>CLAUDE.md, by contrast, references everything else. If you write your CLAUDE.md before you have skills, you are writing rules that reference capabilities that don't exist yet. The result is a constitution full of dead letters.</p><p>The practical build order:</p><ul><li><p>Start with the skill for the work type you do most. One file. Invoke it. Fix it.</p></li><li><p>Add a second skill for the next most common work type. 
Invoke it.</p></li><li><p>Write a shaper for each skill that needs structured intake.</p></li><li><p>Add memory once you have enough repeated sessions to know what you keep re-explaining.</p></li><li><p>Wire hooks once you have enough agent sessions to know which side effects you want to automate.</p></li><li><p>Write CLAUDE.md once the rest of the system exists and you know what rules actually govern it.</p></li></ul><p>The other build-order mistake is building everything in isolation and then wiring it together. Skills that have never been invoked inside a real session are theory. Invoke early and often. The feedback loop between "I wrote a skill file" and "this skill file produces the output I actually want" is where the real design work happens. You will rewrite your first skill file at least twice. That is not failure. That is calibration.</p><div><hr></div><h2>Value and Effort by Layer</h2><p>Each layer is listed in build order. Effort is relative to your first skill taking roughly two hours.</p><ul><li><p>Skills: highest ongoing leverage. Effort: two to four hours per skill. Payback: immediate, every invocation.</p></li><li><p>Templates: medium leverage. Effort: thirty to sixty minutes per template. Payback: fast if the work type recurs daily.</p></li><li><p>Shapers: high leverage for team use. Effort: two to three hours per shaper. Payback: strongest when multiple people are using the same skill.</p></li><li><p>Specialist agents: high leverage for fan-out workloads. Effort: one to two hours per agent. Payback: strongest when you are parallelizing review or generation across multiple subagents.</p></li><li><p>Memory: medium leverage, compounds over time. Effort: ten minutes per entry, ongoing. Payback: slow start, then becomes the most time-saving layer as the knowledge base grows.</p></li><li><p>Hooks: low effort for high reliability. Effort: thirty minutes per hook. 
Payback: immediate if you have an existing CI/lint step to connect.</p></li><li><p>CLAUDE.md: one-time setup, high trust anchor. Effort: two to four hours for the first version. Payback: removes a category of repeated re-explanation permanently.</p></li></ul><div><hr></div><h2>The Series: Where Each Layer Goes Deeper (coming soon)</h2><p>This post is the map. The series builds each layer out in full.</p><ul><li><p><a href="#p1">Skills and Templates: The Atomic Unit of Agent Behavior &#8594;</a></p></li><li><p><a href="#p2">Shapers: How to Structure Intake So Agents Don't Assume &#8594;</a></p></li><li><p><a href="#p3">Specialist Agents: Separation of Concerns at the Agent Level &#8594;</a></p></li><li><p><a href="#p4">Memory: The Persistent Context Layer &#8594;</a></p></li><li><p><a href="#p5">Hooks: Wiring Agents to Your Toolchain &#8594;</a></p></li><li><p><a href="#p6">CLAUDE.md: Writing the Constitution Last &#8594;</a></p></li></ul><p>Each post is paid. The map is free because the map is useless without the build playbook, and the build playbook is what you subscribe for.</p><div><hr></div><p>That's the map. The build playbook starts with skills. Subscribe to get each layer as I write it.</p><p><a href="https://geggleto.substack.com">Subscribe to AgenticOS on Substack</a></p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[You Don't Get to Pick CA]]></title><description><![CDATA[CAP says pick two, but partitions aren't optional. So you're really choosing Consistency or Availability *during* a partition. 
And PACELC is the&#8230;]]></description><link>https://geggleto.substack.com/p/you-dont-get-to-pick-ca</link><guid isPermaLink="false">https://geggleto.substack.com/p/you-dont-get-to-pick-ca</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Fri, 29 May 2026 23:59:39 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e7ee0938-e498-432e-9579-855440f79832_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A team I worked with had a slow dashboard. The account balance endpoint hit Postgres on every load, the page felt sluggish under traffic, and the fix was obvious to everyone: add a read replica. Point the dashboard reads at the replica, keep writes on the primary, ship it. The graph got faster. Nobody filed it as a correctness change because it wasn't one. It was a performance change.</p><p>Three weeks later, support got a ticket. A user had spent their balance, the deduction committed on the primary, and for a second and a half the dashboard (reading off the replica, which hadn't caught up) still showed the old, higher number. The user saw money they no longer had. During that window someone could have made a second decision against a balance that was already gone.</p><p>Nobody decided to weaken consistency. There was no design doc that said "we accept showing users stale balances." The team added a replica to make a page faster, and in doing so they quietly traded away a guarantee they didn't know they were holding. The trade was real. The decision was never made.</p><p>Here's the position I want to argue: you do not get to pick "CA." The C-versus-A choice everyone quotes from CAP isn't a menu you order from once at architecture time. It's a behavior your system exhibits during a partition, and a <em>different</em> tradeoff, latency versus consistency, that your system makes on every single read when there's no partition at all. You are choosing both of these constantly. 
The only question is whether you're choosing them on purpose.</p><h2>The "P" was never optional</h2><p>CAP gets quoted as "pick two of three: Consistency, Availability, Partition tolerance." That framing is where the damage starts, because it puts P on the same shelf as the other two, as if it were a property you might decline.</p><p>You can't decline it. P is partition tolerance: the system continuing to function when the network between nodes drops, delays, or reorders messages. And the network <em>will</em> do that. A switch reboots, a NIC flaps, an availability zone goes dark, a Kubernetes node gets cordoned mid-deploy, a GC pause makes a node look dead for 800ms. The moment your data lives on more than one machine (a replica, a second region, a cache on a different box) partitions are a thing that happens <em>to</em> you. Refusing to tolerate them doesn't mean they stop. It means your system corrupts or hangs when one occurs.</p><p>So the honest reading of CAP is not "pick two." It's: <strong>partitions happen, so when one does, you get to pick exactly one of C or A, and you've already picked, whether you know it or not.</strong></p><ul><li><p>Choose <strong>C</strong> (a CP system): during a partition, refuse to serve requests you can't serve correctly. The node that can't confirm it has the latest data returns an error or blocks rather than hand back a possibly-stale answer. You stay correct; you give up availability for the duration.</p></li><li><p>Choose <strong>A</strong> (an AP system): during a partition, keep answering with whatever data you have locally, even if it might be stale or might later conflict. You stay up; you give up consistency for the duration.</p></li></ul><p>There is no third door where the partition politely waits for you. The replica setup above is an AP choice that nobody recognized as a choice. When the replica lagged (a tiny partition in time, if not in topology), the system happily served the stale balance instead of refusing. 
That was availability winning over consistency. It just won by default, in a config change labeled "performance."</p><h2>PACELC: the half you decide every day</h2><p>Here's the part CAP leaves out, and it's the part that actually runs your life. CAP only describes what happens <em>during</em> a partition. Partitions are rare. What about the other 99.9% of the time, when the network is fine?</p><p>That's PACELC. Read it as: <em>if Partition, then Availability-or-Consistency; Else, Latency-or-Consistency.</em> The first half is just CAP. The second half, the "E," for <em>else</em>, is the one you decide on every read, every day, usually without noticing.</p><p>When there's no partition and everything is healthy, you are <em>still</em> trading consistency against latency. Every time you add a cache, you've decided that serving a possibly-stale value fast is better than fetching the authoritative value slow. Every time you read from a replica, you've decided that the replica's slightly-behind view is acceptable in exchange for taking load off the primary. Those aren't partition-time decisions. 
They're the normal, healthy-network state of your system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YqYw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YqYw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 424w, https://substackcdn.com/image/fetch/$s_!YqYw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 848w, https://substackcdn.com/image/fetch/$s_!YqYw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!YqYw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YqYw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Decision tree: ask if a partition is happening; if yes (CAP) you keep Consistency or Availability, if no (PACELC) you keep Latency or Consistency, with example systems on each leaf.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Decision tree: ask if a partition is happening; if yes (CAP) you keep Consistency or Availability, if no (PACELC) you keep Latency or Consistency, with example systems on each leaf." title="Decision tree: ask if a partition is happening; if yes (CAP) you keep Consistency or Availability, if no (PACELC) you keep Latency or Consistency, with example systems on each leaf." 
srcset="https://substackcdn.com/image/fetch/$s_!YqYw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 424w, https://substackcdn.com/image/fetch/$s_!YqYw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 848w, https://substackcdn.com/image/fetch/$s_!YqYw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!YqYw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a9f9b2-c56b-4fcc-bdfd-038c21992e8b_1600x1080.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The dashboard team thought they were operating in the "E" branch: no partition, just trading a little consistency for a faster page. And most of the time they were, and it was fine. But they never specified the "P" branch. They never said what the system should do when the replica fell behind. So the system did the default AP thing, serving stale, at exactly the moment correctness mattered. The everyday latency tradeoff and the rare partition tradeoff are <em>the same architectural seam</em>, and they'd only reasoned about one side of it.</p><blockquote><p>A cache and a read replica are not performance features. They are consistency tradeoffs that happen to make things faster. The speed is the part you notice. The traded-away guarantee is the part that pages you at 2am.</p></blockquote><h2>The agent will make this trade for you, silently</h2><p>Now drop an AI agent into this. Ask it to "make this endpoint scale" or "this query is slow, speed it up," and watch what it does.</p><p>It will, cheerfully and competently, add a cache. Or suggest reading from a replica. Or memoize the result. The code will be clean. It'll wire up Redis with a sensible TTL, or flip the read to a replica connection, and it'll compile and pass your tests, because your tests assert on a single-node happy path where the cache and the primary always agree.</p><p>What it will <em>not</em> do is tell you it just changed your correctness model. There's no line in the diff that says "heads up: this endpoint is now AP. Under replica lag or a cache that's mid-invalidation, it will serve stale data, and for a balance check that's a correctness bug." 
The agent optimized the metric you named (latency) and silently spent a budget you didn't name (consistency). This is exactly the <em>silent logic drift</em> failure mode: it did the task you asked and quietly changed a guarantee you didn't, in a diff that reads like a reasonable optimization. The agent isn't wrong that a cache makes it faster. It's wrong by omission about what that costs.</p><h2>"But strong consistency everywhere is the safe default, right?"</h2><p>This is the natural objection, and it deserves a real answer: <em>if AP is so dangerous, just be CP everywhere. Strong consistency, no caches, no replicas for reads, always correct.</em> Safe, right?</p><p>No, and not because it's slow, though it is. It's because "CP everywhere" is also a choice with a cost you have to actually want. A strictly consistent system gives up availability <em>during partitions by design</em>: when the network splits, the minority side stops serving rather than risk divergence. If you genuinely make everything CP, you're signing up for the endpoint to return errors during every network blip, every failover, every node that's briefly unreachable. For a payment authorization, that's correct and worth it. You'd rather decline than double-spend. For a "number of likes" counter or a recommendations widget, refusing to serve during a blip is absurd; stale-but-up is obviously right.</p><p>The point isn't that CP beats AP or the reverse. It's that <em>the right answer is per-feature, and it has to be chosen</em>. A balance check wants CP. A view counter wants AP. The dashboard's bug wasn't that they picked AP. It's that a balance, which wanted CP, got AP by accident because nobody made the call. "Strong everywhere" isn't safety. It's a different unmade decision with a different bill.</p><h2>Make the choice before the agent makes it for you</h2><p>So the inversion is this: every read path in your system has already chosen C-or-A for partitions and L-or-C for the healthy case. 
The cache you added, the replica you read from, the strict primary you kept &#8212; each is a stance. You're not deciding <em>whether</em> to make these tradeoffs. You're only deciding whether they're written down or discovered in an incident channel.</p><p>Before you let an agent "make it scale," make the call yourself. Here's a starter prompt that forces the agent to surface the tradeoff instead of burying it in a TTL:</p><pre><code>For the feature below, classify it as CP or AP and justify the choice in one sentence.

Then answer, concretely:
1. PARTITION BEHAVIOR: When the network partitions (e.g. a read
   replica lags, the cache and source disagree, or a node is
   unreachable), exactly what should this feature do &#8212; serve
   possibly-stale data, return an error, block, or something else?
   State the actual behavior, not the principle.
2. PACELC / ELSE BEHAVIOR: When there is NO partition and everything
   is healthy, what latency-vs-consistency tradeoff does this feature
   make on a normal read (e.g. read from a cache/replica for speed, or
   always hit the authoritative source)? Name it and justify it.

Do not propose an implementation yet. First commit to the behavior.

Feature: &lt;paste your endpoint / feature description here&gt;</code></pre><p><strong>What to verify:</strong> check that it actually <em>committed to a behavior during a partition</em>, a concrete "serve stale" or "return 503" or "block until caught up," not just a tidy definition of CP and AP. If it defined the terms and dodged the behavior, it dodged the only part that matters. Make it answer "what does this <em>do</em> when the replica lags," in those words.</p><p>That's the theory and a way to force the decision into the open. The production patterns that actually <em>implement</em> each side of these choices (idempotency keys so a retried write is safe, retries with backoff and jitter, sagas for the transactions that cross a service boundary, distributed locks with fencing tokens, caching with stampede protection) are the paid series, Tuesdays and Thursdays. Each one comes with the prompt to generate it and the checklist to verify what the agent hands you against the failure modes that pattern hides. This post is F1: it's the lens the whole series looks through, because every one of those patterns is a deliberate answer to a C-or-A, L-or-C question you'd otherwise answer by accident.</p><p>Subscribe free for the Friday theory. Upgrade when you want the implementations and the verification checklists.</p><p>And tell me in the comments: what's a "performance optimization" in your system that quietly changed what your users are allowed to see &#8212; and who found out first, you or them?</p>]]></content:encoded></item><item><title><![CDATA[Your Handler Will Run Twice]]></title><description><![CDATA[At-least-once delivery isn't a maybe &#8212; the network *will* deliver that message again. 
If re-running your handler isn't safe, you don't have a bug&#8230;]]></description><link>https://geggleto.substack.com/p/your-handler-will-run-twice</link><guid isPermaLink="false">https://geggleto.substack.com/p/your-handler-will-run-twice</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Fri, 29 May 2026 23:59:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ca42a6ea-8a11-4876-a5f0-60c92ac0c049_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>3:10am. PagerDuty. A customer is charged $49 twice for one subscription, eleven seconds apart. You pull the logs and find two near-identical webhook deliveries from the payment processor for the same charge. Two <code>200</code>s went back. Two rows in the ledger.</p><p>Here's the part that ruins your morning: nothing was broken. The first call to your handler <em>worked</em>. The card was charged, the row was written, and then the <code>200</code> you sent back got lost on the way home. The network ate the ack. From the processor's side, your endpoint never responded, so it did exactly what a correct system does: it retried. The second delivery hit a handler that had no idea the first one had ever happened, and charged the card again.</p><p>The retry wasn't the bug. The retry was <em>correct</em>. Your handler just wasn't safe to run twice, and on a long enough timeline, every handler runs twice.</p><p>Here's the position I want to argue: at-least-once delivery is not an edge case you can defer. It's the contract every queue, webhook, and retrying client already operates under, whether you opted in or not. Duplicates aren't an exception your system might encounter. They're a guarantee it ships with. Idempotency is the only thing that makes "ran twice" equal "ran once." If you haven't built it, you don't have a hypothetical risk. 
You have a double-charge with a date on it.</p><h2>"Exactly-once" is a sentence you can say, not a thing you can build</h2><p>Everyone <em>wants</em> exactly-once delivery: the message arrives, your handler runs, once, done. It's a lovely idea and it does not survive contact with a network.</p><p>Walk the failure. A sender delivers a message and waits for an ack. The ack can get lost. Now the sender is stuck with a question it physically cannot answer: did the receiver process the message and the ack vanished, or did the message itself never arrive? From where the sender sits, those two worlds look identical. It has exactly two moves. Give up, and risk dropping a message that actually succeeded (that's <em>at-most-once</em>, and it loses data). Or retry, and risk running a message that already ran (that's <em>at-least-once</em>, and it makes duplicates).</p><p>There is no third door. You cannot build exactly-once on top of an unreliable network, because the sender can never tell "it worked" apart from "the receipt got lost." Every serious system picks at-least-once, because losing a payment is worse than processing one twice, <em>if the receiver is built to absorb the duplicate.</em> That "if" is the whole job. The system delivers at-least-once and hands you the duplicate. 
What you do with it is idempotency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RXVj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RXVj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 424w, https://substackcdn.com/image/fetch/$s_!RXVj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 848w, https://substackcdn.com/image/fetch/$s_!RXVj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!RXVj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RXVj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Two timelines deliver the same message twice: the non-idempotent handler charges $50 twice for $100, while the idempotent handler dedups the duplicate on its key and charges $50 exactly once.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Two timelines deliver the same message twice: the non-idempotent handler charges $50 twice for $100, while the idempotent handler dedups the duplicate on its key and charges $50 exactly once." title="Two timelines deliver the same message twice: the non-idempotent handler charges $50 twice for $100, while the idempotent handler dedups the duplicate on its key and charges $50 exactly once." 
srcset="https://substackcdn.com/image/fetch/$s_!RXVj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 424w, https://substackcdn.com/image/fetch/$s_!RXVj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 848w, https://substackcdn.com/image/fetch/$s_!RXVj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!RXVj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4355cb2-42f2-4508-846c-45c1958b8cbf_1600x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>The agent writes the version that double-charges</h2><p>Ask an agent for a "create payment" or "process order" handler and you will get clean, idiomatic, straight-line TypeScript. Validate the body. Charge the card. Write the row. Return <code>201</code>. It reads beautifully. It compiles. The happy-path test, one request in, one charge out, goes green.</p><p>And it double-charges in production, because nothing in that handler asks the only question that matters: <em>have I seen this request before?</em> The agent wrote a handler that's correct for exactly-once delivery, in a world that only offers at-least-once. The gap never shows up in review, because the diff is genuinely good code. It shows up at 3am, because no test in the suite sends the same request twice. The duplicate is the case nobody wrote, so it's the case nobody caught.</p><p>This is the AI-native trap in miniature: the model optimizes for the request you typed, and you typed "create payment," not "create payment safely under retry." The retry is invisible in the prompt, so it's invisible in the output.</p><h2>Idempotent is not the same as "retried"</h2><p>This is the distinction that trips people, so let me draw the line sharply.</p><ul><li><p>A <strong>retried</strong> handler is one a client calls again after a failure. That's a property of the <em>caller</em>. Anything can be retried. Retrying a non-idempotent handler is exactly how you double-charge.</p></li><li><p>An <strong>idempotent</strong> handler is one where running it N times lands the system in the same state as running it once. That's a property of the <em>handler</em>. 
Idempotency is what makes a retry safe instead of dangerous.</p></li></ul><p>Retries are inevitable. Idempotency is the thing you build so the inevitable retry doesn't cost you money.</p><p>Two ways to get there. The cheap one is <strong>natural idempotency</strong>: design the operation so re-running it is a no-op by construction. This is the real reason HTTP draws a line between <code>PUT</code> and <code>POST</code>. <code>PUT /orders/123/status = shipped</code> is naturally idempotent: send it five times, the status is <code>shipped</code>, full stop. The fifth call overwrites the same field with the same value and nothing new happens. <code>POST /charges</code> is naturally <em>not</em> idempotent: each call means "make a new charge," so five calls mean five charges. When you can model an operation as "set this resource to this state" instead of "perform this action," you get idempotency for free and you should take it.</p><p>But you can't always restructure the operation. Charging a card is inherently a <code>POST</code>-shaped "do a thing" action. For those, you need the second tool.</p><h2>Idempotency keys, and the window you're choosing whether you admit it or not</h2><p>An <strong>idempotency key</strong> is the caller's claim that "this request and that earlier request are the same logical operation &#8212; don't do it twice." The processor sends a stable key with each delivery (and its retries reuse the same key). Your handler's job: the first time it sees a key, do the work and remember the outcome; every later time it sees that key, skip the work and replay the remembered outcome.</p><p>So the duplicate webhook from the cold open arrives carrying the same key as the original. Your handler looks it up, sees "already charged, here's the <code>200</code> I sent last time," and replays that exact response. The processor gets its <code>200</code>. The card is charged once. The retry was still correct. 
Your handler just made it harmless.</p><p>Which forces a question most handlers never answer out loud: how <em>long</em> do you remember a key? That's your <strong>dedup window</strong>, and you are choosing one whether you decide it deliberately or not. Remember keys for an hour and a retry that lands 90 minutes later (entirely possible when a processor backs off through a long outage) sails past your memory and charges again. Remember them forever and your dedup store grows without bound. The window is a real engineering decision with a real failure mode on each side, and "I didn't think about it" defaults you to whatever your store's eviction policy happens to be.</p><h2>The objection: "the processor already handles this, why is it my problem?"</h2><p>Fair. Stripe and the rest <em>do</em> offer idempotency keys on their side. So why duplicate the machinery in your handler?</p><p>Because their key protects <em>their</em> operation, not yours. When a webhook fires, <em>you</em> are the receiver, and the at-least-once contract runs to your door. Your handler is the thing that writes to <em>your</em> ledger, enqueues <em>your</em> fulfillment, sends <em>your</em> confirmation email. The upstream key stops them from creating two charges. It does nothing to stop your handler, invoked twice by two webhook deliveries, from writing two ledger rows and sending two emails. The boundary you have to make safe is the one you own. Every hop in the system is its own at-least-once boundary, and each one needs its own answer.</p><h2>A starter prompt &#8212; and the one thing the agent will get wrong</h2><p>Here's a prompt that gets you a real first draft instead of the straight-line double-charger:</p><pre><code>I have a POST handler in TypeScript/Node (Express) that charges a card and
writes a ledger row. Make it idempotent using an idempotency key sent in the
`Idempotency-Key` header.

Requirements:
- On the FIRST request for a key: persist the key in an "in-progress" state
  BEFORE doing the work, do the charge + ledger write, then transition the
  key to "completed" and store the response body + status code.
- On a DUPLICATE request for a completed key: do NOT redo the work &#8212; replay
  the stored response body and status code.
- State the dedup window you chose (how long keys are retained) and why.
- Then explain what happens when TWO requests with the SAME key arrive
  CONCURRENTLY (not sequentially), and make that case safe.</code></pre><p><strong>What to verify:</strong> the concurrent-duplicate race, not just the sequential retry. Two deliveries with the same key arriving <em>at the same time</em> will both check the store, both see "no key yet," and both proceed to charge, unless the first write of the key is an atomic claim (a unique constraint or an atomic <code>SET NX</code>) that one request wins and the other is forced to wait for or replay. Most agent first drafts handle the retry that arrives a second later and quietly assume requests are sequential. The duplicate that arrives in the same millisecond is the one that double-charges through your shiny new idempotency layer. If the answer doesn't name that race, it isn't done.</p><div><hr></div><p>That's the theory and a starter prompt: enough to know what <em>right</em> looks like and to get a first draft that won't double-charge on a sequential retry. The production version is the next thing I'm publishing in the paid series (Tuesdays and Thursdays): <strong>"Idempotency keys in production."</strong> It's the full middleware with request fingerprinting, the in-progress vs. completed state machine, response replay, the TTL that sets your dedup window, both a Postgres and a Redis variant, and the concurrent-duplicate tests that prove the race is actually closed. Plus the prompt to generate it and the checklist to verify what the agent hands you against every failure mode this post named.</p><p>Subscribe free for the Friday theory. Upgrade when you want the implementation that survives the retry.</p><p>And tell me in the comments: what's the most expensive duplicate your system ever processed, and was it the retry's fault, or the handler's?</p>]]></content:encoded></item><item><title><![CDATA[Retries Don't Make You Fault-Tolerant]]></title><description><![CDATA[Done wrong, retries are how one slow dependency takes down everything. 
The real first move isn't retries &#8212; it's timeouts.]]></description><link>https://geggleto.substack.com/p/retries-dont-make-you-fault-tolerant</link><guid isPermaLink="false">https://geggleto.substack.com/p/retries-dont-make-you-fault-tolerant</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Fri, 29 May 2026 23:59:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/842006de-9101-48a4-8e6b-416aa27b5f93_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>3:10pm. The payments service starts answering slowly. Not failing, just slow. p99 drifts from 120ms to 1.8s. Nothing pages, because nothing is down. A garbage-collection pause on their side, maybe a noisy neighbor. The kind of blip that should heal itself in ninety seconds.</p><p>It doesn't heal. By 3:14pm the checkout service is throwing 503s, and so is the cart service behind it, and the storefront behind that. A GC pause two hops away has become a customer-facing outage. The incident channel fills with people asking why a payments slowdown took down the homepage.</p><p>Here's why. Every caller of payments was wrapped in the same reasonable-looking retry: try the call, if it fails wait 200ms, try again, up to three times. When payments got slow, every in-flight request waited, eventually failed, and <em>all retried at once</em>: three times each, on a fixed 200ms beat, perfectly synchronized. The wounded service, already struggling, got hit with triple its normal traffic in tight rhythmic waves. That finished it off. Then the callers' connection pools filled with requests stuck retrying, so the callers stopped answering <em>their</em> callers, and the failure walked outward one hop at a time.</p><p>The retries didn't make the system fault-tolerant. The retries <em>were</em> the fault.</p><h2>The position: retries are an amplifier, and timeouts are the first-class concern</h2><p>Here's what I want to argue. 
A retry is not a safety mechanism. A retry is a load multiplier with a delay built in. Pointed at a healthy dependency that had a one-off transient failure, it's exactly what you want. Pointed at a dependency that is slow or struggling (which is <em>precisely when retries fire most</em>), it multiplies load onto the system least able to absorb it. Retries are a positive feedback loop, and positive feedback loops are how small perturbations become outages.</p><p>So the first move when you're hardening a call across the network is not "add retries." It's "bound the wait." A timeout is the only thing in your toolkit that strictly <em>reduces</em> load under stress instead of adding to it. Everything else (backoff, breakers, bulkheads) exists to make retries safe enough to be worth having. The timeout is the floor you build on.</p><p>Most of the time, when someone asks an agent to "make this call resilient," they get the amplifier and skip the floor.</p><h2>The toolkit, in order</h2><p>Fault tolerance isn't one trick. 
It's a layered discipline, and the order matters because each layer stops a failure the layer below can't:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PvVF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PvVF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 424w, https://substackcdn.com/image/fetch/$s_!PvVF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 848w, https://substackcdn.com/image/fetch/$s_!PvVF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!PvVF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PvVF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A five-layer fault-tolerance stack from request edge to core &#8212; timeout, backoff with jitter, circuit breaker, bulkhead, backpressure &#8212; each labeled with the specific failure it stops.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A five-layer fault-tolerance stack from request edge to core &#8212; timeout, backoff with jitter, circuit breaker, bulkhead, backpressure &#8212; each labeled with the specific failure it stops." title="A five-layer fault-tolerance stack from request edge to core &#8212; timeout, backoff with jitter, circuit breaker, bulkhead, backpressure &#8212; each labeled with the specific failure it stops." 
srcset="https://substackcdn.com/image/fetch/$s_!PvVF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 424w, https://substackcdn.com/image/fetch/$s_!PvVF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 848w, https://substackcdn.com/image/fetch/$s_!PvVF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!PvVF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49b2f5b0-ddf8-4843-80f2-471955fc4f09_1600x1170.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><ul><li><p><strong>Timeout</strong>: bound every wait. The failure it stops: an unbounded wait. Without a timeout, one slow dependency holds your request open forever, and held-open requests are how connection pools and the event loop's pending work pile up until you fall over. No timeout, no fault tolerance. Full stop.</p></li><li><p><strong>Exponential backoff with jitter</strong>: space the retries out, and <em>desynchronize</em> them. The failure it stops: the retry storm from the cold open. Exponential backoff (200ms, 400ms, 800ms) keeps you from hammering. Jitter, randomizing each delay instead of everyone waiting exactly 200ms, is the part that breaks the synchronized thundering herd. Backoff without jitter still produces tidy, lethal waves.</p></li><li><p><strong>Circuit breaker</strong>: stop calling a dependency that's clearly down. The failure it stops: hammering a corpse. After enough failures, the breaker <em>opens</em> and fails fast for a cooldown, then <em>half-opens</em> to test the water with a trickle, then <em>closes</em> when health returns. This is what gives the wounded service room to recover instead of being kept under by your retries.</p></li><li><p><strong>Bulkhead</strong>: isolate each dependency's resources. The failure it stops: one slow dependency starving every other call. If all your outbound calls share one connection pool or one concurrency budget, a single slow dependency consumes it and <em>unrelated</em> calls start failing too. 
Bulkheads cap how much of your capacity any one dependency can hold, like watertight compartments in a ship's hull.</p></li><li><p><strong>Backpressure / load-shedding</strong>: refuse work you can't do. The failure it stops: overload collapse. When you're past capacity, accepting more requests just makes everything slower for everyone and serves nobody. Shedding load, saying "no" fast with a 429, keeps the requests you <em>do</em> accept healthy.</p></li></ul><p>Each of those is a deep-dive of its own. The point of this post is the shape: timeouts and jitter and error-classification at the bottom, breakers and bulkheads and shedding on top. Skip the bottom and the top can't save you.</p><h2>The one distinction that changes everything: transient vs permanent</h2><p>Underneath all of it sits a decision the naive version always gets wrong: <em>should this error even be retried?</em></p><p>A timeout, a connection reset, a 503, a 429: those are <strong>transient</strong>. The dependency might succeed if you ask again later. Retrying is rational.</p><p>A 400, a 401, a 404, a validation rejection: those are <strong>permanent</strong>. The request is malformed or unauthorized or pointed at something that doesn't exist. Asking again, ever, with the same input, will <em>never</em> succeed. Retrying a 400 three times is just sending a guaranteed-doomed request four times: pure amplification, zero upside. You burned the dependency's capacity to confirm something you already knew on the first try.</p><p>Classify before you retry. Retry the transient, fail fast on the permanent.</p><h2>"AI gives you this for free now" &#8212; no, it gives you the storm</h2><p>Ask an agent to "add retries to this client call" and watch what you get. Something like: a <code>for</code> loop, three attempts, <code>await new Promise(r =&gt; setTimeout(r, 200))</code> between them, wrapped in a <code>try/catch</code> that retries on <em>any</em> thrown error.</p><p>It compiles. It reads cleanly. 
It passes the happy-path test, because on the happy path the first attempt succeeds and the loop never runs twice. A skim approves it.</p><p>It's the cold open. Fixed delay, no jitter: synchronized waves. No timeout, so a slow dependency hangs every attempt for as long as it likes, and your three "retries" become three unbounded waits stacked end to end. Catch-all error handling, so it cheerfully retries the 400 that will never, ever succeed. The agent built the exact retry storm that takes down the callee, and it built it in code that looks like the responsible thing to do. This is AI's silent-logic-drift failure mode wearing a competence disguise: the dangerous version and the safe version look almost identical, and the difference is everything that's <em>missing</em>.</p><h2>The starter prompt</h2><p>You don't fix this by typing more carefully. You fix it by asking for the right thing and then verifying the parts that don't show up in a skim. Here's a starter prompt that names every layer the naive version drops:</p><pre><code>Wrap the call `await paymentsClient.charge(req)` so it is resilient to a
slow or failing dependency, in TypeScript:

- Enforce a hard per-attempt timeout (the call must not be able to hang).
- Retry only TRANSIENT failures (timeouts, connection resets, 5xx, 429).
  Do NOT retry permanent failures (4xx other than 429 &#8212; 400/401/403/404).
- Use exponential backoff WITH jitter between retries (not a fixed delay).
- Put a circuit breaker in front: after N consecutive failures, open and
  fail fast for a cooldown, then half-open to probe, then close on success.

Then, in plain prose, state: the total retry budget (max attempts and
worst-case total time), and the exact conditions under which the breaker
moves between closed / open / half-open.</code></pre><p><strong>What to verify:</strong> confirm there's a hard per-attempt timeout (not just a total deadline), that the backoff is actually <em>jittered</em> and not a fixed delay, and that 4xx-style permanent errors are explicitly <strong>not</strong> retried. If any one of those is missing, you have the storm again, politely.</p><h2>The inversion</h2><p>The instinct says: a system gets more reliable as you add retries. The reality is the opposite. Each retry you add without a timeout, without jitter, without error-classification, makes the system <em>more</em> fragile under exactly the load it's supposed to survive, because you've added another path that multiplies traffic at the worst possible moment. Reliability didn't come from the retries. It came from the timeouts that bound them, the jitter that desynchronized them, the breaker that stopped them, and the judgment to know which errors deserved a second ask at all.</p><p>Retries are the last thing you add, not the first. And they're only ever as safe as the four things underneath them.</p><div><hr></div><p>That's the theory and a first draft. The production version is in the paid series, Tuesdays and Thursdays: <strong>"Retries that don't take down the callee"</strong> (timeouts, exponential backoff with jitter, retry budgets, classifying retryable vs permanent, idempotency-aware retries) and <strong>"Circuit breakers, bulkheads &amp; backpressure"</strong> (the full closed/open/half-open state machine in TypeScript, bulkhead isolation, load shedding, queue + backpressure) &#8212; each with the prompt to generate it and the checklist to verify what the agent hands you.</p><p>Subscribe free for the Friday theory. 
Upgrade when you want the production code and the verification checklist.</p><p>And tell me in the comments: what's the smallest blip you've watched cascade into a full outage &#8212; and was it the retries that did it?</p>]]></content:encoded></item><item><title><![CDATA[Your Logs Are Lying to You]]></title><description><![CDATA[One request, five services, no correlation ID. You don't have observability. You have a pile of text you hope to grep.]]></description><link>https://geggleto.substack.com/p/your-logs-are-lying-to-you</link><guid isPermaLink="false">https://geggleto.substack.com/p/your-logs-are-lying-to-you</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Fri, 29 May 2026 23:58:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e4ca671f-65c3-4588-bdfc-520fb23d36fe_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>2:14am. A customer's checkout failed, and the on-call pager doesn't care that you were asleep. You open the dashboard: the API gateway logged the request, the order service logged something, the payment service logged something, inventory logged a warning, the notification worker logged an error. Five services. Five log streams. All of them have logs from 2:14am.</p><p>And not one of them tells you which lines belong to <em>this</em> request.</p><p>So you do the thing every engineer has done at 2am: you eyeball timestamps. You assume the gateway log at 02:14:06.221 and the payment error at 02:14:06.498 are the same request, because they're close and the story sort of fits. You're wrong, because at 2am there were forty other requests in that same 300ms window, and the error you're staring at belongs to a different user entirely. 
An hour later you've built a narrative out of coincidence, the incident is still burning, and your "observability stack" has told you nothing you didn't already know.</p><p>Here's the position I want to argue: per-service logs without a shared correlation ID aren't observability. They're text. You have a search box and a hope. The moment a request crosses a service boundary, you have lost the ability to reconstruct what happened to it, and no amount of log volume buys that ability back.</p><h2>A log line answers "what happened here." It can't answer "what happened to this request."</h2><p>The trap is that logging <em>feels</em> solved. Every service logs. The logs are searchable. You can grep. So when someone says "we need better observability," the instinct is "we already have logs."</p><p>But a log line is a local fact: <em>this service, at this moment, observed this.</em> That's genuinely useful when the request lives and dies inside one process. The instant it fans out (gateway calls order, order calls payment and inventory, inventory drops a message on a queue that a worker picks up later), the request becomes a <em>path</em> across processes, and no individual log line knows it's part of a path. 
Each line is a true sentence with no idea what paragraph it belongs to.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Ip1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Ip1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 424w, https://substackcdn.com/image/fetch/$s_!1Ip1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 848w, https://substackcdn.com/image/fetch/$s_!1Ip1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!1Ip1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Ip1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Left panel: five services log in isolation with no shared key, unreconstructable; right panel: one trace ID threads every hop into a single ordered path.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Left panel: five services log in isolation with no shared key, unreconstructable; right panel: one trace ID threads every hop into a single ordered path." title="Left panel: five services log in isolation with no shared key, unreconstructable; right panel: one trace ID threads every hop into a single ordered path." 
srcset="https://substackcdn.com/image/fetch/$s_!1Ip1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 424w, https://substackcdn.com/image/fetch/$s_!1Ip1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 848w, https://substackcdn.com/image/fetch/$s_!1Ip1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!1Ip1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0da832b-ee0d-4724-a3e2-a08f5473686b_1600x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>The fix is almost insultingly simple to state and almost never done by default: mint one ID at the edge (the first hop the request touches) and thread it through every downstream call and every log line, in every service, for the entire life of the request. Now "what happened to this request" is a single query: <code>correlation_id = abc123</code>. The path reassembles itself. The five-service eyeballing exercise becomes one filter.</p><p>That ID is the load-bearing idea behind the whole observability stack. The three pillars people recite &#8212; logs, metrics, traces &#8212; aren't three views of the same thing; they answer different questions. Metrics tell you <em>something is wrong</em> (error rate is up, p99 spiked). Logs tell you <em>what specifically happened</em> at a point. Traces tell you <em>where in the path the time went and where it broke.</em> The correlation ID is the thread that lets a metric spike lead you to the exact traces, which lead you to the exact log lines. Without it, the three pillars are three disconnected dashboards you alt-tab between, narrating coincidence.</p><h2>While we're here: your log levels and your log format are also lying</h2><p>Two smaller lies ride along with the big one.</p><p>First, format. A log line like <code>Order 4471 failed for user 92 after 1200ms</code> is a sentence. To query it you write a regex, and the regex breaks the day someone rephrases the message. A <em>structured</em> log line, <code>{ "msg": "order failed", "order_id": 4471, "user_id": 92, "duration_ms": 1200, "correlation_id": "abc123" }</code>, is data. 
You filter on fields, you aggregate <code>duration_ms</code>, you join on <code>correlation_id</code>. Same information, except one of them you can ask questions of and the other you can only read.</p><p>Second, levels. In most codebases <code>info</code>, <code>warn</code>, and <code>error</code> have drifted into "how I felt about this line when I wrote it." If everything routine is <code>error</code> because it <em>looked</em> scary, then <code>error</code> means nothing, and at 2am you can't filter to the lines that actually matter. A level is a routing and alerting decision, not a mood. <code>error</code> should mean <em>a human needs to know.</em> If it doesn't mean that consistently, your most important filter is noise.</p><h2>The AI angle: ask an agent to "add logging" and watch it make this worse</h2><p>This is exactly the failure mode this series keeps circling. Ask Claude or Cursor to "add logging to this service" and you get a confident sprinkle of <code>console.log</code> and <code>logger.info</code> calls, each with the local context the agent could see: <code>logger.info('processing order', { orderId })</code>. In a single file, in a single service, it looks great. Reviews clean. Ships.</p><p>And it's useless the instant the request crosses a boundary, because the agent had no concept of the <em>request's whole journey</em>, only the function in front of it. There's no ID minted at the edge, nothing propagated to the downstream call, nothing tying this service's lines to the next service's lines. The agent optimized for "this function now logs," which is a local fact, when the actual requirement was "this request is reconstructable across five services," which is a system property. It generated a plausible answer to the wrong question, and it compiles, and it passes review, and you find out at 2am.</p><p>You don't fix that by asking it to "add more logging." 
You fix it by naming the system property up front.</p><h2>A starter prompt that names the right requirement</h2><p>Here's a prompt that points the agent at the correlation ID and the async boundary it will otherwise lose track of:</p><pre><code>Add a request correlation ID to this Node/TypeScript service.

Requirements:
- Generate a correlation ID at the edge (the first inbound hop). If the
  incoming request already carries one (e.g. an `x-correlation-id` header),
  reuse it instead of minting a new one.
- Store it in AsyncLocalStorage so any code in the request lifecycle can read
  it without passing it through every function signature.
- Automatically attach the correlation ID to every log line via the logger.
- Propagate it on every OUTGOING call this service makes (HTTP headers,
  outbound messages) so the next service inherits the same ID.

Then explain, in comments:
- How the ID survives an `await` (why AsyncLocalStorage and not a plain
  module-level variable).
- What happens when a request enters from a queue/worker instead of HTTP &#8212;
  where does the ID come from then, and what's the fallback if there isn't one.</code></pre><p><strong>What to verify:</strong> confirm the ID actually propagates <em>across an <code>await</code> and into the downstream HTTP/queue calls</em> (not just set once at the entry point and lost the moment you cross an async boundary), and that there's an explicit fallback when an inbound request arrives with no ID (queue messages, cron jobs, and replays often won't have one).</p><h2>"We have distributed tracing, isn't this solved?"</h2><p>Strongest objection, so let me steelman it: "We run OpenTelemetry. Spans propagate trace context. The correlation ID is just the trace ID. You're describing a solved problem."</p><p>If you've genuinely wired OTel context propagation through every hop <em>including</em> your async boundaries and your queue consumers, and your logs carry the trace ID, then yes, you've done the hard part. But "propagates correctly" is the entire game, and it's exactly where it quietly breaks: the worker that pulls from the queue starts a fresh context with no parent, the <code>setTimeout</code> callback runs outside the active context, the third-party client doesn't inject headers &#8212; and your traces fill with orphan spans you can't connect to the request that caused them. OTel is the right tool. It doesn't absolve you of understanding <em>why</em> the ID has to be minted at the edge and survive every boundary; it just gives you a heavier way to get it wrong if you don't.</p><h2>The inversion</h2><p>Stop measuring your observability by how much you log. Volume is the symptom of the problem, not the cure. 
More unstructured, uncorrelated lines is just a bigger pile to grep at 2am.</p><p>Measure it by one question instead: <em>can you take a single failed request and reconstruct its entire path across every service, in one query, without eyeballing a single timestamp?</em> If the answer is no, you don't have an observability gap. You have a correlation gap, and every log line you add until you close it is text you're paying to store and hoping you never have to read at 2am.</p><div><hr></div><p>That's the theory and a first draft. The production version &#8212; <code>AsyncLocalStorage</code> context propagation done so it survives every async boundary and queue hop, structured logging wired to the logger, OpenTelemetry instrumentation, log&#8596;trace correlation, plus the prompt to generate it and the checklist to verify what the agent hands you against the boundaries it loves to drop &#8212; is <strong>P4: Correlation IDs &amp; Distributed Tracing</strong> in the paid series, which runs Tuesdays and Thursdays.</p><p>Subscribe free for the Friday theory. Upgrade when you want the wiring and the verification checklist.</p><p>And tell me in the comments: what's the longest you've spent correlating an incident by eyeballing timestamps, and what would a single correlation ID have saved you?</p>]]></content:encoded></item><item><title><![CDATA[There Is No Transaction Across Two Services]]></title><description><![CDATA[ACID stops at the database boundary. 
A saga is how you fake atomicity across services &#8212; and the bill is intermediate states your users can see.]]></description><link>https://geggleto.substack.com/p/there-is-no-transaction-across-two</link><guid isPermaLink="false">https://geggleto.substack.com/p/there-is-no-transaction-across-two</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Fri, 29 May 2026 23:58:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fce7332e-441f-4897-af35-63c695ee7c79_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>3:10am. PagerDuty. A user moved 5,000 points from their wallet to a partner gift card and the balances don't add up: the wallet shows the points gone, the gift-card service shows nothing arrived. Support has eleven of these from overnight. The on-call engineer pulls the code, expecting a missing <code>await</code> or a swallowed error. What they find is worse, because it looks correct.</p><pre><code>async function transferPoints(userId: string, amount: number) {
  try {
    await walletService.debit(userId, amount);   // succeeds
    await giftCardService.credit(userId, amount); // throws &#8212; partner 503
  } catch (err) {
    logger.error("transfer failed", err);
    throw err;
  }
}</code></pre><p>The debit committed. The credit threw. The <code>catch</code> ran, logged, re-threw, and did absolutely nothing about the 5,000 points that are now nowhere. There is no <code>ROLLBACK</code> here that reaches back across the network and un-debits the wallet, because <code>walletService</code> already committed its own local transaction the instant <code>debit</code> returned. The try/catch caught the exception. It could not catch the state.</p><p>Here's the position I want to argue: a database transaction's atomicity ends at the boundary of one database. The moment your operation spans two services (or two databases, even inside one service) you do not have a transaction anymore. You have a sequence of independent commits, and the only question is what you do when one of them lands and the next one doesn't.</p><h2>ACID is a property of one log, not your architecture</h2><p>Atomicity isn't magic. It's a single transaction log on a single database, where the engine can replay or discard a set of writes as one unit because it owns all of them. <code>walletService</code> has one of those. <code>giftCardService</code> has a different one. Neither can see the other's uncommitted state, neither can veto the other's commit, and nothing on earth makes their two logs agree to flip together.</p><p>So when the senior textbook answer comes up, <em>"use two-phase commit,"</em> here's why the industry quietly walked away from it. 2PC works by having a coordinator ask every participant to <em>prepare</em> (lock the rows, promise to commit), wait for all of them to say yes, then tell everyone to <em>commit</em>. The promise is atomicity across services. The price is brutal: every participant holds locks for the entire round trip, so one slow service stalls all of them. And the failure mode is the killer. 
If the coordinator dies <em>after</em> everyone prepared but <em>before</em> it says commit, every participant is stuck holding locks, blocked, waiting for a coordinator that isn't coming back. It doesn't scale, it couples your availability to your slowest node, and it turns one dead process into a system-wide freeze. That's why your message broker, your payment processor, and your favorite cloud database all decline to offer it.</p><h2>The saga: fake the atomicity, own the in-between</h2><p>The pattern that replaced 2PC is the <strong>saga</strong>: model the operation as a sequence of <em>local</em> transactions, one per service, each of which commits independently and immediately. No distributed locks, no coordinator holding everyone hostage. The catch, and it is the whole point, is that every forward step must come with a <strong>compensating action</strong>: a second local transaction that semantically undoes it.</p><p>Debit the wallet. Then credit the gift card. If the credit fails, you don't roll back. You can't. You run the <em>compensation</em> for the debit: credit the wallet back. The system passes through an inconsistent state (points debited, not yet credited) and then claws its way back to a consistent one. That's the trade. 
You give up "the in-between never exists" and you buy "the in-between is bounded and recoverable."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ep5V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ep5V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 424w, https://substackcdn.com/image/fetch/$s_!ep5V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 848w, https://substackcdn.com/image/fetch/$s_!ep5V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 1272w, https://substackcdn.com/image/fetch/$s_!ep5V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ep5V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Two-row saga diagram: the happy path commits reserve inventory, charge card, then create shipment left to right; the failure path shows charge card failing so a compensating action runs backward to release the already-reserved inventory.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Two-row saga diagram: the happy path commits reserve inventory, charge card, then create shipment left to right; the failure path shows charge card failing so a compensating action runs backward to release the already-reserved inventory." title="Two-row saga diagram: the happy path commits reserve inventory, charge card, then create shipment left to right; the failure path shows charge card failing so a compensating action runs backward to release the already-reserved inventory." 
srcset="https://substackcdn.com/image/fetch/$s_!ep5V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 424w, https://substackcdn.com/image/fetch/$s_!ep5V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 848w, https://substackcdn.com/image/fetch/$s_!ep5V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 1272w, https://substackcdn.com/image/fetch/$s_!ep5V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f2d37cb-6805-4d4d-a5d8-a2086d107b60_1600x1060.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Two ways to drive it. <strong>Orchestration</strong>: a coordinator owns the sequence. Call step 1, on success call step 2, on failure walk back through the compensations. The flow lives in one place you can read. <strong>Choreography</strong>: no coordinator; each service emits an event, the next service reacts to it, and compensations are themselves events others react to. Choreography removes the central component but scatters the flow across N services' event handlers, and "what's the current state of this transfer?" becomes a forensic exercise. For anything you'll have to debug at 3am, start with orchestration. You can read the whole story in one file.</p><h2>"Fine &#8212; I'll just wrap it in a try/catch and undo on failure"</h2><p>This is the objection worth taking seriously, because it's <em>almost</em> the saga, and it's exactly what an agent hands you when you ask for one. Steelman it: on the error path, call the reverse operation. Isn't that compensation?</p><p>It's the happy-path drawing of compensation with all the hard parts erased. Two questions break it. First: <strong>what is observable while you're in-between?</strong> Between the debit and the credit-back, a user can open the app and see points that vanished into the void. Your support queue is made of that window. A real saga names those states and decides what the user sees in each one: "transfer pending," not a silently wrong balance.</p><p>Second, and this is the one that ends the argument: <strong>what happens when the compensation itself fails?</strong> The gift-card credit threw because the partner returned 503. 
Now you call <code>walletService.credit</code> to compensate, and <em>that</em> throws too, because the wallet service is mid-deploy. Your try/catch has no catch for its own catch. The points are now gone with no record that a recovery was ever owed. A real saga treats compensation as a first-class, <em>retryable</em>, durably-recorded step &#8212; because the compensation is the part most likely to run during the exact incident that caused the failure. A try/catch has nowhere to put that. That's the tell that you're looking at a fake saga, not a real one.</p><h2>The starter prompt</h2><p>You don't ask an agent for "a transaction across two services." There isn't one, and it'll cheerfully write you the try/catch above and call it done. You ask it to model the saga, and to confess the parts it likes to skip:</p><pre><code>I have a multi-service operation: &lt;describe each step and which service
owns it, e.g. walletService.debit then giftCardService.credit&gt;.

Model this as a saga using orchestration (a single coordinator drives the
steps). For EACH forward step, define its compensating transaction &#8212; the
local operation that semantically undoes it. Then, before any code:

1. List every intermediate state that is observable to a user (e.g. money
   debited but not yet credited) and what the user should see in each.
2. Specify what happens if a COMPENSATION itself fails &#8212; it cannot just be
   a try/catch. How is it retried and where is the in-flight state recorded?

Do not write the production implementation yet. Give me the step/compensation
map and the two lists first.</code></pre><p><strong>What to verify:</strong> every forward step has a named compensation (no orphans), and there is a real answer for a failed compensation: a durable record plus a retry path, not a <code>catch</code> that logs and re-throws. If the agent's compensation story is "wrap it in try/catch," you got the fake saga. Send it back.</p><div><hr></div><p>That's the theory and a first map. The production version is in the paid series, Tuesdays and Thursdays: a typed saga orchestrator with compensations that retry safely, how to record in-flight state so a failed compensation survives a restart (P5, <em>Building a saga orchestrator</em>), and how to publish those step events reliably through the transactional outbox, without falling back into 2PC (P7, <em>The outbox pattern</em>). It ships with the prompt to generate the orchestrator and the checklist to verify what the agent gives you against every failure mode named here.</p><p>Subscribe free for the Friday theory. Upgrade when you want the orchestrator and the verification checklist.</p><p>And tell me in the comments: what's the worst limbo state a "transaction" has ever left in your system: money, points, or inventory stuck between two services that each thought the other had it?</p>]]></content:encoded></item><item><title><![CDATA[Your Mutex Doesn't Cross the Network]]></title><description><![CDATA[A single-process lock is meaningless on two servers, and the Redis lock you copied from a blog hands the same lock to two holders. 
Safety vs&#8230;]]></description><link>https://geggleto.substack.com/p/your-mutex-doesnt-cross-the-network</link><guid isPermaLink="false">https://geggleto.substack.com/p/your-mutex-doesnt-cross-the-network</guid><dc:creator><![CDATA[Glenn Eggleton]]></dc:creator><pubDate>Fri, 29 May 2026 23:58:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6d80b1c6-aad9-4da7-8e56-f811288b8280_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>2:00am, the first of the month. The "send invoices" cron fires. It always has. Except the service runs two pods now (somebody scaled it for headroom three sprints ago) and tonight both pods wake at the same instant and reach for the same job.</p><p>That's fine, you think, there's a lock. There is. It's the Redis lock from the top Stack Overflow answer, the one everybody copies. Pod A calls <code>SET invoice-run pod-a NX PX 30000</code>, wins, starts billing. Pod B loses, backs off, goes quiet. Working as designed.</p><p>Then pod A hits a stop-the-world GC pause. Not a crash. A pause. For 1.2 seconds the process is frozen mid-batch, holding a lock whose 30-second expiry it set 29 seconds ago. While it's frozen, the clock keeps running, the lease expires, Redis drops the key. Pod B's retry fires, finds the key gone, acquires the lock, and starts sending invoices from where it thinks the work begins. Pod A thaws, has no idea any time passed, and keeps right on billing from where <em>it</em> left off. Two holders. Same job. Customers double-billed.</p><p>Pull the logs and the lock <em>worked</em> the whole time. Both pods called <code>SET NX</code> correctly. Redis behaved correctly. Nobody crashed. The lock did exactly what it was written to do, and it still let two holders run at once.</p><p>Here's the position I want to argue: a lock that only enforces "you got the key" is not a distributed lock. It's a suggestion. On one machine your mutex is backed by the operating system and it cannot lie to you. 
The moment coordination crosses the network, the lock stops being a guarantee about <em>the world</em> and becomes a stale belief living inside whichever process grabbed it last. And a belief can be wrong. The only thing that makes a distributed lock actually safe is a <strong>fencing token</strong>: proof, checked at the resource, of <em>who holds the lock right now</em>, not who held it when they last looked.</p><h2>A mutex is a fact about one process. It doesn't survive the network.</h2><p>When you <code>await mutex.acquire()</code> inside one Node process, you are leaning on something real: a shared piece of memory the runtime owns, that every thread and task in that process can see, that updates atomically. There is exactly one source of truth and everyone consults the same copy. That's why it works.</p><p>Now run that process twice: two pods, two servers, two regions. Pod A's mutex coordinates the tasks <em>inside pod A</em>. Pod B's mutex coordinates the tasks <em>inside pod B</em>. Neither one can see the other. They aren't the same lock with two clients; they are two completely separate locks that happen to share a variable name. Pod A acquiring its in-memory mutex tells pod B precisely nothing, because there is no shared memory between them to tell it <em>through</em>.</p><p>So the first truth is the blunt one: an in-process mutex on two machines isn't a weak lock. It's not a lock at all. To coordinate across processes you need a source of truth that lives <em>outside</em> every process, somewhere all of them can see. That's the whole reason Redis shows up. Not because Redis is magic. Because it's the shared variable two pods can both read.</p><h2>The naive lock has two failure modes, and they're a trap</h2><p>Move the lock to Redis and the obvious first version is <code>SET lockkey ownerid NX</code>: set the key only if it doesn't exist. First caller wins, others see the key and back off. 
To release, you <code>DEL lockkey</code>.</p><p>Failure one: no expiry. Pod A acquires the lock and then the pod is OOM-killed mid-batch: no graceful shutdown, no <code>DEL</code>. The key sits in Redis forever. Every future run sees it, backs off, and the invoice job never runs again. You didn't get a double-charge; you got a job that's wedged until a human notices and deletes a key by hand. That's a <strong>liveness</strong> failure: the lock is held by a holder that no longer exists, and nothing will ever release it.</p><p>So you add an expiry: <code>SET lockkey ownerid NX PX 30000</code>. Now a dead holder's lock self-heals after 30 seconds. Liveness restored. And you just bought failure two, which is worse, because it's a <strong>safety</strong> failure and it's silent.</p><p>The expiry doesn't ask whether the holder is <em>done</em>. It asks whether 30 seconds elapsed. Those are different questions. A holder that GC-pauses, or blocks on a slow downstream, or gets descheduled by the kernel, is neither done nor dead. It's <em>slow</em>. But the lease expires on schedule anyway, Redis hands the lock to the next caller, and now two processes both sincerely believe they hold it. That's the cold open. The expiry you added to fix the deadlock is exactly the thing that created the double-billing.</p><p>And the <code>DEL</code> on release has its own quiet knife in it. Pod A's lease already expired and pod B already acquired. Then pod A wakes and runs its release, a plain <code>DEL lockkey</code>. It just deleted <em>pod B's</em> lock. Pod B is still working and now holds nothing; pod C can acquire on top of it. A release that doesn't check ownership doesn't release your lock. It releases whoever's lock happens to be there.</p><h2>Safety vs liveness &#8212; and why you can't have both for free</h2><p>Two properties, pulling against each other:</p><ul><li><p><strong>Safety</strong>: two holders never run at once. 
(Nothing bad happens.)</p></li><li><p><strong>Liveness</strong>: the lock is always eventually released, even if a holder dies. (Something good eventually happens.)</p></li></ul><p>No expiry is perfectly safe and has no liveness: a dead holder wedges it forever. A short expiry has great liveness and breaks safety, because slow holders get overrun. There is no expiry value that gives you both, because you cannot tell "the holder crashed" apart from "the holder is just slow" by watching a clock. From the outside they look identical, which is the same impossibility that makes exactly-once delivery a fantasy: a timeout can't distinguish dead from slow.</p><p>The lease (expiry) is how you buy liveness. You pay for it in safety. The bill comes due the first time a holder pauses past its lease, which, with a GC'd runtime and a busy scheduler, is not an "if."</p><h2>The fencing token &#8212; the part everyone omits</h2><p>Here's the move the Stack Overflow answer leaves out, and the one thing that actually makes the lock safe.</p><p>Every time the lock is granted, the lock service hands out a number that only ever goes <strong>up</strong>: a fencing token. Holder A acquires and gets token 33. A pauses, the lease expires, holder B acquires and gets token 34, strictly higher, because the counter only increments. The trick is what you do with that number: A and B don't just <em>hold</em> the token, they <strong>present it to the protected resource on every write</strong>, and the resource remembers the highest token it has ever accepted and rejects anything lower.</p><p>So A thaws, resumes, and writes "send these invoices, token 33." But the database (or the invoice API, or whatever sits at the end) has already accepted a write stamped 34 from B. It sees 33, knows 33 is stale, and refuses it. A's late write bounces. 
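</p><p>A minimal sketch of that check at the resource, in TypeScript. The <code>FencedResource</code> class and its in-memory counter are hypothetical stand-ins, not part of any library; in a real system the comparison would live inside the database or the invoice API, ideally as one atomic conditional write.</p><pre><code>// Hypothetical in-memory stand-in for the protected resource.
// Assumption: tokens are positive integers handed out by the lock service.
class FencedResource {
  private highestToken = 0;
  readonly accepted: string[] = [];

  // Returns true if the write was applied, false if the token was stale.
  write(token: number, payload: string): boolean {
    // Math.max(...) === token holds exactly when the token is not lower
    // than the highest one accepted so far. Equal tokens pass on purpose:
    // the current holder presents the same token on every write.
    if (Math.max(token, this.highestToken) !== token) {
      return false; // stale holder: refuse the write
    }
    this.highestToken = token;
    this.accepted.push(payload);
    return true;
  }
}

const resource = new FencedResource();
// B acquired after A's lease expired and writes first with token 34.
const bWrote = resource.write(34, "invoices from holder B");
// A thaws and tries to write with its old token 33.
const aWrote = resource.write(33, "invoices from holder A");
</code></pre><p>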
There is still a window where two processes <em>think</em> they hold the lock (you cannot close that window with a clock) but only one of them can successfully <em>act</em>, because the resource is the referee and it only honors the newest token.</p><p>That's the conceptual leap. A lock without fencing protects the <em>acquisition</em>. A fence protects the <em>resource</em>. The lock can be wrong about who holds it; the fence can't be tricked, because being the holder isn't a claim you make. It's a number the resource checks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aXWm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aXWm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 424w, https://substackcdn.com/image/fetch/$s_!aXWm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 848w, https://substackcdn.com/image/fetch/$s_!aXWm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 1272w, https://substackcdn.com/image/fetch/$s_!aXWm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!aXWm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Timeline: holder A acquires a lease with fencing token 33, then suffers a GC pause; the lease expires and holder B acquires it with token 34; when A wakes and writes with its stale token 33, the resource &#8212; which has recorded highest token seen 34 &#8212; accepts B but rejects A.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Timeline: holder A acquires a lease with fencing token 33, then suffers a GC pause; the lease expires and holder B acquires it with token 34; when A wakes and writes with its stale token 33, the resource &#8212; which has recorded highest token seen 34 &#8212; accepts B but rejects A." title="Timeline: holder A acquires a lease with fencing token 33, then suffers a GC pause; the lease expires and holder B acquires it with token 34; when A wakes and writes with its stale token 33, the resource &#8212; which has recorded highest token seen 34 &#8212; accepts B but rejects A." 
srcset="https://substackcdn.com/image/fetch/$s_!aXWm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 424w, https://substackcdn.com/image/fetch/$s_!aXWm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 848w, https://substackcdn.com/image/fetch/$s_!aXWm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 1272w, https://substackcdn.com/image/fetch/$s_!aXWm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0bcdd2-11a3-4a6c-aef1-17b79cc2c648_1600x1100.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>The agent writes the broken canonical version</h2><p>Ask an agent for "a distributed lock with Redis" and you get the answer that's all over the internet, because that's what it learned from. <code>SET key value NX</code> to acquire, and then one of the two traps, depending on which blog it averaged. Either no expiry, so a crashed holder deadlocks the lock forever. Or an expiry with no fencing token, so a slow holder gets overrun and you get two holders. And a release that's a plain <code>DEL key</code>, which cheerfully deletes another holder's lock if yours already expired.</p><p>It will look right. It's idiomatic, it's short, it matches the top search result, and the happy-path test (one client acquires, does work, releases) goes green. It is the broken one. The failure isn't in the code you can see; it's in the GC pause and the crashed pod that no unit test in the suite simulates. The agent optimized for "a Redis lock," and "a Redis lock" on the internet is the unsafe one.</p><h2>A starter prompt, and the three things to verify</h2><p>Here's a prompt that gets you a real first draft instead of the double-biller:</p><pre><code>Implement a distributed lock in TypeScript/Node backed by Redis.

Requirements:
- Acquire with SET NX and a lease (expiry / PX), so a crashed holder's lock
  self-heals &#8212; never a permanent deadlock.
- On acquire, hand the caller a FENCING TOKEN: a monotonically increasing
  number (e.g. INCR a counter), returned alongside the lock.
- The protected resource must REQUIRE that token on every write and reject any
  token lower than the highest it has already accepted. Show that check at the
  resource, not just stored in the lock.
- Release must be a COMPARE-AND-DELETE: only delete the key if this caller still
  owns it (check the owner value), done atomically so you can't delete another
  holder's lock. Use a Lua script or WATCH/MULTI &#8212; explain why a GET-then-DEL is
  not atomic.
- Then list the failure modes: crashed holder, holder that PAUSES PAST THE LEASE
  (GC / slow downstream), and a release arriving after expiry. State what happens
in each, and which property (safety or liveness) is at risk.</code></pre><p><strong>What to verify:</strong> three things, and the agent will usually miss at least one. First, the fencing token is actually <em>handed to the protected resource and checked there</em>, not merely stored inside the lock, which protects nothing. Second, release is an <strong>atomic compare-and-delete</strong> (Lua or WATCH/MULTI) that will not delete another holder's lock. A <code>GET</code> then <code>DEL</code> has a race between the two calls and is wrong. Third, there's a real answer for <em>"the holder pauses past the lease"</em>: if the agent's reply waves at it or assumes holders never pause, it hasn't understood the actual failure, and that's the exact one that double-bills you.</p><h2>The senior move: maybe you don't need a lock at all</h2><p>Last inversion, and it's the one I'd lead a design review with. The fact that distributed locks are this hard is itself a signal: a lock is a heavy, leaky way to get safety, and a lot of the time you're reaching for it to paper over a problem that has a cleaner shape.</p><p>Two alternatives beat a lock more often than people admit. <strong>Idempotency</strong>: make the protected operation safe to run twice, and you stop caring whether two holders fire, because the second run is a no-op. <strong>Optimistic concurrency</strong>: don't gate access at all; let everyone try, attach a version to the write (a conditional update, a compare-and-swap, a <code>WHERE version = N</code>), and let the database reject the loser. Notice that's the same idea as a fencing token, pushed all the way down into the data layer: the resource decides, not the lock. When the work is naturally idempotent or expressible as a conditional write, a lock is the wrong tool. 
You're adding a coordination service, a lease, and a fencing scheme to enforce something the database could enforce in one atomic write.</p><p>A lock is what you reach for when you genuinely need <em>mutual exclusion across machines</em> and the operation can't be made idempotent or conditional. That's a real case. It is just rarer than the number of locks in production would suggest.</p><p>(There's also the <strong>Redlock</strong> debate: antirez's multi-node Redis lock vs Kleppmann's critique that no Redis-only scheme is safe without fencing. I'm not going to settle a famous argument in a free post. I'll say the part both sides agree on: if your correctness depends on the lock alone, you've already lost. The fence at the resource is what saves you. Which lock you run on top is the second question, and it's a paid one.)</p><div><hr></div><p>That's the theory and a starter prompt &#8212; enough to know what <em>right</em> looks like and to get a first draft that won't deadlock on a crash. The production version is in the paid series (Tuesdays and Thursdays): <strong>"Distributed locks for real"</strong>, the Redis lease lock with fencing tokens wired end to end, the Redlock tradeoffs laid out honestly, the optimistic-concurrency alternative in full, and the lock-vs-idempotency decision so you reach for the right one. Plus the prompt to generate it and the checklist to verify what the agent hands you against every failure mode this post named.</p><p>Subscribe free for the Friday theory. Upgrade when you want the lock that survives the pause.</p><p>And tell me in the comments: what's the worst thing two holders ever did to your system at once &#8212; and did the lock "work" the whole time it was failing?</p>]]></content:encoded></item></channel></rss>