<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.dylanamartin.com/blog.xml" rel="self" type="application/atom+xml" /><link href="https://www.dylanamartin.com/" rel="alternate" type="text/html" /><updated>2026-06-08T23:10:23+00:00</updated><id>https://www.dylanamartin.com/blog.xml</id><title type="html">Dylan Martin</title><subtitle>Compacted Context is the personal website of Dylan Martin, where I publish essays about software engineering, career reflections, or whatever else I&apos;m thinking about, and digests of what I&apos;m reading (or occasionally watching).</subtitle><author><name>Dylan</name></author><entry><title type="html">On Hormuz (and concretely leveraging geography)</title><link href="https://www.dylanamartin.com/2026/04/10/on-hormuz.html" rel="alternate" type="text/html" title="On Hormuz (and concretely leveraging geography)" /><published>2026-04-10T00:00:00+00:00</published><updated>2026-04-10T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/04/10/on-hormuz</id><content type="html" xml:base="https://www.dylanamartin.com/2026/04/10/on-hormuz.html"><![CDATA[<p><em>I spent an evening going down a macro rabbit hole after reading <a href="https://www.citriniresearch.com/p/free-strait-of-hormuz-a-citrini-field">Citrini Research’s field report</a> on the Strait of Hormuz. One of their analysts was physically on the water near Oman, watching tankers transit through Iran’s checkpoint system, talking to fishermen and shipping captains. So sick. I texted some thoughts I’d had about it to my friend <a href="https://cameron.otsuka.systems/">Cameron</a> the next morning and our subsequent conversation sharpened a few of these ideas considerably. This post came out of that exchange.</em></p>

<p>I went in expecting to learn about oil prices and trade baskets. I came away thinking about energy and geography. For thirty-odd years the assumption has been that where you sit on the map is subordinate to economics and US naval hegemony. Open seas, free trade, plug into the global system and your coordinates don’t matter much. I think Hormuz is showing us that assumption was wrong – and the investment baskets, the proxy wars, the ceasefire negotiations all make more sense when you look at them through that lens. This isn’t a new argument – Kaplan’s <em>Revenge of Geography</em> made the case in 2012, Zeihan’s been running the demographic/geographic thesis for a decade. What feels novel is watching the mechanism work in real time – Iran didn’t just leverage geography abstractly; they built a literal toll booth. The ships are transiting now, under a dozen flags, negotiating with geography directly. That’s what this post is about.</p>

<h2 id="but-first-the-baskets">But first, the baskets</h2>

<p>Citrini published three trade baskets around the Middle East disruption. A “trade basket” is basically a curated list of stocks that express a thesis – if you believe X is going to happen, buy these companies. The percentages next to each name tell you how much of your money goes into each one.</p>

<p>If you’re a finance person who already grokked all of this from the original post, feel free to skip this section. As someone not super long on this stuff, I found these baskets a useful framework for the non-Bloomberg-terminal-pilled. Each one captures a different theory about how geopolitical chaos flows through markets.</p>

<p><strong>Tanker basket.</strong> These are companies that own and operate the massive ships that move crude oil around the world. Frontline, DHT, Scorpio, a bunch of others. The biggest position is BWET, an ETF that tracks tanker freight futures – essentially a direct bet on shipping rates going up. When shipping lanes get dangerous, ships reroute around Africa instead of through Suez. Same oil, way more ships needed, way more days at sea. Rates spike. BWET has quadrupled since Citrini first recommended it.</p>

<p><img src="../../../media/basket1.webp" alt="Tanker Basket" width="740px" /></p>

<p><strong>Energy/commodities basket.</strong> Broader oil supply chain: companies that drill, refine, run pipelines, make fertilizer (natural gas is the main input for fertilizer, so energy disruption hits them hard). Oil went from $67 on February 27 to around $118 at peak. This basket is a straightforward “disruption means higher energy prices” bet. Citrini’s highest conviction play here is actually US petrochemicals – Dow and Westlake – because Gulf infrastructure damage will take 250-275 days to repair even after Hormuz reopens. US producers sitting on cheap domestic feedstock benefit regardless of how the conflict resolves.</p>

<p><img src="../../../media/basket2.webp" alt="Energy/Commodities Basket" width="740px" /></p>

<p><strong>Energy sovereignty basket.</strong> This one’s different. It’s almost entirely European and Japanese renewables companies – Vestas, Orsted, Iberdrola, RWE, plus grid infrastructure and solar hardware manufacturers. The bet is that every government watching Hormuz close now has political mandate to fast-track domestic energy buildout. Permits, subsidies, grid investment. The utilities in this basket (Fortum, Verbund, Acciona) have a particularly elegant setup: they own renewable assets with near-zero marginal cost but sell into wholesale electricity markets where the price is set by gas. When gas prices spike, their margins explode even though their costs didn’t change. Slower burn than the other two baskets, less volatile, probably only up modestly so far. But arguably the most durable thesis.</p>

<p><img src="../../../media/basket3.webp" alt="Energy Sovereignty Basket" width="740px" /></p>
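
<p>The utilities setup in the sovereignty basket is easier to see with numbers. Here’s a minimal sketch of marginal (“pay-as-clear”) pricing, where the most expensive plant needed to meet demand sets the price for everyone. The plant names, costs, capacities, and demand figure are made up for illustration; they are not from Citrini’s report or any real market.</p>

```python
from dataclasses import dataclass


@dataclass
class Plant:
    name: str
    marginal_cost: float  # EUR per MWh (illustrative, not real data)
    capacity: float       # MW (illustrative, not real data)


def clearing_price(plants, demand):
    """Dispatch the cheapest plants first; the last plant needed sets the price."""
    supplied = 0.0
    for plant in sorted(plants, key=lambda p: p.marginal_cost):
        supplied += plant.capacity
        if supplied >= demand:
            return plant.marginal_cost
    raise ValueError("demand exceeds total capacity")


WIND_COST = 2.0  # near-zero marginal cost, assumed for the example


def wind_margin(gas_cost):
    """Margin per MWh for the wind plant when gas is the price-setting plant."""
    plants = [
        Plant("wind", WIND_COST, 400.0),
        Plant("gas", gas_cost, 1000.0),
    ]
    # Demand of 900 MW exceeds wind capacity, so gas has to run and sets the price.
    price = clearing_price(plants, demand=900.0)
    return price - WIND_COST


print(wind_margin(60.0))   # gas at 60 EUR/MWh
print(wind_margin(120.0))  # gas doubles; wind's cost is unchanged
```

<p>In this toy market the wind plant’s cost never moves, but its margin follows the gas price up. That’s the “margins explode even though their costs didn’t change” mechanism in miniature.</p>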

<p>The ceasefire announced yesterday hammered all three baskets, at least short term.</p>

<h2 id="the-toll-booth">The toll booth</h2>

<p>The detail from Citrini’s field report that stuck with me most is that Iran hasn’t actually closed the strait. They’ve set up a checkpoint system through the Qeshm-Larak channel. Ship owners submit vessel details – ownership, flag, cargo, crew – to brokers. Payments go through cash, crypto, or diplomatic arrangements. Approved vessels get escorted through Iranian waters. The analyst observed at least 15 ships crossing on April 2, up from 2-5 daily in prior weeks. Ships flying Indian, Malaysian, Japanese, Greek, French, Chinese, Turkish, and Omani flags. All transiting independently. Without Washington’s approval.</p>

<p>Iran has turned a chokepoint into a toll booth. The Strait of Hormuz is 21 miles wide at its narrowest point, 20% of global oil transits through it<sup id="fnref:eia-hormuz" role="doc-noteref"><a href="#fn:eia-hormuz" class="footnote" rel="footnote">1</a></sup>, and Iran sits on the northern shore. That’s leverage you can’t sanction away and can’t replicate somewhere else. And the place you can see the old order breaking down most clearly is in who’s transiting right now and how they’re doing it – independently, on their own terms, without asking Washington.</p>

<p>Cameron framed the broader picture well: Iran and Ukraine are the pre-WWI/II proxy fights, the ones where the major powers try to settle the new order without going at it directly. You can already see it in the mediation patterns. Pakistan popping up as a “surprise” mediator in the US-Iran talks makes a lot more sense when you realize China sent them.</p>

<h2 id="geography-starts-to-matter">Geography starts to matter</h2>

<p>My conversation with Cameron covered a lot of ground, but three regions kept coming up – each one illustrating a different way that physical location is reasserting itself over financial abstraction.</p>

<p><strong>Iran.</strong> If sanctions get dropped as part of the ceasefire framework, 1.5-2 million barrels per day of suppressed supply re-enter the market. That’s an oil price story, sure. But Iran has been selling heavily discounted crude to China and India through shadow fleets for years because they have no other buyers. Sanctions gone means normal banking, normal pricing, normal trade relationships. Give Iran access to SWIFT and suddenly Beijing isn’t the only trading partner worth having. The toll booth at Hormuz gave Iran leverage; sanctions relief would give them options.</p>

<p><strong>Cuba.</strong> Alfred Thayer Mahan called Cuba “the key to the Gulf of Mexico” the same way Gibraltar is the key to the Mediterranean<sup id="fnref:mahan" role="doc-noteref"><a href="#fn:mahan" class="footnote" rel="footnote">2</a></sup>, and looking at a map you can see why. Cuba flanks both the Yucatan Channel (connecting the Gulf to the Caribbean) and the Straits of Florida (connecting the Gulf to the Atlantic). Any vessel entering or leaving the Gulf passes within range of Cuban shores. That includes the shipping lanes carrying roughly 13% of total US crude oil production from Gulf platforms to refineries and markets.<sup id="fnref:eia-gulf" role="doc-noteref"><a href="#fn:eia-gulf" class="footnote" rel="footnote">3</a></sup></p>

<p>This matters right now because Russia and China are actively building presence there. In June 2024, Russian warships carrying hypersonic missiles docked in Havana – the largest Russian show of force with Cuba in years.<sup id="fnref:russia-cuba" role="doc-noteref"><a href="#fn:russia-cuba" class="footnote" rel="footnote">4</a></sup> Russia ratified a formal military cooperation agreement with Cuba in October 2025.<sup id="fnref:russia-pact" role="doc-noteref"><a href="#fn:russia-pact" class="footnote" rel="footnote">5</a></sup> Meanwhile, CSIS has identified four Chinese signals intelligence facilities on the island through satellite imagery, including one at Bejucal less than 100 miles from Florida that can monitor Kennedy Space Center launches.<sup id="fnref:csis-cuba" role="doc-noteref"><a href="#fn:csis-cuba" class="footnote" rel="footnote">6</a></sup> In March 2026, a Russian oil tanker broke the US fuel blockade with 100,000 tonnes of crude.<sup id="fnref:cuba-tanker" role="doc-noteref"><a href="#fn:cuba-tanker" class="footnote" rel="footnote">7</a></sup></p>

<p>In short: Cuba caps the Gulf for the US and gets Russia out of our hemisphere. That’s a Monroe Doctrine argument. The economics of Cuba – nearshoring, tourism, whatever – are hypothetical and frankly unrealistic under current conditions. Trump is running maximum pressure, the island’s population has dropped over 10% in four years from emigration<sup id="fnref:cuba-emigration" role="doc-noteref"><a href="#fn:cuba-emigration" class="footnote" rel="footnote">8</a></sup>, and the power grid collapsed in March.<sup id="fnref:cuba-crisis" role="doc-noteref"><a href="#fn:cuba-crisis" class="footnote" rel="footnote">9</a></sup> None of that changes where Cuba is on a map. It controls chokepoints the same way Iran does, and other powers are already treating it accordingly.</p>

<p><strong>The EU.</strong> The EU-as-superpower scenario is probably the most underpriced thing in markets right now. Common defense spending, real industrial policy, joint debt issuance – if it happens you get a completely different European investment surface. But it’s underpriced because it’s genuinely hard. The Franco-German engine that’s supposed to drive consolidation is stalled: their EUR 100B+ joint fighter jet program (FCAS) is collapsing over workshare disputes<sup id="fnref:fcas" role="doc-noteref"><a href="#fn:fcas" class="footnote" rel="footnote">10</a></sup>, Macron is pushing for common EU debt and Germany is explicitly rejecting it<sup id="fnref:eu-debt" role="doc-noteref"><a href="#fn:eu-debt" class="footnote" rel="footnote">11</a></sup>, and their competing visions for industrial policy are pulling other member states into opposing camps.</p>

<p>The energy picture is a useful reality check. Europe has spent EUR 300 billion on REPowerEU<sup id="fnref:repowereu" role="doc-noteref"><a href="#fn:repowereu" class="footnote" rel="footnote">12</a></sup>, cut Russian gas imports from 45% to 13%<sup id="fnref:eu-gas" role="doc-noteref"><a href="#fn:eu-gas" class="footnote" rel="footnote">13</a></sup>, tripled US LNG imports<sup id="fnref:eu-gas:1" role="doc-noteref"><a href="#fn:eu-gas" class="footnote" rel="footnote">13</a></sup>, and hit a milestone in 2025 where wind and solar generated more EU electricity than fossil fuels for the first time.<sup id="fnref:ember" role="doc-noteref"><a href="#fn:ember" class="footnote" rel="footnote">14</a></sup> And despite all that, the Hormuz closure – which directly affects only about 10% of EU LNG<sup id="fnref:ieefa" role="doc-noteref"><a href="#fn:ieefa" class="footnote" rel="footnote">15</a></sup> – still nearly doubled Dutch gas benchmarks, triggered fuel rationing in Slovenia<sup id="fnref:rationing" role="doc-noteref"><a href="#fn:rationing" class="footnote" rel="footnote">16</a></sup>, and forced the ECB to postpone rate cuts. Geography doesn’t care about your spending programs.</p>

<p>But I think the most interesting take on this whole geography thing has to do with trade routes and climate. Historically, European power has flowed east-west through the Mediterranean. Spain, Italy, Greece – they control the sea lanes, which makes them more strategically important within the EU than their GDP might suggest. But global warming means the Arctic is opening up. China already runs an “Arctic Express” container service from Shanghai to Rotterdam, Hamburg, Gdansk, and Felixstowe – 18 days versus 35-50 via Suez.<sup id="fnref:arctic-express" role="doc-noteref"><a href="#fn:arctic-express" class="footnote" rel="footnote">17</a></sup> Nature Communications projects the first ice-free Arctic day before 2030.<sup id="fnref:arctic-ice" role="doc-noteref"><a href="#fn:arctic-ice" class="footnote" rel="footnote">18</a></sup> As the northern passages become viable, the UK, the Netherlands, and Norway gain strategic weight as trade destinations – potentially at Mediterranean expense. (The Arctic angle probably deserves its own post, but it’s worth flagging here because it illustrates the same point from a different direction.)</p>

<p>Italy sees it coming – they’re counter-investing in Trieste as a rail gateway to Central Europe, trying to stay relevant in a world where ships might skip the Mediterranean entirely. The geography of Europe is pulling in two directions at once, and climate change is accelerating the split. That tension makes a unified EU bloc harder to achieve in ways that don’t show up in the usual federalist pitch.</p>

<h2 id="bits-still-need-atoms">Bits still need atoms</h2>

<p>Here’s where I’ll bring it back to what I actually know something about: cloud software and AI.</p>

<p>The popular version of the AI story is that software is weightless, models run in the cloud, bits move at the speed of light. Geography fades into irrelevance. But the infrastructure underneath all of that is profoundly physical, and it’s getting more physical every year. Dylan Patel at SemiAnalysis <a href="https://www.dwarkesh.com/p/dylan-patel">estimates</a> that the big four hyperscalers will deploy roughly 50 gigawatts of compute capacity this year, backed by around $600 billion in CapEx. Every one of those gigawatts needs power from a physical place. Google is buying energy companies. Hyperscalers are putting deposits down on turbines and locking in power purchasing agreements years in advance. The constraint on AI scaling is increasingly “where can you plug this in” – and that’s a geography question.</p>

<p>The supply chain is even more concentrated. ASML in the Netherlands makes roughly 70 EUV lithography machines per year – the tools you need to fabricate leading-edge AI chips. Patel calculates that you need about three and a half EUV tools per gigawatt of AI compute. That puts a hard ceiling on total global capacity: maybe 200 gigawatts by 2030, and that’s if everything goes right. TSMC in Taiwan fabricates the chips. Samsung and SK Hynix in South Korea make the high-bandwidth memory. The entire AI buildout runs through a handful of specific facilities in a handful of specific countries, several of which sit in the same part of the Pacific that’s become the other major theater of great power competition.</p>
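
<p>To make the lithography bottleneck concrete, here’s a back-of-the-envelope sketch using only the three figures quoted above. It assumes every EUV tool goes to AI chips and that the tools-per-gigawatt ratio stays constant, both simplifications of mine rather than anything Patel claims.</p>

```python
EUV_TOOLS_PER_YEAR = 70    # ASML's rough annual output, as quoted above
TOOLS_PER_GIGAWATT = 3.5   # Patel's estimate, as quoted above
CEILING_GW = 200           # the "maybe 200 gigawatts by 2030" figure


def gigawatts_supported(tools):
    """Gigawatts of AI compute a given number of EUV tools could support."""
    return tools / TOOLS_PER_GIGAWATT


def tools_required(gigawatts):
    """EUV tools needed to support a given number of gigawatts."""
    return gigawatts * TOOLS_PER_GIGAWATT


# One year of ASML output, if all of it went to AI chips (an assumption).
gw_per_year = gigawatts_supported(EUV_TOOLS_PER_YEAR)

# Tools implied by the 200 GW ceiling, and how many years of output that is.
tools_for_ceiling = tools_required(CEILING_GW)
years_of_output = tools_for_ceiling / EUV_TOOLS_PER_YEAR

print(gw_per_year)        # 20.0
print(tools_for_ceiling)  # 700.0
print(years_of_output)    # 10.0
```

<p>At today’s rate a year of EUV output covers about 20 gigawatts, so the 200-gigawatt figure presumably also leans on tools already installed or on ASML raising output. The sketch models neither.</p>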

<p>Every gigawatt of new compute deepens the dependency on specific places – where the power is, where the fabs are, where the lithography machines come from. AI only makes geography more important.</p>

<p>The ships transiting Hormuz under a dozen different flags aren’t waiting for a diplomatic resolution. They’re negotiating directly with geography. I suspect the next decade will be defined by who else figures that out.</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:eia-hormuz" role="doc-endnote">
      <p>U.S. Energy Information Administration, <a href="https://www.eia.gov/todayinenergy/detail.php?id=39932">“The Strait of Hormuz is the world’s most important oil transit chokepoint.”</a> <a href="#fnref:eia-hormuz" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:mahan" role="doc-endnote">
      <p>Alfred Thayer Mahan, via USNI Proceedings, <a href="https://www.usni.org/magazines/proceedings/1962/december/cubas-place-u-s-naval-strategy">“Cuba’s Place in U.S. Naval Strategy”</a> (December 1962). <a href="#fnref:mahan" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:eia-gulf" role="doc-endnote">
      <p>U.S. Energy Information Administration, <a href="https://www.eia.gov/todayinenergy/detail.php?id=65444">“Gulf of America crude oil production forecast to remain near record highs in 2025 and 2026.”</a> The Gulf produces ~1.8 million barrels/day, approximately 13% of total US crude production. <a href="#fnref:eia-gulf" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:russia-cuba" role="doc-endnote">
      <p>PBS NewsHour, <a href="https://www.pbs.org/newshour/world/russian-warships-arrive-in-cuban-waters-for-military-exercises">“Russian warships arrive in Cuban waters for military exercises”</a> (June 2024). The frigate Admiral Gorshkov and nuclear submarine Kazan both carry Zircon hypersonic missiles. <a href="#fnref:russia-cuba" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:russia-pact" role="doc-endnote">
      <p>CiberCuba, <a href="https://en.cibercuba.com/noticias/2025-10-07-u1-e135253-s27061-nid312500-rusia-ratifica-alianza-militar-regimen-cubano">“Russia Ratifies Military Alliance with Cuban Regime”</a> (October 2025). Signed in Havana March 13, ratified by Russia’s parliament October 2025. <a href="#fnref:russia-pact" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:csis-cuba" role="doc-endnote">
      <p>CSIS, <a href="https://www.csis.org/analysis/chinas-intelligence-footprint-cuba-new-evidence-and-implications-us-security">“China’s Intelligence Footprint in Cuba”</a> (July 2024 and December 2024). Four facilities identified via satellite imagery at Bejucal, Wajay, Calabazar, and El Salao. <a href="#fnref:csis-cuba" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cuba-tanker" role="doc-endnote">
      <p>Wikipedia, <a href="https://en.wikipedia.org/wiki/2026_Cuban_crisis">“2026 Cuban crisis.”</a> Russian oil tanker delivered 100,000 tonnes of crude to Havana on March 30, 2026. <a href="#fnref:cuba-tanker" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cuba-emigration" role="doc-endnote">
      <p>CiberCuba, <a href="https://en.cibercuba.com/noticias/2025-12-30-u2-e2-s27061-nid317552-emigracion-cubana-2025-redistribucion-global-exodo">“Cuban Emigration 2025”</a> (December 2025). Over 860,000 Cubans arrived in the US alone between 2021 and mid-2024; population fell from ~11.2 million to 9.7 million. <a href="#fnref:cuba-emigration" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cuba-crisis" role="doc-endnote">
      <p>Wikipedia, <a href="https://en.wikipedia.org/wiki/2026_Cuban_crisis">“2026 Cuban crisis.”</a> Cuba’s entire power grid collapsed on March 16, 2026, following the cutoff of Venezuelan and Mexican oil supplies. <a href="#fnref:cuba-crisis" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:fcas" role="doc-endnote">
      <p>Euronews, <a href="https://www.euronews.com/2025/11/17/is-europes-mega-defence-project-fcas-in-danger-of-failing-over-germany-france-disagreement">“Is Europe’s mega defence project FCAS in danger of failing?”</a> (November 2025). Dassault demanded 80% workshare; Germany warned parliament this was unacceptable. <a href="#fnref:fcas" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:eu-debt" role="doc-endnote">
      <p>Euronews, <a href="https://www.euronews.com/2026/02/10/macron-pushes-for-eu-common-debt-capacity-to-fund-europes-future">“Macron pushes for EU common debt capacity”</a> (February 2026). Germany and Italy jointly circulated a counter-document to EU capitals opposing Macron’s position. <a href="#fnref:eu-debt" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:repowereu" role="doc-endnote">
      <p>European Commission, <a href="https://commission.europa.eu/topics/energy/repowereu_en">“REPowerEU.”</a> EUR 300 billion total mobilized (EUR 72B grants, EUR 225B loans), with EUR 113 billion allocated for renewables and hydrogen through 2030. <a href="#fnref:repowereu" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:eu-gas" role="doc-endnote">
      <p>EU Council, <a href="https://www.consilium.europa.eu/en/infographics/where-does-the-eu-s-gas-come-from/">“Where does the EU’s gas come from?”</a> Russian gas dropped from 45% of EU imports (2021) to 13% (2025). US LNG now accounts for 56% of EU LNG imports. <a href="#fnref:eu-gas" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:eu-gas:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:ember" role="doc-endnote">
      <p>Ember, <a href="https://ember-energy.org/latest-updates/wind-and-solar-generated-more-power-than-fossil-fuels-in-the-eu-for-the-first-time-in-2025/">“Wind and solar generated more power than fossil fuels in the EU for the first time in 2025.”</a> <a href="#fnref:ember" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:ieefa" role="doc-endnote">
      <p>IEEFA, <a href="https://ieefa.org/resources/strait-hormuz-disruption-would-jeopardise-10-europes-lng-imports">“Strait of Hormuz disruption would jeopardise 10% of Europe’s LNG imports.”</a> <a href="#fnref:ieefa" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:rationing" role="doc-endnote">
      <p>TIME, <a href="https://time.com/article/2026/04/05/strait-of-hormuz-fuel-rationing-oil/">“Strait of Hormuz: How the crisis is driving energy rationing.”</a> (April 2026). Slovenia introduced fuel rationing; Austria implemented fuel tax cuts and retailer profit caps. <a href="#fnref:rationing" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:arctic-express" role="doc-endnote">
      <p>High North News, <a href="https://en.highnorthnews.com/business/china-launches-18-day-arctic-express-containership-route-to-europe-with-stops-in-uk-germany-poland/206024">“China launches 18-day Arctic Express containership route to Europe.”</a> Operated by Haijie Shipping with stops at Felixstowe, Rotterdam, Hamburg, and Gdansk. <a href="#fnref:arctic-express" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:arctic-ice" role="doc-endnote">
      <p>Nature Communications, <a href="https://www.nature.com/articles/s41467-024-54508-3">“First ice-free Arctic day possible before 2030”</a> (2024). Most models converge around ~2034 for consistently ice-free September conditions. <a href="#fnref:arctic-ice" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Dylan</name></author><category term="geopolitics" /><category term="markets" /><category term="macro" /><summary type="html"><![CDATA[I spent an evening going down a macro rabbit hole after reading Citrini Research’s field report on the Strait of Hormuz. One of their analysts was physically on the water near Oman, watching tankers transit through Iran’s checkpoint system, talking to fishermen and shipping captains. So sick. I texted some thoughts I’d had about it to my friend Cameron the next morning and our subsequent conversation sharpened a few of these ideas considerably. This post came out of that exchange.]]></summary></entry><entry><title type="html">Agentic Failure Modes</title><link href="https://www.dylanamartin.com/2026/03/24/agentic-failure-modes.html" rel="alternate" type="text/html" title="Agentic Failure Modes" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/03/24/agentic-failure-modes</id><content type="html" xml:base="https://www.dylanamartin.com/2026/03/24/agentic-failure-modes.html"><![CDATA[<p>After hundreds of agent-assisted sessions, two failure modes keep showing up. They look different, feel different, and need different responses.</p>

<h2 id="failure-mode-1-the-last-mile-problem">Failure mode 1: the last-mile problem</h2>

<p>Anyone who’s worked with coding agents knows this one — the agent gets me 90% of the way there, especially for UI work and well-specified features. The output compiles, the tests pass, the shape is right. But something’s off — a button three pixels too close to the edge, a sluggish loading state, an error message that reads like it was written by a robot (because it was).</p>

<p>The problem is that fit and finish is emergent. Once I see it, it’s obvious what to tweak. But I couldn’t have written that into the prompt ahead of time, because I didn’t know what “wrong” looked like until I was staring at it.</p>

<p>I used to try to capture everything in the initial prompt. Diminishing returns kicked in fast — the prompt got longer, the agent got more confused, and I spent more time specifying the work than it would have taken to just do the finishing touches myself.</p>

<p>So what I’ve started to do is treat the agent’s output as rough framing. Get it directionally correct and plan to finish by hand. The framing doesn’t need to be beautiful; it needs to be square. You can sand later.</p>

<p>In short: let the agent run, accept the 90%, and budget time for human finishing. Don’t over-specify. Don’t try to prompt your way to perfection.</p>

<h2 id="failure-mode-2-the-convincing-but-wrong-plan">Failure mode 2: the convincing-but-wrong plan</h2>

<p>This one is scarier because the failure doesn’t rear its head until much later, sometimes too late.</p>

<p>It happens when I’m not deeply familiar with a domain, or when the relevant context is hard to capture in a prompt. The agent produces something that looks right — coherent design, clean code, passing tests. A reasonable reviewer would approve it.</p>

<p>But the whole direction is wrong.</p>

<p>I ran into this recently. I was building a new form of feature flag targeting for PostHog — one that could target both user and group properties in the same flag. That meant creating a relationship that combined two distinct entities, persons and groups, so we could process properties from both sides when evaluating a flag. The agent helped me design the schema, build the queries, write the tests. Everything worked. The PR looked solid.</p>

<p>The problem: the agent’s design added new fields directly to the persons table, which would have increased load on the ingestion pipeline in a way that wasn’t tenable. The correct approach was a join table, but you couldn’t really see that from the code or the tests. You needed to understand the ingestion pipeline’s load profile, and I didn’t know it well enough to flag the issue before I’d already pushed the first PR (fortunately, we caught it in code review, and nothing bad happened).</p>

<p>But that’s the pattern of this type of failure mode: this time, the agent didn’t get me 90% of the way there. It got me 100% of the way to the wrong destination.</p>
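
<p>For anyone who hasn’t hit this particular fork, here’s a minimal sketch of the join-table shape, using SQLite in memory. Every table and column name here is hypothetical and invented for illustration; this is not PostHog’s actual schema or query.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables. The person table gains no new columns.
CREATE TABLE person (id INTEGER PRIMARY KEY, plan TEXT);
CREATE TABLE org_group (id INTEGER PRIMARY KEY, industry TEXT);

-- The relationship lives in its own join table instead.
CREATE TABLE person_group (
    person_id INTEGER NOT NULL REFERENCES person(id),
    group_id  INTEGER NOT NULL REFERENCES org_group(id),
    PRIMARY KEY (person_id, group_id)
);

INSERT INTO person VALUES (1, 'pro'), (2, 'free');
INSERT INTO org_group VALUES (10, 'fintech');
INSERT INTO person_group VALUES (1, 10), (2, 10);
""")


def flag_enabled(person_id, plan, industry):
    """True if the person matches on a person property AND a group property."""
    row = conn.execute(
        """
        SELECT 1
        FROM person p
        JOIN person_group pg ON pg.person_id = p.id
        JOIN org_group g ON g.id = pg.group_id
        WHERE p.id = ? AND p.plan = ? AND g.industry = ?
        """,
        (person_id, plan, industry),
    ).fetchone()
    return row is not None


print(flag_enabled(1, "pro", "fintech"))  # person 1 is on 'pro' in a fintech group
print(flag_enabled(2, "pro", "fintech"))  # person 2 is on 'free', so no match
```

<p>Both designs can pass the same tests. What differs is which table absorbs the writes, and that’s exactly the thing you can’t see from the code or the tests alone.</p>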

<p>In these types of scenarios, you need to slow down <em>before</em> you start. Identify the assumptions that would be catastrophic if wrong. Check those assumptions against someone (or something) that knows the domain. The cost of starting in the wrong direction dwarfs the cost of an extra hour on design.</p>

<h2 id="whats-working-for-me">What’s working for me</h2>

<p>Before I spin up an agent, I ask myself: is this a “directional and finish” job, or a “deep-domain correctness” job? The answer informs how I spend my time.</p>

<p>For directional work, I sort tasks into a batch, kick off agents in parallel, and then — this is the part I had to learn the hard way — I finish them one at a time. Parallelism works for <em>starting</em>, not for <em>finishing</em>. A <a href="https://hbr.org/2026/03/when-using-ai-leads-to-brain-fry">recent HBR study</a> calls the alternative “brain fry” — juggling too many agent threads drives cognitive fatigue rather than reducing it. So I’ve turned off notifications from my LLM tools. Kick something off, come back when I’m ready. While an agent runs, I write down the follow-up questions I’ll need when it’s done. By the time it finishes, I’ve already thought through the next steps.</p>

<p>For deep-domain work, I front-load the thinking. I’m not prompting yet; I’m designing. I use <a href="https://steelman.cloud/">Steelman</a>, a tool I built for exactly this, to pressure-test my assumptions before I write a line of code. What are the load-bearing assumptions? What would make this approach catastrophically wrong? Where does my understanding have gaps? Only after that work do I involve the agent, and I watch it more carefully.</p>

<p>Either way, I accept that I won’t one-shot the result. I reserve undivided attention for the last 10% of each task. Coming back to something almost-done with fresh eyes is real momentum — easier to push over the finish line than to start from zero, and the fresh perspective catches things I’d have missed after hours of staring.</p>

<p>Not all LLM-assisted work fails the same way. Some fails at the edges and needs human polish. Some fails at the foundation and needs human judgment before it starts. Know which one you’re in, and plan accordingly.</p>

<p><em>This is the latest in an unplanned series about AI-assisted development. Previously: <a href="/2025/11/07/spinning-plates.html">Spinning Plates</a>, <a href="/2025/11/24/racing-towards-bethlehem.html">Racing towards Bethlehem</a>, <a href="/2026/02/02/spinning-the-wheel.html">Spinning the Wheel</a>, <a href="/2026/02/21/contra-yang-et-al.html">Contra Yang, et al</a>, <a href="/2026/03/11/announcing-steelman.html">Announcing Steelman</a>. Finally, I wanted to shout out Justin Duke’s <a href="https://www.jmduke.com/posts/llm-advance-team.html">LLM as advance team</a> – it resonated deeply with me, and got me thinking about this post in the first place.</em></p>]]></content><author><name>Dylan</name></author><category term="ai" /><category term="work" /><category term="software engineering" /><summary type="html"><![CDATA[After hundreds of agent-assisted sessions, two failure modes keep showing up. They look different, feel different, and need different responses.]]></summary></entry><entry><title type="html">Steelman: an adversarial reasoning tool for decision-making</title><link href="https://www.dylanamartin.com/2026/03/11/announcing-steelman.html" rel="alternate" type="text/html" title="Steelman: an adversarial reasoning tool for decision-making" /><published>2026-03-11T00:00:00+00:00</published><updated>2026-03-11T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/03/11/announcing-steelman</id><content type="html" xml:base="https://www.dylanamartin.com/2026/03/11/announcing-steelman.html"><![CDATA[<p>I’ve been thinking a lot about how I make decisions; especially the hard ones, where I have a strong opinion and I’m not totally sure if it’s right.  The kind where you walk into a meeting, lay out your case, and someone asks a question you hadn’t considered, and suddenly you’re on your back foot, revising your argument in real time.</p>

<p>That experience of having your position challenged well and coming out the other side with something sharper is genuinely valuable.  But it doesn’t scale.  You can’t always find the right person to push back on your thinking at the right time.  And most of us don’t seek out that kind of friction voluntarily.  What we <em>do</em> instead, increasingly, is reach for an AI — and the AI mostly tells us we’re right.  It validates, polishes, and helps us build on assumptions we never examined.  We walk away feeling sharper when really we just feel more comfortable.  I wanted something that would make me <em>less</em> comfortable with my position before I committed to it.</p>

<p>So I built <a href="https://steelman.cloud/">Steelman</a>.</p>

<h2 id="what-it-does">What it does</h2>

<p>Steelman is an adversarial reasoning tool.  You state a position — “we should rewrite this service in Rust,” “single-payer healthcare is the only workable option,” “this essay’s thesis is airtight,”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> whatever — and it puts your argument through a structured gauntlet.</p>

<p>Here’s how it works:</p>

<ol>
  <li>
    <p><strong>Claim decomposition.</strong>  You write your position, and the AI breaks it down into empirical claims (things that are verifiably true or false) and value judgments (trade-offs and priorities).  This is my favorite part tbh; seeing your argument decomposed into its load-bearing components is clarifying in a way that’s hard to describe until you’ve experienced it.</p>
  </li>
  <li>
    <p><strong>Three rounds of adversarial challenge.</strong>  Three escalating personas target the weakest parts of your argument. Each round, you defend your position. The AI assesses whether your responses actually address the challenges and updates the status of each claim accordingly.</p>
  </li>
  <li>
    <p><strong>A Decision Record.</strong>  At the end, you get a structured document: your refined position, the challenges you faced, which claims survived and which didn’t, and (crucially) falsification criteria: the conditions under which you’d change your mind.</p>
  </li>
</ol>

<p>The important design constraint: the AI never writes <em>for</em> you.  It decomposes, mirrors, challenges, and structures, but every word in the final Decision Record is yours.  I didn’t want a tool that generates opinions.  I wanted one that pressure-tests them.</p>

<h2 id="why-i-built-it">Why I built it</h2>

<p>Honestly, I’m worried about what’s happened to decision-making in the age of AI. I’m <a href="/2026/02/02/spinning-the-wheel.html">as guilty of this as anyone</a> — we’re all going full-bore into using these thinking machines, and mostly that’s great.  But the default mode of every major AI chat app is sycophancy: you state a position, and the model validates it, maybe adds some caveats for plausibility, and helps you build on a premise it never questioned.  It’s not that AI <em>can’t</em> help with decisions — it’s that the way we’re using it trains us to outsource the thinking rather than sharpen it.  Vaughn Tan has a <a href="https://vaughntan.org/aiux">great piece on this</a> — he argues that mainstream AI interfaces create a “seductive mirage” of talking to a meaningmaking entity, when really they’re just tools, and that we need to design AI experiences that clearly separate the subjective judgment work only humans can do from the non-meaningmaking work that machines are good at.  That framing resonated with me.  The right role for AI in decisions isn’t to <em>make</em> them for you — it’s to force you to make them better yourself.  Steelman is my attempt at that: an AI tool that stays on its side of the line, structuring and challenging your reasoning while every decision about what matters and what to believe remains yours.  The default mode shouldn’t be “yes, and.”  It should be “okay, but have you considered.”</p>

<h2 id="the-stack">The stack</h2>

<p>For the folks who care about this kind of thing: it’s a Next.js app using Claude (via the Vercel AI SDK) for the structured generation, Supabase for persistence, and Tailwind for the UI.  I used Zod schemas to constrain the AI outputs into predictable structures — claim objects, challenge objects, assessment objects — which was essential for making the multi-round flow feel deterministic rather than vibes-based.</p>
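<p>To make the “constrain the outputs into predictable structures” idea concrete, here is a minimal, dependency-free sketch of validating one claim object before it enters the multi-round flow. It is a hand-rolled stand-in for the kind of check a schema library performs, not Steelman’s actual schema: the <code>Claim</code> shape, the status values, and <code>parseClaim</code> are all hypothetical names chosen for illustration.</p>

```typescript
// Hypothetical shape: the field names and status values are assumptions,
// not Steelman's actual schema.
type ClaimKind = "empirical" | "value";
type ClaimStatus = "open" | "defended" | "conceded";

interface Claim {
  id: string;
  kind: ClaimKind;
  text: string;
  status: ClaimStatus;
}

const KINDS: readonly string[] = ["empirical", "value"];
const STATUSES: readonly string[] = ["open", "defended", "conceded"];

// Hand-rolled stand-in for a schema library's parse step:
// return a typed Claim, or throw with a message saying what was wrong.
function parseClaim(raw: unknown): Claim {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("claim must be an object");
  }
  const r = raw as { [key: string]: unknown };
  const { id, kind, text, status } = r;
  if (typeof id !== "string" || id.length === 0) {
    throw new Error("id must be a non-empty string");
  }
  if (typeof kind !== "string" || KINDS.indexOf(kind) === -1) {
    throw new Error("kind must be 'empirical' or 'value'");
  }
  if (typeof text !== "string" || text.trim().length === 0) {
    throw new Error("text must be a non-empty string");
  }
  if (typeof status !== "string" || STATUSES.indexOf(status) === -1) {
    throw new Error("status must be 'open', 'defended', or 'conceded'");
  }
  return { id, kind: kind as ClaimKind, text, status: status as ClaimStatus };
}

// Simulated model output; in a real app this JSON would come back from the LLM call.
const modelOutput =
  '{"id":"c1","kind":"empirical","text":"Rust removes a class of memory bugs","status":"open"}';

const claim = parseClaim(JSON.parse(modelOutput));
console.log(claim.kind, claim.status); // prints: empirical open
```

<p>The point of the sketch is the failure path: a malformed object is rejected at the boundary, so later rounds can update a claim’s status without re-checking its shape each time.</p>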

<h2 id="try-it-out">Try it out</h2>

<p>Steelman is currently in closed beta.  If you’re interested, you can <a href="https://steelman.cloud/">sign up for the waitlist</a>.</p>

<p>I’m especially curious to hear from people who make a lot of high-stakes decisions — engineering managers, staff+ engineers, architects, but also founders, policy folks, anyone who writes arguments for a living — about whether this maps to how they actually think through problems.  The adversarial personas currently skew toward infrastructure and systems decisions, but Steelman works on any kind of argument, and I’d like to expand the persona set to cover more domains.</p>

<p>If you try it out and have thoughts, feel free to reach out to me via <a href="mailto:me@dylanamartin.com">email</a>.  I’m iterating on this actively and feedback from real users is worth more than any amount of me arguing with myself about what to build next (though Steelman is useful for that too).</p>

<p>We’re all going to keep using AI to help us think.  The question is whether it makes our thinking better or just makes us more confident.  Steelman is a bet that the right AI tool for decisions is one that challenges you, not one that agrees with you.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>These are deliberately terse examples for illustration.  In practice, the more detail you provide up front — context, constraints, prior art, why you believe what you believe — the better the adversarial challenges will be.  Steelman rewards specificity. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Dylan</name></author><category term="AI" /><category term="decision-making" /><category term="reasoning" /><category term="tools" /><category term="announcement" /><summary type="html"><![CDATA[I’ve been thinking a lot about how I make decisions; especially the hard ones, where I have a strong opinion and I’m not totally sure if it’s right. The kind where you walk into a meeting, lay out your case, and someone asks a question you hadn’t considered, and suddenly you’re on your back foot, revising your argument in real time.]]></summary></entry><entry><title type="html">Contra Yang, et al</title><link href="https://www.dylanamartin.com/2026/02/21/contra-yang-et-al.html" rel="alternate" type="text/html" title="Contra Yang, et al" /><published>2026-02-21T00:00:00+00:00</published><updated>2026-02-21T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/02/21/contra-yang-et-al</id><content type="html" xml:base="https://www.dylanamartin.com/2026/02/21/contra-yang-et-al.html"><![CDATA[<p><em>This morning I woke up to a text from <a href="https://www.colorado.edu/ebio/andrew-martin">my dad</a>, who was asking for my opinion on <a href="https://blog.andrewyang.com/p/the-end-of-the-office">this piece</a> from Andrew Yang. I wrote him a shorter response that contained a decent chunk of what I’m about to say, but it turns out I had a lot more to say about the topic, and when I finally got done writing it all down, I had what almost looked like a blog post. Figured I might as well flesh it out, and here we are.</em></p>

<p>I try to read these viral AI-displacement pieces with an open but critical eye; looking for what’s genuinely new versus what’s just repackaged anxiety, and trying to separate the claims that hold up under scrutiny from the ones that fall apart once you bring the temperature down a few degrees. I read Yang’s piece with that spirit in mind.</p>

<p>I think the basic point of Yang’s piece is right: white collar work is information processing, AI is good at information processing, and the stock market will reward companies that figure out how to do more with fewer people. I don’t think that’s a particularly original take, and people in my industry have been frothing about this (on Twitter, on LinkedIn) for literally years. Maybe it’s hitting mainstream politics finally.</p>

<p>But his speculations on timelines are pretty insane. “20-50% of 70 million white-collar jobs” gone in “the next several years,” millions displaced in 12-18 months – based on what? A conversation with one CEO? Talk about anecdotes laundered into predictions. And while influencers in more tech-native spheres aren’t innocent of making these types of claims either (e.g. AI 2027<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>), the general vibe of this discourse feels like shock and awe over substance. Straight-line extrapolation dressed up as forecasting.</p>

<p>That was my biggest complaint about the whole thing, really: the emotional engineering. The timbre. Yang frames someone in his family building a website in minutes as evidence that designers are obsolete, but anyone who ships software knows the demo is maybe 20% of the actual work. He cites mortgage delinquency charts as though AI is already cratering the housing market, but the actual NY Fed data tells a very different story<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. He names it “the Fuckening” because it “feels more visceral.” He’s made a career out of being the UBI politician; he links his book tour dates at the bottom. I don’t want to be uncharitable, but it’s worth noting that Yang’s financial incentives are perfectly aligned with maximizing alarm: the scarier the story, the more urgent the book feels, the more relevant the policy proposal becomes. That doesn’t make him wrong, but it does mean we should be especially careful about separating the signal from the sales pitch before engaging with the substance.</p>

<p>He also makes the classic non-tech mistake (intentional misdirection?) of framing AI demos like they’re real, load-bearing parts of software infrastructure. Yes, someone built a website in minutes. But the demo is maybe 20% (or less) of the actual work; the other 80% is edge cases, integration, compliance, error handling, all the stuff that makes things actually work in production. AI is still bad at that part, and I think there’s a meaningful reason it’s going to stay bad at it for a while that’s worth explaining.</p>

<p>The way these models improve is through evaluation: you need to be able to measure whether the model is getting better at a task in order to train it to be better at that task. For the demo stuff, evals are relatively straightforward. “Did the model produce working code that compiles and passes these test cases?” You can answer that programmatically. But most knowledge work isn’t like that. Most knowledge work is a bundle of tasks held together by judgment, context, and institutional memory, and the eval that would capture whether AI is doing <em>the whole job</em> well basically doesn’t exist.</p>
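<p>As a rough illustration of why the demo-style eval is the easy case, here is a minimal sketch of a programmatic check: run a candidate function against fixed test cases and report the fraction it passes. Everything in it (<code>TestCase</code>, <code>passRate</code>, the two sample candidates) is a hypothetical stand-in, not taken from any real benchmark harness.</p>

```typescript
// Hypothetical names throughout: TestCase, passRate, and the sample candidates
// are illustrative, not taken from any real benchmark harness.
type TestCase = { input: number[]; expected: number };
type Candidate = (xs: number[]) => number;

// Fraction of test cases the candidate passes; a thrown error counts as a failure.
function passRate(candidate: Candidate, cases: TestCase[]): number {
  if (cases.length === 0) return 0;
  let passed = 0;
  for (const c of cases) {
    try {
      if (candidate(c.input) === c.expected) passed += 1;
    } catch {
      // A crash is simply a failed case.
    }
  }
  return passed / cases.length;
}

const cases: TestCase[] = [
  { input: [1, 2, 3], expected: 6 },
  { input: [], expected: 0 },
  { input: [-1, 1], expected: 0 },
];

// Two stand-ins for "model-generated" code: one correct,
// one that forgets the empty-list case (reduce with no initial value throws).
const correctSum: Candidate = (xs) => xs.reduce((a, b) => a + b, 0);
const buggySum: Candidate = (xs) => xs.reduce((a, b) => a + b);

console.log(passRate(correctSum, cases)); // 1
console.log(passRate(buggySum, cases)); // 2 of 3 cases pass
```

<p>A score like this is cheap to compute and easy to optimize against, and that is the whole problem: nothing comparable exists for “did this change respect the constraint nobody wrote down.”</p>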

<p>This is Goodhart’s Law applied to AI capabilities: when a measure becomes a target, it ceases to be a good measure. AI benchmarks are saturating – SWE-bench scores went from 33% to over 70% in a single year<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> – and labs are increasingly optimizing for the benchmarks rather than for the messy, situated work the benchmarks are supposed to proxy for. Oxford researchers reviewed 445 AI benchmarks and found that most don’t actually measure what they claim to measure, suffering from vague definitions and absent statistical validation.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> The model gets better at the test without necessarily getting better at the job.</p>

<p>And the hardest parts of knowledge work are precisely the parts that resist measurement. You can’t easily write an eval for “did this integration handle the edge case that only surfaces when the legacy billing system sends malformed dates on leap years” or “did this PR account for the implicit constraint that the payments team agreed to in a Slack thread six months ago.” These failures are vast yet specific, context-dependent, and often only recognizable as failures when a real user hits them in a real environment.</p>

<p>The common response here is that bigger context windows will solve this: just give the model the entire codebase, the Slack history, the docs, and let it figure it out. And it’s true that context windows are growing fast. But the bottleneck isn’t having the context; it’s knowing which context matters. An experienced engineer reading a PR doesn’t scan every Slack thread from the last six months; they know, from years of working in this system, that this thread about the payments team’s implicit constraint is relevant while ten thousand others aren’t. That’s not a retrieval problem. It’s a salience problem, one that depends on a mental model of how the system actually works, who made what tradeoffs and why, and what’s likely to break downstream. Throwing more context at a model can actually make this worse, not better, because you’re increasing the noise without improving the model’s ability to identify the signal.</p>

<p>In other words, this is a measurement problem, and measurement problems are slow to solve. You can’t easily evaluate whether a model correctly identified the relevant context, which means you can’t easily train it to get better at that task (Goodhart’s Law again, just at a different layer). I think the next generation of companies building vertical AI tooling will start to crack specific domains, but the generic “AI replaces knowledge worker” story requires solving eval problems that the entire field is still struggling with.</p>

<p>But besides all of that, even if the tech were ready tomorrow, have you ever watched a big company try to adopt any new software? Procurement cycles, compliance reviews, legacy system integration, middle managers fighting to keep headcount. Most Fortune 500s are still finishing cloud migrations they started a decade ago. These predictions always come in too hot. ATMs were supposed to kill bank tellers, spreadsheets were going to eliminate accountants, the internet was going to make offices obsolete by 2005. Every time, the tech changed jobs more than it killed them and new roles showed up that nobody predicted. Our economy might reward signals of efficiency, but in practice the underlying processes take forever.</p>

<p>This time COULD be different because AI is software, not hardware: it scales way faster and deploys way cheaper. I take that seriously. There’s already data suggesting the economics of software are shifting: SaaS gross margins among public companies have been declining, dropping from around 78% in 2020 to 72% by 2023 as product commoditization and competition compress pricing.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> The traditional seat-based SaaS model is under real pressure; if an AI agent can access a database and execute a workflow directly, why are you paying $70/seat/month for a dashboard that sits between a human and that same database? That’s a real structural shift worth watching. But “the economics are shifting” and “definitely catastrophic in 18 months” are very different claims.</p>

<p>Software engineering is probably the industry where this conversation is loudest and most specific, which makes sense: it’s the one closest to the technology itself. It’s also the one I know best, so let me talk about what I’m seeing in software engineering (I’ve been <a href="/2025/11/07/spinning-plates.html">writing</a> <a href="/2025/11/24/racing-towards-bethlehem.html">about</a> <a href="/2026/02/02/spinning-the-wheel.html">this</a> for a minute).</p>

<p>Specifically, I want to address this question: is software engineering in general just going up one abstraction layer? There’s a version of this argument that sounds clean and reassuring. We went from assembly to C to Python to “just tell the AI what to build,” and every time the previous layer’s practitioners were fine because they moved up. And there’s something to that. Gergely Orosz at The Pragmatic Engineer wrote about how even the creator of Claude Code didn’t open an IDE for an entire month; all his committed code was AI-written.<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> Senior engineers are already spending less time typing and more time shaping systems (defining specs, reviewing output, making architectural decisions). AI just pushes that trend to its logical conclusion.</p>

<p>But I think the abstraction-layer framing obscures something important about what’s actually valuable in software engineering right now. It’s not “knowing how to code” in the syntactic sense. AI can write a for loop. It can scaffold a React app. It can even do a pretty good first pass at a complex feature if you give it enough context. What it can’t do well is hold the full mental model of a production system in its head: the implicit constraints, the historical decisions, the understanding of why this particular service communicates with that particular database in this particular way, and what breaks if you change it. The Stanford study I’ll get to in a minute found something relevant here: employment for developers aged 22-25 dropped nearly 20% from its late 2022 peak, but employment for workers over 30 in the same AI-exposed roles actually grew 6-12%.<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> The market is telling us something. The value isn’t in writing code; it’s in the tacit knowledge that comes from years of shipping code in messy real-world environments. AI is great at the codified stuff. The un-codified stuff is where humans still dominate, and it’s where the value is concentrating.</p>

<p>There’s a related point here that I think gets lost in the discourse: not all software engineering is created equal, and AI is going to hit different parts of the industry very differently. I work at a product-led company where engineers are expected to talk to customers, make product decisions, think about activation funnels, and ship features that move business metrics. That kind of work is ambiguous, cross-functional, and deeply contextual. It’s hard to automate because the “right answer” isn’t well-defined and changes constantly based on user behavior and market conditions.</p>

<p>Compare that to programming at e.g. a large insurance company, where software is already more of a commodity – maintaining internal CRUD apps, building reports against legacy databases, implementing well-specified business logic. That work has been getting squeezed for years, first by offshoring, then by low-code tools, now by AI. Or think about the kind of programming that happens at a consulting firm, where you’re building roughly similar applications for different clients over and over. AI eats that for breakfast because the patterns are repetitive and the specifications are relatively concrete.</p>

<p>This isn’t a new divide. The frontier of software engineering has always been different from the commodity middle. What’s changing is that AI is dramatically widening that gap. If your work is primarily translating well-understood requirements into code, you’re in trouble regardless of Yang’s timeline, because that’s exactly what AI does best<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. If your work involves navigating ambiguity, making judgment calls with incomplete information, and understanding complex sociotechnical systems, you’re probably fine for a long time. Arguably more valuable than ever, because AI is making the easy parts of your job faster while the hard parts remain stubbornly human.</p>

<p>Since we’re talking about data, let’s actually look at some, because the picture is more nuanced than Yang lets on (though it’s not exactly rosy either).</p>

<p>Morgan Stanley surveyed 935 executives across five sectors and found an average 4% net decline in headcount over 12 months, alongside an 11.5% productivity increase.<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup> Notably, U.S. companies actually reported a 2% net gain in jobs; the biggest pain was in the UK at 8% net loss, and concentrated among larger firms. Early-career positions were disproportionately affected, which tracks.</p>

<p>The Stanford Digital Economy Lab study is probably the most rigorous thing out there right now.<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> Using ADP payroll data covering millions of workers, they found a 13% relative decline in employment for 22-25 year olds in the most AI-exposed occupations since late 2022. For software developers in that age range specifically, the drop was nearly 20% from peak. But (and this is the part Yang would leave out) they also found that employment for older workers in the same roles grew 6-12%, and that jobs where AI augments work rather than automates it haven’t seen similar declines. The adjustment is real, but it’s not uniform, and the “automation vs. augmentation” distinction matters enormously for predicting where this goes.</p>

<p>Challenger, Gray &amp; Christmas tracked 696,000 job cuts in the first five months of 2025, an 80% year-over-year jump.<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup> But they attribute this to a cocktail of tariffs, funding cuts, consumer spending shifts, and AI, not AI alone. The World Economic Forum’s 2025 report estimated 92 million jobs displaced by 2030 but 170 million new roles created, for a net gain.<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup> And a Harvard Business School professor studying this put it well: AI exposure overlaps with about 35% of tasks visible in labor market data, but the history of predicting employment effects from technology is “extraordinarily hard,” and the radiologists we were told to stop training in 2017 are busier than ever.<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup></p>

<p>What does all this tell us? The displacement is real, it’s measurable, and it’s hitting early-career workers first and hardest. But it’s also a 4% net headcount decline and a 13% relative employment drop in specific demographics, not the 20-50% apocalypse Yang is selling. The data supports “meaningful structural change that’s already underway and will accelerate” much more than it supports “the Fuckening.”</p>

<p>Plus, a lot of what he’s describing is just an acceleration of stuff that’s been happening for years. Knowledge work offshoring, junior roles getting squeezed, bad grad employment numbers. AI is pouring gasoline on existing fires, not starting new ones. There is something to Yang’s worry, though: economic transitions hurt the people who built their lives around stability. People who followed the script (school, useful degree, knowledge work career) are going to be disrupted. But I also think that’s just capitalism? Things change! We drive towards efficiency! I don’t think the mindset should ever be “learn a thing once and then coast on it”; the whole point is to be constantly examining yourself, updating your priors, and understanding that what worked in the past might not work in the future.</p>

<p>I want to be honest about the limits of that framing, though. “Stay curious and keep adapting” is easy advice for me to give. I’m in my thirties, I work at the frontier of this stuff, and my entire career has been built around the assumption that the tools and the landscape will keep changing. That’s a very different position than someone who’s 50, spent twenty years building expertise in a domain that’s about to get compressed, has a mortgage and kids in college, and is now being told to “upskill.” Yang is right that the social contract of “study hard, get a degree, get a stable career” is under real pressure, and I don’t think “just adapt” is a sufficient answer for everyone. The question of what we actually do for the people who can’t easily pivot is a real one, and I don’t have a clean answer for it. Yang’s answer is UBI, which is at least a concrete proposal, even if the way he’s selling it feels more like a campaign pitch than a policy discussion.</p>

<p>Maybe I’m coming across as too emotional myself. I’ve read a lot of these doomsday scenario-type pieces and they always feel like they’re trying to manipulate me rather than inform me. I don’t doubt that things are changing rapidly, maybe faster than ever, and I think that I’m lucky to be in a frontier industry where this idea of adapting and changing and modifying my workflow is endemic. Frankly, the one true thing about software engineering has always been that it evolves and it rewards those who are intellectually open-minded and good at upskilling.</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://ai-2027.com/">“AI 2027”</a>, a speculative scenario piece by various AI industry figures. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>The NY Fed’s Q4 2025 report on mortgage delinquencies shows rising delinquencies are concentrated in lower-income zip codes and counties with rising unemployment — driven by income inequality and local labor/housing market conditions, not AI displacement — and are still normal by historical standards outside of pandemic-era lows. See <a href="https://libertystreeteconomics.newyorkfed.org/2026/02/where-are-mortgage-delinquencies-rising-the-most/">“Where Are Mortgage Delinquencies Rising the Most?”</a>, Liberty Street Economics, February 2026. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>SWE-bench Verified scores: top model solved 33% at launch in August 2024; leading models consistently above 70% by mid-2025. Via <a href="https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/">MIT Technology Review</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Oxford University review of 445 AI benchmarks, late 2025. Via <a href="https://aiforreal.substack.com/p/benchmark-vs-reality-understanding">AI For Real</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>SaaS Capital, 2025 SaaS Valuation Report. Median gross margins among publicly traded SaaS firms declined from 78% (2020) to 72% (2023). See also <a href="https://www.marketdataforecast.com/market-reports/software-as-a-service-saas-market">Market Data Forecast SaaS Market Report</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Gergely Orosz, <a href="https://newsletter.pragmaticengineer.com/p/when-ai-writes-almost-all-code-what">“When AI Writes Almost All Code, What Happens to Software Engineering?”</a>, The Pragmatic Engineer, January 2026. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen, <a href="https://digitaleconomy.stanford.edu/wp-content/uploads/2025/08/Canaries_BrynjolfssonChandarChen.pdf">“Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence”</a>, Stanford Digital Economy Lab, August 2025. See also coverage in <a href="https://fortune.com/2025/08/26/stanford-ai-entry-level-jobs-gen-z-erik-brynjolfsson/">Fortune</a> and <a href="https://time.com/7312205/ai-jobs-stanford/">TIME</a>. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>I want to steelman the counterargument here, since I’m still somewhat of a believer in the AI revolution. Context windows are growing, agents are getting persistent memory across sessions, and the ability of AI to hold larger and larger mental models of a system is improving fast. The gap I’m describing — between writing code and understanding the system the code lives in — will narrow. But I think even with perfect recall, the bottleneck shifts from “can the AI access the relevant information” to “can it figure out which information matters for this specific decision” — which is closer to judgment than memory, and a fundamentally harder capability to build. For now, and I think for a while, that judgment is the thing experienced engineers are actually selling. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>Morgan Stanley, <a href="https://www.morganstanley.com/insights/articles/ai-adoption-accelerates-survey-find">“AI Adoption Surges Driving Productivity Gains and Job Shifts”</a>. Survey of 935 corporate executives across five sectors in the US, UK, Germany, Japan, and Australia. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>Challenger, Gray &amp; Christmas, via <a href="https://www.cnbc.com/2025/10/22/ai-taking-white-collar-jobs-economists-warn-much-more-in-the-tank.html">CNBC</a>, October 2025. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>World Economic Forum, <a href="https://www.weforum.org/publications/the-future-of-jobs-report-2025/">Future of Jobs Report 2025</a>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p>Christopher Stanton, Harvard Business School, via <a href="https://news.harvard.edu/gazette/story/2025/07/will-your-job-survive-ai/">Harvard Gazette</a>, July 2025. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Dylan</name></author><category term="ai" /><category term="reflection" /><category term="work" /><category term="predictions" /><category term="business" /><summary type="html"><![CDATA[This morning I woke up to a text from my dad, who was asking for my opinion on this piece from Andrew Yang. I wrote him a shorter response that contained a decent chunk of what I’m about to say, but it turns out I had a lot more to say about the topic, and when I finally got done writing it all down, I had what almost looked like a blog post. Figured I might as well flesh it out, and here we are.]]></summary></entry><entry><title type="html">Spinning the Wheel</title><link href="https://www.dylanamartin.com/2026/02/02/spinning-the-wheel.html" rel="alternate" type="text/html" title="Spinning the Wheel" /><published>2026-02-02T00:00:00+00:00</published><updated>2026-02-02T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/02/02/spinning-the-wheel</id><content type="html" xml:base="https://www.dylanamartin.com/2026/02/02/spinning-the-wheel.html"><![CDATA[<p>A few months ago I wrote about <a href="/2025/11/07/spinning-plates.html">spinning plates</a> and <a href="/2025/11/24/racing-towards-bethlehem.html">racing toward bottlenecks</a>. The gist was that LLMs had changed how I work, I was faster but learning less, and I was trying to find a balance between leverage and atrophy.</p>

<p>I’ve stopped trying to find balance. I’m all in.</p>

<p>Over the holidays I refined my <a href="https://github.com/dmarticus/dotfiles/tree/main/ai">Claude Code setup</a>, went full-bore into multi-worktree setups with <a href="https://www.conductor.build/">Conductor</a>, and spent a lot of time iterating on my work process. On top of that, the models got better. The tooling caught up. And somewhere in there, I personally crossed a threshold. Now most of my work happens through agents. I’m living in Claude Code &amp; Conductor, spinning up sessions, watching them churn, merging the output. The 80/20 flip that Karpathy described happened to me too: 80% agent coding, 20% edits and touchups.<sup id="fnref:karpathy" role="doc-noteref"><a href="#fn:karpathy" class="footnote" rel="footnote">1</a></sup></p>

<p>It feels incredible. It feels like cheating. It feels like gambling.</p>

<h2 id="the-casino">The casino</h2>

<p>There’s a moment after you send a prompt where you’re just… waiting. The agent is running. You can see it thinking, reading files, making decisions. And there’s this little hit of anticipation: <em>what’s it going to do?</em> It’s the same dopamine loop as pulling a slot machine lever. Low effort, variable reward, endlessly repeatable.</p>

<p>Someone on Hacker News called it “doom tabbing”: the AI is already running, the bar to seeing what it does next is so low that you just… watch.<sup id="fnref:doomtab" role="doc-noteref"><a href="#fn:doomtab" class="footnote" rel="footnote">2</a></sup> A coworker described the opposite problem: you <em>can’t</em> just sit there, so you open Slack or try to multitask during the dead time. Either way you lose — watching keeps you in the dopamine loop, switching fragments your focus. Fifty times a day, both add up to a strange kind of fatigue. Pull the lever, spin the wheel, see what happens. The reward gets front-loaded; the difficult part – understanding what you built, debugging it six months later – gets pushed further out in time.</p>

<p>Ryan Broderick went even darker, calling generative AI an “edging machine”: it charges you for the thrill of feeling like you’re building something while caring more about the monetizable loop of engagement than the finished product.<sup id="fnref:garbageday" role="doc-noteref"><a href="#fn:garbageday" class="footnote" rel="footnote">3</a></sup> I don’t cosign his full doom take, but the framing stuck with me. There <em>is</em> something seductive about the loop. It simulates progress. It feels like making.</p>

<p>And then there’s “comprehension debt” – the tendency for the code in your codebase to become less and less understood over time because the AI one-shotted it and you just moved on.<sup id="fnref:comprehension" role="doc-noteref"><a href="#fn:comprehension" class="footnote" rel="footnote">4</a></sup> People counter that AI actually helps you <em>learn</em> — you can ask it to explain things, build mental models. I do this too. But when I ask for an explanation and then let it do the implementation, the understanding doesn’t stick the way it would if I’d written the code myself. It feels like learning in the moment. Whether it compounds into something durable, I’m not sure.</p>

<p>The casino is fun. When you’re on a heater, it really feels like you’re doing something. But the casino doesn’t care whether you understand what you built.</p>

<h2 id="the-fun">The fun</h2>

<p>And yet — work has never felt this fun.</p>

<p>I’ve always believed that energy management matters more than time management. If the work drains you, it doesn’t matter how many hours you have. And these tools have changed the energy equation. The drudgery is gone. The copying and pasting of compiler warnings, the boilerplate, the fill-in-the-blanks tedium – I just don’t do that anymore. What’s left is the creative part: deciding what to build, figuring out the shape of the solution, reviewing whether the output is good.</p>

<p>Karpathy, my coworkers, all the engineers I’ve talked to who use these tools – they’ve all noticed the same thing. Programming feels <em>more</em> fun now because the fill-in-the-blanks drudgery is removed and what remains is the creative part.</p>

<p>I also feel less stuck. When I hit a wall, I don’t have to grind through it alone. I can throw the problem at Claude, watch it try things, learn from what it attempts. There’s almost always a way to make some positive progress. That changes the emotional texture of the day. Less frustration, more momentum.</p>

<p>And the tenacity thing is real. Watching an agent relentlessly work at something – never tired, never demoralized, just trying approach after approach – is genuinely inspiring. I’ve seen Claude struggle with a problem for thirty minutes and then crack it. That stamina was always a bottleneck for me. Now it’s not.</p>

<h2 id="how-im-adapting">How I’m adapting</h2>

<p>I don’t have a clean answer to the “is this cheating?” question. But I have a working theory about how to stay a craftsman in the casino.</p>

<p>The shift I’ve made is this: I spend more time defining success criteria and less time doing the mechanical work of achieving them. Karpathy’s framing helped here. “Don’t tell it what to do, give it success criteria and watch it go.” The leverage comes from being declarative instead of imperative.</p>

<p>Boris Cherny, who created Claude Code, recently shared how his team uses the tool: start every complex task in plan mode, and pour your energy into the plan so Claude can one-shot the implementation.<sup id="fnref:bcherny" role="doc-noteref"><a href="#fn:bcherny" class="footnote" rel="footnote">5</a></sup> One person on his team has one Claude write the plan, then spins up a second Claude to review it as a staff engineer. Another says the moment something goes sideways, they switch back to plan mode and re-plan — don’t keep pushing. The pattern is the same — front-load the thinking, let the machine handle the doing.</p>

<p>My days have started to split into two modes. There’s contemplative time — defining goals, thinking through edge cases, building the reward function. That part is slow and focused. Then there’s execution time — spinning up agents, running them in parallel, triaging output. That part is fast and frenetic, caffeine-fueled, multi-stream.</p>

<h2 id="what-still-matters">What still matters</h2>

<p>The contemplative work is what makes the execution productive instead of just fun. Without it, I’m just pulling levers and hoping.</p>

<p>For frontend work, this means developing strong taste. Can I look at the output and <em>feel</em> whether it’s right? Does the UI make sense? Are the interactions smooth? I’ve been spending more time on what Jim Nielsen calls “sanding the UI” – the patient, iterative work of smoothing rough edges until something feels right.<sup id="fnref:sanding" role="doc-noteref"><a href="#fn:sanding" class="footnote" rel="footnote">6</a></sup> The agent can generate a component, but I’m the one who has to sand it.</p>

<p>For backend work, it means building robust test harnesses. Types that encode invariants. Property-based testing has been great for this – instead of writing specific test cases, I describe properties the code should always satisfy, and the framework generates hundreds of edge cases to throw at it. If the tests pass and the invariants hold, the code is probably fine. The work shifts from <em>writing</em> the code to <em>specifying</em> what correct code looks like. I build the acceptance criteria first – the tests, the types, the “what does correct look like?” – and only then let the agent loose against it.</p>

<p>And domain expertise matters more, not less. There’s a popular narrative that AI helps you upskill quickly in unfamiliar domains — and that’s true when you’re <em>learning</em>. But for this modality, for being genuinely productive with these tools, your existing expertise is what makes it work. The better I understand the problem space, the earlier I can catch the agent going down a wrong path. When I’m working in code I know well, I can interrupt a bad approach in the first few seconds. When I’m in unfamiliar territory, I might not realize something’s off until it’s been spinning for ten minutes. The models still make mistakes – subtle conceptual errors that a hasty junior dev might make, wrong assumptions they run with instead of checking.<sup id="fnref:karpathy-mistakes" role="doc-noteref"><a href="#fn:karpathy-mistakes" class="footnote" rel="footnote">7</a></sup> You have to watch them like a hawk. It ends up looking like pattern-matching on failure modes before they compound.</p>

<p>These are the things I’m holding onto – taste, rigor, expertise. The parts that feel like they might still be craft. Whether they’re enough to keep it that way, I’m not sure.</p>

<h2 id="the-question-i-cant-answer">The question I can’t answer</h2>

<p>Derek Thompson wrote a piece called “The Monks in the Casino” about young men who’ve retreated from social risk into dopamine loops: gambling, speculation, variable rewards without vulnerability.<sup id="fnref:thompson" role="doc-noteref"><a href="#fn:thompson" class="footnote" rel="footnote">8</a></sup> The casino reshapes what feels normal. What starts as entertainment becomes the default texture of experience.</p>

<p>I keep thinking about how that logic spreads. Engineering used to feel like one of the more contemplative corners of work – long stretches of focused thought, deep understanding as the goal. Now the casino has arrived here too. The tools are incredible, and they’re also slot machines. The dopamine loop is built into the workflow. And I’m not sure how vigilant I need to be, or whether vigilance is even the right frame.</p>

<p>The question I keep asking myself is whether this is still craft.</p>

<p>Craft implies understanding. It implies that the maker could explain every decision, could reproduce the work, could teach someone else how to do it. When I ship something Claude mostly wrote, can I say that? Sometimes yes. Sometimes I’m not sure.</p>

<p>There’s a comforting story I could tell myself here — that craft is evolving, that the new skill is knowing what to ask for and how to evaluate the output, that judgment is the new execution. Maybe that’s true. But I notice how convenient it is. It’s exactly the kind of thing you’d say to avoid sitting with the harder question.</p>

<p>What if the answer is actually no? What if I’m slowly trading away the thing that made me good at this — the deep, hard-won understanding — for speed and fun? What if the speed is the bribe?</p>

<p>Recent research suggests this isn’t just paranoia. A randomized experiment from Anthropic found that AI assistance impaired developers’ conceptual understanding, code reading, and debugging abilities – without even delivering significant efficiency gains on average.<sup id="fnref:anthropic-skills" role="doc-noteref"><a href="#fn:anthropic-skills" class="footnote" rel="footnote">9</a></sup> Only the interaction patterns that involved genuine cognitive engagement preserved learning outcomes. Their conclusion: “AI-enhanced productivity is not a shortcut to competence.”</p>

<p>But knowing that doesn’t tell me what to do. I don’t want to stop using these tools – they’re too good, and the work is too fun. A friend texted me yesterday: “What a time to be alive and programming, eh?” It really is. I’m locked in at the casino, the games are as good as they’ve ever been, and I’m watching myself play more than ever. The best I can do is pay attention.</p>

<hr />

<p><em>Thanks to <a href="https://jurajmajerik.com/">Juraj Majerik</a> for reading a draft of this and for feedback. This is the third post in an unplanned series about AI-assisted development. Previously: <a href="/2025/11/07/spinning-plates.html">Spinning Plates</a>, <a href="/2025/11/24/racing-towards-bethlehem.html">Racing Towards Bethlehem</a>.</em></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:karpathy" role="doc-endnote">
      <p>Andrej Karpathy’s <a href="https://x.com/karpathy/status/2015883857489522876">thread on AI-assisted coding</a> (January 2026) captures a lot of what I’ve been experiencing. The whole thing is worth reading. <a href="#fnref:karpathy" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:doomtab" role="doc-endnote">
      <p>From a <a href="https://news.ycombinator.com/item?id=46784594">Hacker News comment</a> that stuck with me: “The end result is very akin to doom scrolling. Doom tabbing?” <a href="#fnref:doomtab" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:garbageday" role="doc-endnote">
      <p>Ryan Broderick, “<a href="https://www.garbageday.email/p/generative-ai-is-an-expensive-edging-machine">Generative AI is an expensive edging machine</a>,” Garbage Day. His take is darker than mine, but the “edging machine” framing is vivid. <a href="#fnref:garbageday" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:comprehension" role="doc-endnote">
      <p>Jeremy Wei <a href="https://x.com/jeremytwei/status/2015886793955229705">coined the term</a> in a reply to Karpathy, who responded: “Love the word ‘comprehension debt,’ haven’t encountered it so far, it’s very accurate.” <a href="#fnref:comprehension" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:bcherny" role="doc-endnote">
      <p>Boris Cherny, “<a href="https://x.com/bcherny/status/2017742741636321619">Tips for using Claude Code</a>,” January 2026. <a href="#fnref:bcherny" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:sanding" role="doc-endnote">
      <p>Jim Nielsen, “<a href="https://blog.jim-nielsen.com/2024/sanding-ui/">Sanding UI</a>.” The metaphor is perfect: you can’t sand in one pass, you have to keep coming back with finer grit. <a href="#fnref:sanding" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:karpathy-mistakes" role="doc-endnote">
      <p>Karpathy again: “The mistakes have changed a lot – they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking.” <a href="#fnref:karpathy-mistakes" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:thompson" role="doc-endnote">
      <p>Derek Thompson, “<a href="https://www.derekthompson.org/p/the-monks-in-the-casino">The Monks in the Casino</a>,” November 2025. <a href="#fnref:thompson" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:anthropic-skills" role="doc-endnote">
      <p>Judy Hanwen Shen and Alex Tamkin, “<a href="https://www.anthropic.com/research/AI-assistance-coding-skills">How AI Assistance Impacts the Formation of Coding Skills</a>,” Anthropic, January 2026. The full <a href="https://arxiv.org/abs/2601.20245">paper</a> is worth reading if you’re thinking about how to preserve skill formation while using AI tools. <a href="#fnref:anthropic-skills" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Dylan</name></author><category term="ai" /><category term="reflection" /><category term="work" /><summary type="html"><![CDATA[A few months ago I wrote about spinning plates and racing toward bottlenecks. The gist was that LLMs had changed how I work, I was faster but learning less, and I was trying to find a balance between leverage and atrophy.]]></summary></entry><entry><title type="html">What I Talk About When I Talk About PostHog</title><link href="https://www.dylanamartin.com/2026/01/28/what-I-talk-about-when-I-talk-about-posthog.html" rel="alternate" type="text/html" title="What I Talk About When I Talk About PostHog" /><published>2026-01-28T00:00:00+00:00</published><updated>2026-01-28T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/01/28/what-I-talk-about-when-I-talk-about-posthog</id><content type="html" xml:base="https://www.dylanamartin.com/2026/01/28/what-I-talk-about-when-I-talk-about-posthog.html"><![CDATA[<p>I’ve been at PostHog for about eighteen months now. Long enough to ship meaningful work, long enough to break things in production, long enough to feel the weight of both. This is my attempt to write down what that’s been like.</p>

<h2 id="part-one-the-excellent">Part One: The Excellent</h2>

<p>The thing I keep coming back to is the combination of autonomy, opportunity, and impact. These three words get thrown around a lot in job postings, but at PostHog they actually mean something. I’ve worked on genuinely interesting problems that matter to the business, and I’ve been given real freedom in how I solve them. Most days I wake up excited about the work. That’s rare, and I don’t take it for granted.</p>

<p>The feature flags team sits at an interesting intersection: we’re responsible for infrastructure that needs to be fast and reliable (we serve billions of flag evaluations), but we’re also building product that developers interact with directly. I’ve gotten to do both. I <a href="https://posthog.com/blog/even-faster-more-reliable-flags">rewrote our evaluation service in Rust</a>, shaving latency and improving reliability. I’ve also shipped product features, worked on SDK improvements, and thought deeply about developer experience. The breadth is energizing.</p>

<p>The learning has been extraordinary. I came to PostHog wanting to write Rust professionally. At my previous startup, I’d read Luca Palmieri’s <em>Zero to Production in Rust</em> and knew the language was a good fit for the performance-critical work I wanted to do, but the existing tech stack and hiring concerns made it impractical. At PostHog, I finally got the chance. I’ve now built and shipped production Rust services handling real scale. I’ve learned how to operate distributed systems, how to debug cascading failures, how to think about reliability as a discipline rather than an afterthought. All of this through the lens of actual work, not side projects or tutorials.</p>

<p>And then there are the people. PostHog has assembled an extraordinary group of engineers from around the world. The talent density is intimidating in the best way; I’m constantly learning from my teammates. The company also invests heavily in bringing people together in person. In my eighteen months, I’ve done team meetups and offsites in Toronto, New York, Amsterdam, and San Francisco, plus company-wide offsites in Mykonos and Mexico. I have another one coming up in London. These trips aren’t just perks; they’re how a distributed team builds the trust and rapport that makes async collaboration actually work<sup id="fnref:meetups" role="doc-noteref"><a href="#fn:meetups" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="part-two-the-challenges">Part Two: The Challenges</h2>

<p>I want to be honest about this part, because I think the hard stuff is where the real learning happens.</p>

<p>The biggest challenge was rebuilding our feature flags evaluation engine while it was running in production. You can’t feature flag the feature flag service. Every change ships to everyone, immediately. This constraint forced me to think carefully about testing, validation, and rollout strategies. I built extensive test harnesses, shadow testing infrastructure, and tooling to validate behavior before shipping. It was some of the most disciplined engineering work I’ve done.</p>

<p>The rewrite itself went well. What came after was harder.</p>

<p>In late 2025, we had a series of incidents. Four outages in October alone, totaling over fourteen hours of customer impact. The technical details are in our <a href="https://github.com/PostHog/post-mortems/blob/main/2025-10-21-feature-flags-recurring-outages.md">post-mortems</a>, but the short version is: we discovered failure modes in the new service that we hadn’t anticipated. CPU resource sizing issues caused cascading failures. Connection pools exhausted under load. Retry logic amplified problems instead of containing them. Each incident taught us something, but the lessons came at a cost.</p>

<p>Those weeks were some of the hardest of my career. I wasn’t sleeping well. The feeling of letting customers down was awful; these are developers who depend on our service to ship their own products, and we were failing them. I took some time off. I seriously considered whether I wanted to keep doing this kind of work.</p>

<p>What got me through was the team and the culture. Folks from the leadership team reached out to me directly to check in and offer support – I typically don’t hear much from them, but they all reached out when I needed it, and that affected me more than I expected it would. PostHog practices blameless post-mortems, and they really mean it. After each incident, the question was never “who screwed up?” but “what allowed this to happen?” My coworker Phil Haack <a href="https://haacked.com/archive/2026/01/06/one-year-at-posthog/">wrote about this</a> in his own reflection on his first year. That approach made it possible to actually learn from the failures instead of just feeling bad about them.</p>

<p>The incidents also forced us to rethink our team structure. Before, we had one feature flags team with a sprawling scope: SDKs, product UI, platform infrastructure. After, we split into two focused teams. Phil now leads the Flags Platform team, laser-focused on performance, reliability, and architecture. I lead the Feature Flags product team, focused on the configuration UI, cohorts, early access features, and SDKs. The split lets each team go deep on their domain without feeling pulled in competing directions.</p>

<p>Looking back, I think the hardest part wasn’t the technical debugging. It was sitting with the uncertainty while we were still figuring things out. There’s a specific kind of dread that comes from knowing something is broken, knowing people are affected, and not yet knowing why. Learning to function in that state, to keep investigating methodically instead of panicking, was its own kind of growth.</p>

<h2 id="part-three-whats-next">Part Three: What’s Next</h2>

<p>I was hired as a product engineer, but I spent most of my first eighteen months on platform work: Rust, performance, reliability, infrastructure. I loved it, and I’m proud of what we built. But I’m ready for a change.</p>

<p>I’m shifting my focus toward product engineering. I want to get closer to our users and think more directly about business impact. Feature flags is already a sticky product, which is solid for something that isn’t a daily-use feature. But our activation rate is lower than I’d like. There’s a gap between people who express interest during onboarding and people who actually end up using the product. That gap feels like an opportunity.</p>

<p>I want to understand why people bounce. I want to make the product so good that trying it feels effortless. I want PostHog to be the obvious choice for teams who care about feature flags and want to understand what those flags actually do.</p>

<p>This is different work than writing Rust services. It’s more ambiguous, more user-facing, more tied to metrics I can’t fully control. I’m excited to learn how to be good at it.</p>

<hr />

<p>If any of this resonates, we’re hiring.</p>

<p>Phil’s <a href="https://posthog.com/teams/flags-platform">Flags Platform team</a> is looking for <a href="https://posthog.com/careers/backend-engineer">backend engineers</a> who want to tackle hard problems at scale: Rust, distributed systems, reliability engineering. If you want to work on infrastructure that serves billions of requests and learn from some genuinely excellent engineers, this is the role.</p>

<p>My <a href="https://posthog.com/teams/feature-flags">Feature Flags product team</a> is hiring <a href="https://posthog.com/careers/product-engineer">product engineers</a> who care about developer experience and want to ship features that users actually love. If you’re energized by the intersection of product thinking and technical depth, come work with us.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:meetups" role="doc-endnote">
      <p>PostHog gives every team a meetup budget to get together in person several times a year, separate from the company-wide offsites. It’s one of those policies that sounds nice on paper but genuinely changes how the work feels. Hard to overstate how much easier it is to collaborate async with someone after you’ve spent a week working alongside them. <a href="#fnref:meetups" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Dylan</name></author><category term="career" /><category term="reflection" /><category term="software engineering" /><category term="posthog" /><summary type="html"><![CDATA[I’ve been at PostHog for about eighteen months now. Long enough to ship meaningful work, long enough to break things in production, long enough to feel the weight of both. This is my attempt to write down what that’s been like.]]></summary></entry><entry><title type="html">New Year, New Me</title><link href="https://www.dylanamartin.com/2026/01/27/new-year-new-me.html" rel="alternate" type="text/html" title="New Year, New Me" /><published>2026-01-27T00:00:00+00:00</published><updated>2026-01-27T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/01/27/new-year-new-me</id><content type="html" xml:base="https://www.dylanamartin.com/2026/01/27/new-year-new-me.html"><![CDATA[<p>I redesigned this site over the weekend. If you’re reading this, you’re looking at the new version.</p>

<p>The old design was fine. It worked, it loaded fast, it was readable. But it had the classic “developer blog” energy: purely functional, zero personality. I wanted something that felt more like a design studio homepage and less like a default Jekyll theme with the serial numbers filed off.</p>

<h2 id="what-i-was-going-for">What I was going for</h2>

<p>I spend a lot of time looking at personal sites. The ones I keep coming back to share a few traits:</p>

<ul>
  <li>
    <p><strong>Monospace as a design choice.</strong> The best personal sites I’ve seen use monospace because it creates a specific mood: technical, deliberate, slightly editorial. I wanted that same energy; Berkeley Mono in particular has enough personality to carry body text while still feeling sharp at small sizes for nav labels and metadata.</p>
  </li>
  <li>
    <p><strong>Strong typographic hierarchy.</strong> Big bold headings, tight letter-spacing, uppercase labels for navigation and section headers. The kind of thing where you can squint at the page and still understand its structure. Type size, weight, and spacing do the heavy lifting; color and decoration stay minimal.</p>
  </li>
  <li>
    <p><strong>Structural borders.</strong> I like sites where the borders do real work: separating navigation from content, delineating sidebar sections, anchoring lists. Intentional 2px lines that say “this is a boundary.” The nav and footer get strong borders; everything else stays subtle.</p>
  </li>
  <li>
    <p><strong>Light and dark, automatically.</strong> I’ve had dark mode on this site since <a href="/2020/12/04/implementing-dark-mode-for-my-website.html">2020</a>, but the old palette was an afterthought. This time both modes are first-class citizens via <code class="language-plaintext highlighter-rouge">prefers-color-scheme</code>, with warm off-whites and deep charcoals.</p>
  </li>
  <li>
    <p><strong>Sidebars that earn their space.</strong> Each sidebar section is a discrete card with its own purpose: stats, links, subscribe options, fun facts. If a page has a sidebar, the sidebar has a job.</p>
  </li>
</ul>

<h2 id="the-font">The font</h2>

<p>I switched everything to <a href="https://berkeleygraphics.com/typefaces/berkeley-mono/">Berkeley Mono</a>. I’ve been using it in my editor for a while and I think it’s the best monospace font available right now. It has enough character to carry long-form prose and it looks great at the small sizes I use for navigation and section labels.</p>

<p>Self-hosting was straightforward: four <code class="language-plaintext highlighter-rouge">@font-face</code> declarations pointing at <code class="language-plaintext highlighter-rouge">.otf</code> files, with <code class="language-plaintext highlighter-rouge">font-display: swap</code> to avoid FOIT (the fallback font renders immediately and is swapped out once Berkeley Mono loads). The fallback stack goes TX-02, JetBrains Mono, SF Mono, Fira Code, Cascadia Code, so it degrades gracefully.</p>
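
<p>For illustration, one of those declarations might look roughly like this. The file path, weight, and the <code class="language-plaintext highlighter-rouge">body</code> selector are assumed placeholders, not necessarily what the site uses, and the trailing generic <code class="language-plaintext highlighter-rouge">monospace</code> keyword is an addition to the stack described above:</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Hypothetical path and weight; one of four similar declarations. */
@font-face {
  font-family: "Berkeley Mono";
  src: url("/assets/fonts/BerkeleyMono-Regular.otf") format("opentype");
  font-weight: 400;
  font-style: normal;
  font-display: swap; /* render fallback text right away, swap when the font loads */
}

body {
  /* Fallback order taken from the post; "monospace" at the end is an assumed safety net. */
  font-family: "Berkeley Mono", "TX-02", "JetBrains Mono", "SF Mono",
    "Fira Code", "Cascadia Code", monospace;
}
</code></pre></div></div>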

<h2 id="the-layout">The layout</h2>

<p>The grid is simple: a main content area and a 260px sidebar on desktop, collapsing to a single column on mobile. CSS Grid makes this trivially easy:</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.page-content.with-sidebar</span> <span class="p">{</span>
  <span class="py">grid-template-columns</span><span class="p">:</span> <span class="m">1</span><span class="n">fr</span> <span class="n">var</span><span class="p">(</span><span class="n">--sidebar-width</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Every page that benefits from a sidebar gets one. The writing index has stats and subscribe links. Individual posts have an info table with publish date, word count, and slug. The homepage has social links, feeds, and fun facts. Pages like speaking and projects use the full width.</p>

<h2 id="the-details">The details</h2>

<p>A few smaller decisions that I think matter:</p>

<p><strong>Navigation</strong> is uppercase, bold, and letterspaced. More design studio masthead than list of links. The nav and footer both use 2px borders in the strong border color, so the page has clear top and bottom anchors.</p>

<p><strong>Section headers</strong> (“Currently”, “Previously”, “Interests” on the homepage) are <code class="language-plaintext highlighter-rouge">display: inline-block</code> with a 2px underline. This gives them visual weight while keeping them compact.</p>

<p><strong>The post list</strong> has a bold top border and per-post word counts inline with the date. I added aggregate word count stats to the sidebar too, partly because I’m curious and partly because it’s the kind of thing I like seeing on other people’s sites.</p>

<p><strong>Sidebar sections</strong> are bordered cards with uppercase headings and a subtle bottom border inside each card. Small thing, but it makes the sidebar feel intentional.</p>
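
<p>In CSS terms, the nav and section-header treatments come out to something like the sketch below. The selectors, the letter-spacing value, and which edge each border sits on are assumptions for illustration; the 2px borders, the uppercase bold nav, and <code class="language-plaintext highlighter-rouge">display: inline-block</code> come from the descriptions above:</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Hypothetical selectors; the real class names may differ. */
.site-nav {
  border-bottom: 2px solid var(--border-strong);
  font-weight: 700;
  letter-spacing: 0.08em; /* assumed value */
  text-transform: uppercase;
}

.site-footer {
  border-top: 2px solid var(--border-strong);
}

.section-header {
  display: inline-block; /* the underline hugs the text instead of spanning the column */
  border-bottom: 2px solid var(--border-strong); /* underline color is assumed */
}
</code></pre></div></div>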

<h2 id="content-changes">Content changes</h2>

<p>While I was in there, I made some structural changes too:</p>

<ul>
  <li><strong>Renamed “Blog” to “Writing”</strong> since that’s more accurate and I like how it reads in the nav.</li>
  <li><strong>Renamed “Talks” to “Speaking”</strong> for the same reason.</li>
  <li><strong>Created a Digest page</strong> by pulling the reading list out of the Media page and giving it its own home. It felt buried before.</li>
  <li><strong>Added a Uses page</strong> because I’ve always liked <code class="language-plaintext highlighter-rouge">/uses</code> pages on other developers’ sites (h/t to <a href="https://usesthis.com/">UsesThis.com</a>).</li>
  <li><strong>Added word counts</strong> to the writing index, the stats sidebar, and each post’s info box.</li>
</ul>

<h2 id="the-tools">The tools</h2>

<p>This is still a Jekyll site hosted on GitHub Pages. One CSS file, a few Liquid templates, some HTML. The whole design system lives in CSS custom properties:</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">:root</span> <span class="p">{</span>
  <span class="py">--bg</span><span class="p">:</span> <span class="m">#fafaf8</span><span class="p">;</span>
  <span class="py">--text</span><span class="p">:</span> <span class="m">#1a1a2e</span><span class="p">;</span>
  <span class="py">--accent</span><span class="p">:</span> <span class="m">#3d5af1</span><span class="p">;</span>
  <span class="py">--border</span><span class="p">:</span> <span class="m">#d0d0d0</span><span class="p">;</span>
  <span class="py">--border-strong</span><span class="p">:</span> <span class="m">#1a1a2e</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Dark mode is a <code class="language-plaintext highlighter-rouge">prefers-color-scheme: dark</code> media query that swaps those values. Your OS decides.</p>
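
<p>A minimal sketch of that swap, assuming the same custom property names as above. The dark values here are illustrative placeholders, not necessarily the ones this site ships:</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Override the :root tokens when the OS reports a dark preference. */
@media (prefers-color-scheme: dark) {
  :root {
    --bg: #14141f;            /* placeholder dark background */
    --text: #e8e8e4;          /* placeholder light text */
    --accent: #8c9eff;        /* placeholder lighter accent */
    --border: #3a3a4a;        /* placeholder subtle border */
    --border-strong: #e8e8e4; /* strong border follows the text color */
  }
}
</code></pre></div></div>

<p>As long as the rest of the stylesheet reads its colors through references like <code class="language-plaintext highlighter-rouge">var(--bg)</code>, nothing else has to change when the scheme flips.</p>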

<p>The spirit here, as it’s been since I made the <a href="https://github.com/dmarticus/dmarticus.github.io/pull/1/changes#diff-3729493d031f7e2d26243070815ce0be4cc97590732407d8bcb15735452f0afbR1-R17">first commit</a> to this site, is basically <a href="https://motherfuckingwebsite.com/">motherfuckingwebsite.com</a>: no JS, no build step, no dependencies beyond what a browser already gives you. HTML, CSS, and content. The whole site loads fast, works everywhere, and I can understand every line of it.</p>]]></content><author><name>Dylan</name></author><category term="website" /><category term="design" /><category term="css" /><summary type="html"><![CDATA[I redesigned this site over the weekend. If you’re reading this, you’re looking at the new version.]]></summary></entry><entry><title type="html">Dotfiles</title><link href="https://www.dylanamartin.com/2026/01/04/dotfiles.html" rel="alternate" type="text/html" title="Dotfiles" /><published>2026-01-04T00:00:00+00:00</published><updated>2026-01-04T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2026/01/04/dotfiles</id><content type="html" xml:base="https://www.dylanamartin.com/2026/01/04/dotfiles.html"><![CDATA[<p>I was talking to my buddy <a href="https://cameron.otsuka.systems/">Cameron</a> about all of the custom Claude code stuff I’ve been tinkering with (I talk about this in <a href="/2025/11/07/spinning-plates.html">Spinning Plates</a> and <a href="/2025/11/24/racing-towards-bethlehem.html">Racing towards Bethlehem</a>), and he asked me if he could see some of the agent stuff I’ve written. This made me realize that I’ve never actually published the dotfiles where I keep all my configurations. His question, plus recently reading <a href="https://www.jmduke.com/posts/dotfiles.html">Justin’s post about this</a>, inspired me to clean up the code and <a href="https://github.com/dmarticus/dotfiles">publish my dotfiles</a>. 
Maybe folks will find them useful, but even if I’m the only one who does, I’m glad they’re public.</p>]]></content><author><name>Dylan</name></author><category term="ai" /><category term="work" /><category term="dotfiles" /><category term="setup" /><summary type="html"><![CDATA[I was talking to my buddy Cameron about all of the custom Claude code stuff I’ve been tinkering with (I talk about this in Spinning Plates and Racing towards Bethlehem), and he asked me if he could see some of the agent stuff I’ve written. This made me realize that I’ve never actually published the dotfiles where I keep all my configurations. His question, plus recently reading Justin’s post about this, inspired me to clean up the code and publish my dotfiles. Maybe folks will find them useful, but even if I’m the only one who does, I’m glad they’re public.]]></summary></entry><entry><title type="html">We Have New York at Home</title><link href="https://www.dylanamartin.com/2025/12/19/we-have-new-york-at-home.html" rel="alternate" type="text/html" title="We Have New York at Home" /><published>2025-12-19T00:00:00+00:00</published><updated>2025-12-19T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2025/12/19/we-have-new-york-at-home</id><content type="html" xml:base="https://www.dylanamartin.com/2025/12/19/we-have-new-york-at-home.html"><![CDATA[<p>I spent a month living and working in New York recently, and I loved it — not for any one specific high, but for how much better my day-to-day life felt. The shape of my days wasn’t even that different from Seattle: I still worked, grabbed coffee, ran errands, met up with friends. It just felt easier to be out in it. More “let’s hit the town tonight” baked into a random Tuesday.</p>

<p>What I’m trying to keep is the state it put me in: leaving the house at any opportunity, walking places by default, keeping a running backlog of things to try, and treating any day like it might be worth doing something.</p>

<p>I’ve lived in Seattle for more than 8 years and I still love it here — the nature access is world class, and I find the city beautiful. But after living here for so long (the longest I’ve ever lived somewhere since before I went to college), I’ve noticed that it’s easy to settle into a routine and stop exploring. So, I’m writing this down partly so I don’t spend the winter wishing I was still in New York, and partly to see if I can make these behaviors stick.</p>

<h2 id="walking-as-continuity">Walking as continuity</h2>

<p>In New York, walking is how the day moves. And because you’re on foot, you experience the space between things.</p>

<p>That matters more than I expected. Walking makes time feel contiguous. You get little hits of texture: a new storefront, a poster for a show, a line outside a place you didn’t know existed, weather that forces you to actually notice the season.</p>

<p>In Seattle it’s too easy to turn life into teleportation. Efficient, private, slightly dead.</p>

<p>So I’m trying to walk more as a way to keep the city “on.” If it’s plausibly walkable, I want my default to be: fine, I’ll walk.</p>

<h2 id="passive-transit-is-goated">Passive transit is goated</h2>

<p>In New York, the whole city felt 30 minutes away. I know it’s the most tired take of all time, but the ubiquity and consistency of the subway were perpetually delightful: I’d show up basically anywhere, ride, arrive. I could read, write, play Balatro, or just stare into the middle distance and let my brain idle. That’s time you don’t get back when you’re driving.</p>

<p>And it wasn’t just commuting. Going out at night, meeting someone across town, running errands — it was all the same. You just… go. No parking, no route decisions, no low-level vigilance. I didn’t realize how much I liked that until I had it constantly.</p>

<p>Seattle isn’t that. Link is expanding and buses run, but coverage is spotty, frequency varies, and for most trips driving is still faster. I’m lucky — I live on a bus line that goes straight to my office and my gym, so I’ve started taking it for both. Some days the app lies and I just drive. But when it works, I show up less scattered, and I’ve already had 20 minutes to read or do nothing. That’s worth protecting.</p>

<h2 id="keep-a-running-list-of-spots-to-hit">Keep a running list of spots to hit</h2>

<p>This was the big one, and it’s something my friends have done for years that I’d balked at for a while out of (mostly) laziness. We all have to learn our lessons in our own time.</p>

<p>But yeah, in New York, I kept a list of places I wanted to go (and <a href="https://maps.app.goo.gl/A6E3R6HkmzVETfHfA">I made a list of places I did go</a>): coffee shops, bars, restaurants, museums, neighborhoods, specific dishes. It was a backlog, and it had the obvious effect of making me more excited to go out.</p>

<p>It turned “what should we do tonight?” from an empty question into a menu. I wasn’t inventing a plan from scratch when I was tired; I was selecting something I’d already pre-approved when I had energy.</p>

<p>Back in Seattle, I realized how easy it is to fall into the same loop unless I stay current on what’s new. New stuff opens quietly. Scenes shift. Neighborhoods evolve. If I don’t capture that anywhere, I default to the same places because they’re good and easy and already in my head.</p>

<p>So now I keep a “Seattle backlog” on purpose:</p>

<ul>
  <li>Restaurants and bars I’ve heard about</li>
  <li>Coffee shops and bakeries</li>
  <li>Bookstores, galleries, small venues</li>
  <li>Parks, viewpoints, hiking spots</li>
  <li>Single specific items worth leaving the house for</li>
</ul>

<p>And I treat it like an object I maintain casually. Friend recommends a place? List. I walk past something interesting? List. I see a poster? List.</p>

<h2 id="bring-the-energy">Bring the energy</h2>

<p>I’ve never been one to shy away from going out, and I’ve been accused of being “high-energy” by friends and foes alike. But even I’ve succumbed to the Seattle-specific trap (which is especially bad in winter): waiting to feel like going out.</p>

<p>Seattle is quieter, and the city’s energy doesn’t exactly do you favors here. In New York, it’s easy to get swept along — there’s always something happening, and it feels like the default setting is “sure, why not.” In Seattle, you can blink and it’s 9pm and you’re still on the couch, perfectly comfortable (and the bars close in 2 hours anyway so what’s the point).</p>

<p>Rain also makes that feel rational. “Cozy” becomes ideology. And sometimes staying in is correct. But sometimes it’s just inertia that’s learned how to speak softly.</p>

<p>The thing is, the biggest hurdle isn’t the weather or the city — it’s me. If I actually decide to go out, there are always places in Seattle that are a good time. Despite my flippancy, there’s always a bar with a vibe, a restaurant that hits, a show somewhere, a friend who’s down.</p>

<p>One thing I liked about myself in New York is that I didn’t feel that complacency as much. I’d just decide the night was happening. Pick a place. Go. I’m trying to keep that: put on real clothes, make the plan, leave the house.</p>

<h2 id="you-aint-gonna-need-it">You ain’t gonna need it</h2>

<p>Our place in New York was small and tasteful, but lacking in many of the creature comforts of home. By the end, though, that felt more like a feature than a bug. Living with fewer things exposed yet another obvious truth: extra stuff is mostly maintenance.</p>

<p>More clothes, more objects, more “just in case” adds up. Physical clutter; attention debt.</p>

<p>I was surprised to notice this because my Seattle setup felt perfect to me before I left. My wife and I love our place. It’s beautiful, it’s great for entertaining, the view is good. My Eight Sleep mattress has ruined all other beds for me. But like anything else, stuff accumulates, and it’s weirdly hard to let things go once they’ve been added to the fold. Living with less helped me notice the accumulation.</p>

<p>I want my place to feel light enough that it doesn’t pre-tire me, and nice enough that I’m not collecting stuff just to compensate for anything.</p>

<h2 id="the-point-of-all-this">The point of all this</h2>

<p>I’m making a bet that Seattle already has a lot of the raw material for me to build a daily lifestyle similar to what I had in New York. The only real question is whether I keep choosing it, and if it sticks. I haven’t abandoned the thought of moving to New York altogether.</p>]]></content><author><name>Dylan</name></author><category term="reflection" /><category term="seattle" /><category term="lifestyle" /><category term="personal" /><summary type="html"><![CDATA[I spent a month living and working in New York recently, and I loved it — not for any one specific high, but for how much better my day-to-day life felt. The shape of my days wasn’t even that different from Seattle: I still worked, grabbed coffee, ran errands, met up with friends. It just felt easier to be out in it. More “let’s hit the town tonight” baked into a random Tuesday.]]></summary></entry><entry><title type="html">Racing towards Bethlehem</title><link href="https://www.dylanamartin.com/2025/11/24/racing-towards-bethlehem.html" rel="alternate" type="text/html" title="Racing towards Bethlehem" /><published>2025-11-24T00:00:00+00:00</published><updated>2025-11-24T00:00:00+00:00</updated><id>https://www.dylanamartin.com/2025/11/24/racing-towards-bethlehem</id><content type="html" xml:base="https://www.dylanamartin.com/2025/11/24/racing-towards-bethlehem.html"><![CDATA[<p>After I published <a href="/2025/11/07/spinning-plates.html">Spinning Plates</a>, my old coworker <a href="https://danielbachhuber.com/">Daniel</a> left <a href="https://www.linkedin.com/feed/update/urn:li:activity:7393663853852557312?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7393663853852557312%2C7393672175381045248%29&amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287393672175381045248%2Curn%3Ali%3Aactivity%3A7393663853852557312%29">a comment</a> I couldn’t stop thinking about:</p>

<blockquote>
  <p>“If your individual velocity has increased, how are you handling the other bottlenecks in the system — code review being an obvious one?”</p>
</blockquote>

<p>Around the same time, I’d been revisiting Ordep’s <a href="https://ordep.dev/posts/writing-code-was-never-the-bottleneck">Writing Code was Never the Bottleneck</a> essay, which opens with a line that feels like a koan:</p>

<blockquote>
  <p>“Writing lines of code was never the bottleneck in software engineering.<br />
The actual bottlenecks were, and still are, code reviews, knowledge transfer, testing, debugging, and the human overhead of coordination.”</p>
</blockquote>

<p>Right. Exactly. That’s been the shape of the job for decades. When writing becomes nearly free, all the work you can’t automate steps out of the shadows.</p>

<p>And yet something <em>has</em> shifted for me. Not the bottlenecks themselves, but the work wrapped around them. That’s what this post is trying to unpack.</p>

<h2 id="the-bottlenecks-are-still-the-bottlenecks">The bottlenecks are still the bottlenecks</h2>

<p>Ordep’s core point is, in my experience, correct: reviewing is hard, debugging is hard, understanding intent is hard, maintaining shared mental models is hard, and making judgment calls is hard. His second point lands even harder:</p>

<blockquote>
  <p>“The marginal cost of adding new software is approaching zero.<br />
But the price of understanding, testing, and trusting that code? Higher than ever.”</p>
</blockquote>

<p>I feel this. I can ship more working code in an afternoon than I used to ship in a week, but verifying it, reasoning about its behavior, and protecting the shape of the system hasn’t gotten cheaper. If anything, the extra output has expanded the amount of code I’m implicitly responsible for.</p>

<h2 id="llms-as-connective-tissue-not-output-machines">LLMs as connective tissue, not output machines</h2>

<p>That said, the work around these bottlenecks has changed. The tools are not solving the hard parts, but they make it easier to reach them. Over the last six months, I have been surprised by how helpful LLM-based tools are as navigational aids rather than generators. They each fill a different gap. <a href="https://www.greptile.com/">Greptile</a> gives me a second look at my own PRs and catches high-level issues that are easy to miss once you have been staring at a diff for too long. <a href="https://0github.com/">0github</a> is now my starting point when I review someone else’s changes; its heat-map diff and “risk” slider point me to the sections that deserve real attention. And I have been leveraging Claude Code as a way to understand new parts of the codebase. When I explore a new subsystem, I will ask it to outline the key files and invariants so I can go straight to the important pieces instead of wandering through the directory tree.</p>

<p>The common thread in all of these tools is simple: they cut down the time it takes to gather context. I spend less energy on the overhead and more on the actual reasoning. They do not replace the difficult parts, but they make it easier to get to them. That was the part I did not expect — the bottlenecks stayed where they were, but the road leading to them became much smoother.</p>

<h2 id="naming-the-tension">Naming the tension</h2>

<p>The smoother road comes with its own tradeoffs, though. After I posted last time, my dad wrote me a long note that helped me put words to the thing I had been circling. He is a <a href="https://www.colorado.edu/ebio/andrew-martin">professor of evolutionary biology</a>, and his message was very him: part philosophy, part evolutionary metaphor. He sent a series of questions and observations:</p>

<blockquote>
  <p>“Where is the balance?<br />
Is there a shifting baseline?<br />
Drift is easy; drift erodes capacity.<br />
You’re describing a moving human–machine interface.<br />
Being intentional about what you want to learn is a daily practice.<br />
This is an adaptationist mindset.”</p>
</blockquote>

<p>That last line made the whole thing click. If the environment is shifting under my feet, then my habits have to adapt with it. The tools make it easier to reach the real work, but they also make it easier to skip the parts that build intuition. The moment I let them shortcut the reasoning, the learning curve flattens. Drift is quiet at first, then it accelerates, and once it starts, it is hard to undo.</p>

<h2 id="what-responsible-use-looks-like-for-me">What responsible use looks like for me</h2>

<p>All of this raised a practical question for me: if the road to the bottlenecks is smoother, how do I make sure I am still doing the part of the work that actually builds skill? That is where I started drawing a line for myself. I use LLMs to accelerate navigation, mapping, summarization, risk-surfacing, tracing, onboarding, interpreting test failures, and spotting suspicious patterns. In other words, anything that helps me figure out where to look and what deserves attention. I avoid using them to speed up correctness, design, invariants, architecture, root-cause debugging, or tradeoff decisions. Those parts only improve through repetition and deliberate attention. I am not chasing purity here; I just do not want to weaken the muscles that matter. If a tool helps shrink the search space, I am happy to use it. If it tempts me to ignore the search space entirely, that is where I step back.</p>

<h2 id="the-way-has-never-felt-faster">The way has never felt faster</h2>

<p>I agree with Ordep: writing code was never the bottleneck, and it still isn’t. Understanding, reviewing, coordinating, and verifying remain the real constraints, and they still require a human brain fully switched on. What feels new is that LLMs finally help with the work around those constraints, the parsing and mapping and sense-making that used to take most of my energy. They make the bottleneck easier to see and easier to reach, even though they don’t change what happens once I’m there. They don’t lower the cost of judgment, but they lower the cost of arriving at the moment where judgment is required. If I can keep adaptation ahead of drift, it improves more than my output; it improves the way I approach problems. The constraints have not moved, but I get to them sooner and with less wandering. The pace has changed. We are not slouching toward the next bottleneck; we are moving straight into it, and I have to decide whether I am meeting that speed on purpose or just being carried along.</p>]]></content><author><name>Dylan</name></author><category term="ai" /><category term="reflection" /><category term="work" /><summary type="html"><![CDATA[After I published Spinning Plates, my old coworker Daniel left a comment I couldn’t stop thinking about:]]></summary></entry></feed>