<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The AI Runtime: Tools & Workflows]]></title><description><![CDATA[Hands-on reviews and walkthroughs of the tools AI engineers actually use day to day. Claude Code, LangGraph, vector databases, IDE copilots, MCP servers, and more. Every post includes what worked, what didn't, and whether it's worth your time. No sponsored fluff — just honest takes from someone who tried it.]]></description><link>https://theairuntime.com/s/tools-and-workflows</link><image><url>https://theairuntime.com/img/substack.png</url><title>The AI Runtime: Tools &amp; Workflows</title><link>https://theairuntime.com/s/tools-and-workflows</link></image><generator>Substack</generator><lastBuildDate>Sat, 09 May 2026 10:20:44 GMT</lastBuildDate><atom:link href="https://theairuntime.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Kranthi Manchikanti]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aiengineerweekly@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aiengineerweekly@substack.com]]></itunes:email><itunes:name><![CDATA[The AI Runtime]]></itunes:name></itunes:owner><itunes:author><![CDATA[The AI Runtime]]></itunes:author><googleplay:owner><![CDATA[aiengineerweekly@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aiengineerweekly@substack.com]]></googleplay:email><googleplay:author><![CDATA[The AI Runtime]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Claude Code Is Becoming the Operating System for AI Engineering]]></title><description><![CDATA[The era of one-off prompts is ending. 
The teams pulling ahead are building systems: persistent memory, reusable skills, automated guardrails, and parallel agent workflows.]]></description><link>https://theairuntime.com/p/claude-code-is-becoming-the-operating</link><guid isPermaLink="false">https://theairuntime.com/p/claude-code-is-becoming-the-operating</guid><dc:creator><![CDATA[The AI Runtime]]></dc:creator><pubDate>Sun, 05 Apr 2026 19:09:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_BKP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png" length="0" type="image/png"/><content:encoded><![CDATA[<div class="pullquote"><p><strong>TL;DR</strong>: Claude Code is evolving from a coding assistant into a full operating system for AI engineering. The big shift is a four-layer setup: <code>CLAUDE.md</code> for persistent project memory, reusable skills for repeatable workflows, Auto Mode classifiers for governance, and parallel sub-agents for execution. Together, these layers reduce context loss, speed up shipping, and make agent workflows more reliable in production. The takeaway is simple: AI teams are moving beyond clever prompts and toward structured systems. The advantage now comes from building workflows with memory, guardrails, and specialized agent roles &#8212; not from using a single model in isolation. Engineers who can design and operate these stacks will be the ones with the biggest edge.</p></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://theairuntime.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe for free</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>For the last year, most AI engineering has looked roughly the same: write a better prompt, paste more context, hope the model stays on track, and repeat when it drifts.</p><p>That model is breaking down.</p><p>What is replacing it is not just a better prompt stack, but a new operating model for building with AI. The strongest teams are no longer treating Claude Code like a chatbot that occasionally writes code. They are treating it like an operating system for engineering work &#8212; one that combines memory, tooling, governance, and coordinated execution into a repeatable production workflow.</p><p>At the center of that shift is a simple but powerful four-layer architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_BKP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_BKP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 424w, https://substackcdn.com/image/fetch/$s_!_BKP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 848w, 
https://substackcdn.com/image/fetch/$s_!_BKP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 1272w, https://substackcdn.com/image/fetch/$s_!_BKP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_BKP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png" width="397" height="995" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:995,&quot;width&quot;:397,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172451,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiengineerweekly.substack.com/i/193230273?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_BKP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 424w, 
https://substackcdn.com/image/fetch/$s_!_BKP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 848w, https://substackcdn.com/image/fetch/$s_!_BKP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 1272w, https://substackcdn.com/image/fetch/$s_!_BKP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a0c21e-8887-46cb-9d25-3db341645c40_397x995.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The Emerging stack...</figcaption></figure></div><p>The first layer is <strong>persistent context</strong>. In this model, every project lives inside a single <code>CLAUDE.md</code> file that acts as shared memory for the system: goals, architecture decisions, current tasks, technical constraints, and the latest working state. Instead of re-explaining the project on every run, the agent starts with a living source of truth. That changes the workflow from &#8220;re-prompting&#8221; to &#8220;continuing.&#8221; Context stops being disposable and starts becoming infrastructure.</p><p>The second layer is <strong>skills</strong>. Rather than rebuilding workflows from scratch for testing, security review, UI generation, documentation, or SEO, teams are packaging them into reusable tool packs. The advantage is not just speed. It is consistency. Once a skill is defined well, it becomes an asset the whole team can use again and again without reinventing process every week.</p><p>The third layer is <strong>governance</strong> &#8212; and this is where the stack gets serious. The old permission model created friction at exactly the wrong moments: too many interruptions for safe actions, not enough structure for risky ones. The emerging answer is Auto Mode classifiers. Before a tool call runs, a lightweight rule layer decides whether the action should proceed automatically, request approval, or be blocked altogether. In practice, that means sensitive file writes can trigger review, sandboxed execution can happen automatically, and trusted external calls can move without slowing the whole workflow down. Governance stops being a bottleneck and becomes an enabler.</p><p>The fourth layer is <strong>parallel agents</strong>. This is the real leap. 
Instead of one model handling one giant prompt, teams are spinning up specialized sub-agents across product, engineering, QA, security, DevOps, and operations. These agents work in parallel, communicate through defined channels, and break larger projects into coordinated streams of execution. The result is not just faster output. It is a more realistic reflection of how high-performing teams already work &#8212; except now the coordination layer is automated.</p><p>Put those four layers together and the pattern becomes clear: memory, skills, guardrails, and agents. That is the new stack.</p><p>And it matters because it solves the biggest weakness in agent workflows today: fragility.</p><p>Most agent demos look impressive for five minutes. Real production work is different. It demands continuity, repeatability, safety, and the ability to hand work across functions without losing context. A single long prompt cannot do that reliably. A structured operating system can.</p><p>That is also why this conversation is moving beyond tooling and into careers. The market is no longer just rewarding people who can &#8220;use AI.&#8221; It is rewarding people who can design systems around AI: persistent project memory, governed execution, multi-agent orchestration, and measurable operational gains. The differentiator is shifting from prompt cleverness to systems thinking.</p><p><strong>So what should builders do now?</strong></p><p>Start simple. Create a <code>CLAUDE.md</code> file for your current project and treat it like operational memory, not documentation. Add a small set of reusable skills for the tasks you do every week. Introduce classifier-based rules for anything that touches sensitive files, external systems, or code execution. 
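</p><p>As a concrete sketch of that classifier layer: the core is just a function that maps each proposed tool call to <code>allow</code>, <code>ask</code>, or <code>deny</code>. The tool names and path markers below are hypothetical, illustrating the shape of the idea rather than Claude Code&#8217;s actual implementation:</p>

```python
# Hypothetical sketch of a classifier-based permission layer.
# Tool names and path markers are illustrative, not Claude Code's real ones.
SENSITIVE_MARKERS = (".env", "secrets", ".ssh", ".pem")

def classify(tool: str, target: str = "") -> str:
    """Map a proposed tool call to 'allow', 'ask', or 'deny'."""
    if any(marker in target for marker in SENSITIVE_MARKERS):
        return "deny"    # credential material is never touched automatically
    if tool in {"write_file", "run_command"}:
        return "ask"     # mutating or executing actions pause for approval
    return "allow"       # reads and other safe calls proceed unattended
```

<p>The shape is what matters: safe actions flow through automatically, risky ones pause for review, and forbidden ones never reach execution.</p><p>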
Then graduate from a single-agent workflow to a parallel team structure where each agent has a clear role and bounded responsibility.</p><p><strong>This is the bigger takeaway:</strong> the winning teams in AI engineering will not be the ones with the flashiest demos. They will be the ones with the best operating systems.</p><p>The prompt was only the beginning. The stack is the future.</p>]]></content:encoded></item><item><title><![CDATA[What Actually Happens When You Type claude in Your Terminal]]></title><description><![CDATA[Internals of Claude Code]]></description><link>https://theairuntime.com/p/what-actually-happens-when-you-type</link><guid isPermaLink="false">https://theairuntime.com/p/what-actually-happens-when-you-type</guid><dc:creator><![CDATA[The AI Runtime]]></dc:creator><pubDate>Fri, 20 Mar 2026 02:37:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_HIh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>You open a terminal, type <code>claude</code>, and press Enter. Within seconds, a cursor blinks, ready for your prompt.
It feels instant.</p><p>But between your keystroke and that cursor, Claude Code executes an intricate startup sequence &#8212; authenticating, scanning your filesystem, loading memory, connecting to MCP servers, constructing a system prompt, and pre-caching tokens for an API call that hasn&#8217;t happened yet.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://theairuntime.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Here&#8217;s everything that happens behind the scenes, and what it costs you.</p><div><hr></div><h2>Phase 1: Authentication</h2><p>Claude Code checks for credentials in order: <code>ANTHROPIC_API_KEY</code> environment variable first, then OAuth session (from <code>claude login</code>), then Bedrock/Vertex/Azure credentials for enterprise users.</p><p>This step determines your billing pathway. API keys charge per-token (for example, $5/$25 per MTok for Opus 4.6, $3/$15 for Sonnet 4.6). Pro ($20/mo) and Max ($100/mo) subscribers have usage included.</p><h2>Phase 2: The Configuration Sweep</h2><p>Claude Code walks the filesystem to find every applicable CLAUDE.md file.
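</p><p>The upward half of that walk is simple to picture: from the working directory, check each parent for a <code>CLAUDE.md</code>, then order the results broadest-first so more specific files can override. A minimal sketch of that behavior (illustrative only, not the actual loader):</p>

```python
from pathlib import Path

def collect_claude_md(cwd: Path) -> list[Path]:
    """Gather CLAUDE.md files from the working directory up to the root.

    Returned broadest-first, so later (more specific) entries override
    earlier ones. Illustrative sketch, not Claude Code's real loader.
    """
    found = [d / "CLAUDE.md" for d in [cwd, *cwd.parents]
             if (d / "CLAUDE.md").is_file()]
    return list(reversed(found))  # root-most (broadest) first
```

<p>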
Loading order, from broadest to most specific:</p><ol><li><p><strong>Enterprise managed policy</strong> &#8212; org-level rules from IT admins</p></li><li><p><strong>User-level</strong> (<code>~/.claude/CLAUDE.md</code>) &#8212; your personal defaults</p></li><li><p><strong>Project-level</strong> (<code>.claude/CLAUDE.md</code>) &#8212; team config, committed to repo</p></li><li><p><strong>Directory-level</strong> (<code>CLAUDE.md</code> in working dir) &#8212; scoped overrides</p></li><li><p><strong>@import references</strong> &#8212; modular includes from any CLAUDE.md</p></li><li><p><code>.claude/rules/</code> &#8212; topic-specific rule files</p></li></ol><p><strong>The precedence rule:</strong> more specific always wins. Directory overrides project overrides user.</p><p>One important asymmetry: files <em>above</em> your working directory load in full at startup. Files in <em>child</em> directories load on demand. A monorepo with 50 subdirectories won&#8217;t bloat your initial context.</p><h2>Phase 3: Memory Loads</h2><p>After configuration, Claude Code loads its memory system &#8212; separate from CLAUDE.md.</p><p><strong>Auto memory</strong> lives in <code>MEMORY.md</code>. When you correct Claude or establish patterns, it can save learnings here. But here&#8217;s the critical detail most people miss:</p><blockquote><p><strong>Only the first 200 lines of MEMORY.md are loaded at session start.</strong> Topic files are read on demand. This cap keeps initial context lean.</p></blockquote><p><strong>Session storage</strong> saves every message, tool use, and result to disk. This enables <code>--resume</code> (pick up where you left off), <code>--fork-session</code> (branch for parallel exploration), and rewind (undo to any point). 
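</p><p>That 200-line cap is easy to reproduce in miniature. A toy version of the behavior (assumed from the documented cap, not the real loader):</p>

```python
MEMORY_HEAD_LINES = 200  # documented cap on what loads at session start

def load_auto_memory(memory_text: str) -> str:
    """Return the slice of MEMORY.md that enters the initial context.

    Everything past the cap stays on disk and is read on demand.
    Toy illustration of the documented behavior.
    """
    return "\n".join(memory_text.splitlines()[:MEMORY_HEAD_LINES])
```

<p>A practical consequence: keep your most load-bearing memories in the first 200 lines, because anything below them is invisible until explicitly read.</p><p>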
Sessions are tied to your working directory.</p><h2>Phase 4: Tools and Extensions Register</h2><p>Six built-in tools are always available:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_HIh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_HIh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 424w, https://substackcdn.com/image/fetch/$s_!_HIh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 848w, https://substackcdn.com/image/fetch/$s_!_HIh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 1272w, https://substackcdn.com/image/fetch/$s_!_HIh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_HIh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png" width="946" height="446" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:446,&quot;width&quot;:946,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61207,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiengineerweekly.substack.com/i/191190345?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_HIh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 424w, https://substackcdn.com/image/fetch/$s_!_HIh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 848w, https://substackcdn.com/image/fetch/$s_!_HIh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 1272w, https://substackcdn.com/image/fetch/$s_!_HIh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8cc6a3-3c52-422a-9af4-38e3d72f09b4_946x446.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>80% of your context consumption comes from file reads and tool results, not your messages.</strong> A 500-line file costs ~4,000 tokens. This is why <code>Grep &#8594; Read</code> (targeted) beats <code>Read</code> (entire file) for cost.</p><p>If you have <strong>MCP servers</strong> configured (<code>.mcp.json</code> for project, <code>~/.claude.json</code> for personal), they connect now. Each server&#8217;s tool definitions get added to every API request.</p><p><strong>Skills</strong> (<code>.claude/skills/</code>) load only their metadata (name + description, ~100 words each). The full skill body loads on demand when triggered. Progressive disclosure.</p><h2>Phase 5: System Prompt Assembly</h2><p>This is where cost starts accumulating. 
Claude Code concatenates everything into a system prompt:</p><ul><li><p>Core identity instructions (~2K-4K tokens)</p></li><li><p>All CLAUDE.md content (~500-5K tokens)</p></li><li><p>First 200 lines of MEMORY.md (~200-1.5K tokens)</p></li><li><p>Tool definitions (~3K-7K tokens)</p></li><li><p>Skill metadata (~100-500 tokens)</p></li></ul><p><strong>Total: 6,000&#8211;18,000 tokens before you type a word.</strong></p><p>Here&#8217;s why this matters: <strong>the system prompt is sent with EVERY API request.</strong> If you make 40 tool-use turns, that&#8217;s up to 720K tokens just from system prompt repetition.</p><p><strong>Prompt caching saves you.</strong> Claude Code automatically caches the system prompt. After the first request, subsequent sends cost only 10% of standard input price. This is the single most impactful cost optimization built into Claude Code, and it&#8217;s automatic.</p><h2>Phase 6: You Type &#8212; The Loop Begins</h2><p>Your first message triggers the first API call. Then the agentic loop takes over:</p><pre><code><code>You send message
  &#8594; Claude decides: respond or use a tool?
    &#8594; If stop_reason = "tool_use": execute tool, append result, send AGAIN
    &#8594; If stop_reason = "end_turn": display response, wait for next input
</code></code></pre><p><strong>The compounding cost of turns:</strong> every turn resends the ENTIRE conversation history. Turn 1 might send 10K tokens. Turn 30 might send 180K. Per-turn input grows roughly linearly with conversation length, so cumulative session cost grows quadratically. Prompt caching softens this for repeated content, but unique tool outputs aren&#8217;t cacheable.</p><p>When context hits ~80-90% capacity, <strong>auto-compaction</strong> fires &#8212; summarizing earlier turns and discarding raw history. This is lossy. Critical details from early in the conversation can be lost. For important state, persist it to files Claude can re-read.</p><h2>The Practitioner&#8217;s Cheat Sheet</h2><p><strong>Before the session:</strong></p><ul><li><p>Keep CLAUDE.md under 200 lines &#8212; every line enters the system prompt on every turn</p></li><li><p>Use <code>.claude/rules/</code> for modularity instead of one massive file</p></li><li><p>Never hardcode secrets in <code>.mcp.json</code> &#8212; use env var expansion</p></li></ul><p><strong>During the session:</strong></p><ul><li><p><code>/clear</code> between unrelated tasks &#8212; stale context costs real money</p></li><li><p>Use Grep before Read &#8212; 20 matching lines vs 8,000 tokens for a full file</p></li><li><p><code>Shift+Tab</code> for Plan mode &#8212; reduces token consumption 40-60% on complex tasks</p></li><li><p><code>/model sonnet</code> for routine work &#8212; cheaper than Opus</p></li><li><p><code>/cost</code> to check token usage</p></li></ul><p><strong>After the session:</strong></p><ul><li><p><code>/rename</code> before <code>/clear</code> so you can <code>--resume</code> later</p></li><li><p>Prune MEMORY.md periodically &#8212; stale memories waste tokens</p></li></ul><h2>What It Costs</h2><table><thead><tr><th>Model</th><th>Input</th><th>Output</th></tr></thead><tbody><tr><td>Opus 4.6</td><td>$5/MTok</td><td>$25/MTok</td></tr><tr><td>Sonnet 4.6</td><td>$3/MTok</td><td>$15/MTok</td></tr><tr><td>Haiku 4.5</td><td>$1/MTok</td><td>$5/MTok</td></tr><tr><td>Batch API</td><td>50% off</td><td>50% off</td></tr></tbody></table><p>Average: <strong>~$6/developer/day</strong> for API users.</p><p>Cache reads cost 10% of input price. 
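</p><p>The savings are easy to check with back-of-the-envelope arithmetic, using the Opus input price listed above (and ignoring the small premium charged on the initial cache write):</p>

```python
# Back-of-the-envelope prompt-caching arithmetic at Opus 4.6's $5/MTok
# input rate. Ignores the small cache-write premium for simplicity.
PRICE_PER_TOKEN = 5 / 1_000_000   # $5 per million input tokens
SYSTEM_PROMPT = 5_000             # tokens resent on every request
SENDS = 40                        # tool-use turns in one session

uncached = SYSTEM_PROMPT * SENDS * PRICE_PER_TOKEN
# First send pays full price; the remaining 39 are cache reads at 10%.
cached = SYSTEM_PROMPT * PRICE_PER_TOKEN * (1 + (SENDS - 1) * 0.10)

print(f"${uncached:.2f} vs ${cached:.2f}")   # $1.00 vs $0.12
```

<p>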
This is why prompt caching matters so much: your 5K-token system prompt, resent 40 times, costs $1.00 without caching (at Opus&#8217;s $5/MTok input rate) or about $0.12 with it.</p><div><hr></div><h2>The Full Lifecycle</h2><pre><code><code>$ claude
&#9474;
&#9500;&#9472; 1. Authenticate (API key / OAuth / Bedrock / Vertex)
&#9500;&#9472; 2. Load CLAUDE.md hierarchy (user &#8594; project &#8594; directory)
&#9500;&#9472; 3. Load auto memory (first 200 lines of MEMORY.md)
&#9500;&#9472; 4. Connect MCP servers
&#9500;&#9472; 5. Register tools + skill metadata
&#9500;&#9472; 6. Assemble system prompt + apply cache markers
&#9500;&#9472; 7. Display cursor &#8212; waiting for input

&#9500;&#9472; 8. You type a message
&#9500;&#9472; 9. API request: system prompt + tools + message
&#9500;&#9472; 10. Claude responds (text or tool_use)
&#9500;&#9472; 11. If tool_use &#8594; execute &#8594; append &#8594; send again
&#9500;&#9472; 12. Loop until stop_reason === "end_turn"
&#9500;&#9472; 13. Save turn to local session storage
&#9492;&#9472; 14. Wait for next input
</code></code></pre><p>Claude Code isn&#8217;t a chatbot with a terminal wrapper. It&#8217;s an agentic system managing authentication, configuration layering, memory persistence, tool orchestration, and context optimization on every session.</p><p>The most impactful optimizations are the simplest: lean CLAUDE.md, <code>/clear</code> between tasks, Grep before Read, and letting prompt caching do its job.</p><p>Now go type <code>claude</code> &#8212; and this time, you&#8217;ll know exactly what happens.</p>]]></content:encoded></item></channel></rss>