Most of our prospects could rebuild what we ship for them. They have Claude Code, decent product sense, and a free Saturday. The question they are actually asking — once you get past the small-talk version of it — is not "is this possible." It is "who is accountable when this breaks at 2am, and who keeps the playbook current as my business changes." Pretending the first question is the real one is the kind of bad positioning that loses deals on both ends: the prospect who could DIY ends up overpaying, and the prospect who shouldn't DIY ends up trying anyway, bouncing back six months later having lost the time.
I want to write what I actually think about this, because the agency-and-services category is reorganizing fast and most of the public takes are either "AI is going to kill consulting" or "AI doesn't change anything for real services." Both miss what is happening. The honest version is in the middle and is more useful to operate on.
What Claude Code is actually good at
I want to start here because the rest of the post collapses if I am dishonest about this. Claude Code in April 2026 is genuinely good at most of the work an agency used to charge $5k–$20k for. Specifically:
- Standing up the boring infrastructure — Astro site, Sanity schema, n8n flows, basic API integrations. The wiring that used to be a four-week engagement is now closer to a two-day project for a competent operator who knows what they want.
- Reading and modifying existing code — debugging a stuck integration, adding a field to a schema, refactoring a workflow. The "I already have a system, can you change one thing" job has compressed dramatically.
- First-pass implementations of well-defined patterns — anything where the brief is "build the standard version of X." Cold email infrastructure, basic CRM hygiene scripts, simple AI agents that do one job.
If you have read any of our knowledge content and thought "I could probably build this myself in a weekend," you are right. The architecture is not the moat. We publish the architecture on purpose. What the post does not contain — the per-vertical playbook, the prompts that get tuned weekly against real reply data, the judgment about what to skip — is what we sell.
What Claude Code is not good at (in 2026)
The list shifts every quarter as the underlying model improves, but the structural gaps are stable enough to plan around.
- Judgment about what to build in the first place — Claude Code will happily build whatever you ask. Whether you should be building that thing, given your stage, your team, and your actual unit economics, is the harder question. Most failed AI implementations we audit failed at this layer, not the wiring.
- Maintenance over time — your Saturday-built agent works on Saturday. Six weeks later Meta changes their API surface, your CRM ships a new schema migration, your inbound volume shifts, the prompt that worked at 200 emails a week starts hallucinating at 800. Someone has to notice and fix. That someone is the actual job.
- Edge cases that look like edge cases until they are 30% of your traffic — the customer who replies in a different language, the deal stuck because the champion left, the invoice that needs special tax treatment. AI handles the happy path. Operators handle everything else, and "everything else" compounds faster than people expect.
- Accountability — when the inbox triage agent miscategorizes an investor email and you miss a wire deadline, the question "who is responsible" has only one good answer. Outsourcing the build does not outsource the accountability; hiring a service to operate the system does.
The pattern across all four: they are not technical problems. They are operational problems that show up after the build is done. Most agency content frames the agency value as "we know how to build it." That was true in 2022. In 2026, the build is the cheap part. The expensive part is what comes next.
A concrete example: inbox triage
To make this less abstract, take one of the surfaces in our AI business operations post — inbox triage. The architecture is roughly four boxes: an IMAP listener, a Claude prompt that classifies the email, a routing rule that files or drafts based on the classification, a log layer.
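For concreteness, here is a minimal sketch of those four boxes in Python, with a stubbed classifier standing in for the Claude call and the IMAP listener elided. Every name and rule here is illustrative, not our production code.

```python
# Four boxes: (listener elided) -> classify -> route -> log.
from datetime import datetime, timezone

def classify(email: dict) -> str:
    """Stand-in for the Claude prompt. A real version sends subject and
    body to the model and parses a category out of the response."""
    subject = email["subject"].lower()
    if "invoice" in subject or "receipt" in subject:
        return "transactional"
    if "demo" in subject or "pricing" in subject:
        return "sales"
    return "other"

def route(email: dict, category: str) -> dict:
    """Routing rule: file or draft based on the classification."""
    action = {"transactional": "archive",
              "sales": "draft_reply",
              "investor": "flag_urgent"}.get(category, "file_for_review")
    return {"email_id": email["id"], "category": category, "action": action}

def log_decision(decision: dict, log: list) -> None:
    """Log layer: append every decision so drift is auditable later."""
    log.append({**decision, "ts": datetime.now(timezone.utc).isoformat()})

def handle(email: dict, log: list) -> dict:
    decision = route(email, classify(email))
    log_decision(decision, log)
    return decision

log: list = []
print(handle({"id": "42", "subject": "Pricing question"}, log)["action"])
```

The log layer looks optional in a sketch like this; it is the part that makes everything in the next section findable.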
Anyone with Claude Code and a weekend can build this. We did, in fact, build the first version of ours over a weekend. The build was not the hard part.
The hard part was everything that happened after the build:
- Week three — the classifier started miscategorizing automated transactional emails (Stripe webhooks, GitHub notifications) as "investor outreach" because the model was mildly hallucinating organization names. Fix: a pre-filter on sender domain and an explicit category for known transactional sources. Half a day of work, plus the embarrassment of mis-routed alerts in the meantime.
- Week six — Anthropic shipped Claude 4.7 and our prompt that was tuned against 4.6 started over-confidently classifying ambiguous emails. Fix: re-tune the prompt with a fresh batch of edge cases, add a confidence threshold below which the agent files for human review instead of acting. Another half-day, plus the cost of manually re-routing every email while it drifted.
- Week nine — sales reply volume tripled after a content campaign and the cost of the agent jumped 4x. Fix: cache classifications for known senders, batch low-priority emails for nightly processing instead of real-time. A small architectural change, but one I had to think through.
- Week thirteen — a customer replied to a 4-month-old thread we had archived, the agent treated it as a fresh inbound, and we missed a real escalation for 18 hours. Fix: add a "recent thread" lookup before classification. Two hours of work plus a refund call.
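Two of those fixes, the sender-domain pre-filter and the confidence threshold, are small enough to sketch. The domains and the threshold below are illustrative, not tuned values:

```python
# Sender-domain pre-filter: never ask the model about known transactional
# sources. Confidence threshold: below the floor, file for human review
# instead of acting. Both guards wrap whatever classifier you already have.
TRANSACTIONAL_DOMAINS = {"stripe.com", "github.com"}
CONFIDENCE_FLOOR = 0.8  # below this, a human decides

def classify_with_guards(email: dict, model_classify) -> dict:
    domain = email["from"].split("@")[-1].lower()
    if domain in TRANSACTIONAL_DOMAINS:
        # Short-circuit before any model call: cheaper and never wrong
        # in the way week three was wrong.
        return {"category": "transactional", "confidence": 1.0,
                "action": "archive"}
    category, confidence = model_classify(email)
    if confidence < CONFIDENCE_FLOOR:
        # Low confidence: surface to a human rather than mis-route silently.
        return {"category": category, "confidence": confidence,
                "action": "file_for_review"}
    return {"category": category, "confidence": confidence, "action": "route"}
```

`model_classify` is any callable returning a `(category, confidence)` pair, so the same guards survive a model swap.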
Each of those fixes was small. None of them was hard. All of them required someone watching, noticing, and caring enough to fix the boring middle. None of them are in the architecture diagram. None of them would be in a Claude Code prompt that says "build me an inbox triage agent."
The inbox triage agent is not a project. It is a relationship. Most teams that DIY the build underestimate the relationship part by an order of magnitude.
The real product is the maintenance loop
Once I started naming this honestly to prospects, the conversations got easier. The pitch is no longer "we build AI for you." It is "we operate the AI you need, and the build is the cheapest part of that work."
Concretely, what we sell that Claude Code does not provide:
- A weekly review of what the agent did — every miscategorization, every drift, every edge case that surfaced. We do not just deploy. We watch, we tune, we report.
- A versioned playbook per client — the prompt, the rules, the routing logic are not "one prompt forever." They are a living document we update as the business changes. We track what changed and why, in a place you can read.
- Vendor-shift insulation — when Anthropic ships a new model, when Meta changes their API, when Google rolls out a new ad-platform feature, we have already tested the impact on your stack and adjusted. You do not get pinged.
- Accountability — there is one phone number to call when something breaks, and that number is not yours.
The first three are the work. The fourth is the reason anyone pays for the first three at scale.
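The versioned playbook does not need heavy tooling; it can be as lightweight as a changelog attached to every prompt or rule revision. A hypothetical minimal shape, not our actual format:

```python
# Every change to the prompt, routing logic, or thresholds gets a version,
# a reason, and a date, so a client can read what changed and why.
from dataclasses import dataclass, field

@dataclass
class PlaybookRevision:
    version: str
    date: str
    change: str   # what changed (prompt, routing rule, threshold)
    reason: str   # why it changed (model update, drift, volume shift)

@dataclass
class Playbook:
    client: str
    revisions: list = field(default_factory=list)

    def record(self, version: str, date: str, change: str, reason: str) -> None:
        self.revisions.append(PlaybookRevision(version, date, change, reason))

    def current(self) -> PlaybookRevision:
        return self.revisions[-1]

pb = Playbook("example-client")
pb.record("1.0", "2026-01-10", "initial triage prompt", "launch")
pb.record("1.1", "2026-02-21", "added confidence floor",
          "prompt over-confident after model update")
```

The point is not the data structure; it is that "what changed and why" has a home a client can read.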
When you should not hire us
The honest cases where DIY is correct:
- You enjoy building — if you would rather spend the weekend wiring an n8n flow than playing with your kids or watching a movie, the cost of DIY is approximately zero. Build it. Have fun.
- The system is not load-bearing — if the inbox triage agent goes down for two weeks and your business does not notice, you do not need a service relationship around it. Ship the weekend version, accept the occasional drift.
- You are pre-product-market-fit — when the business itself is changing every six weeks, locking in a paid service to operate part of it adds friction more than leverage. Wait until you have a stable shape worth investing in.
- You have an in-house operator who treats this as their actual job — if there is someone whose role is "operate our AI stack" and they are good at it, hiring an external service is duplicative.
Outside those cases, the math on hiring usually works. The break-even hits earlier than people expect because the cost of bad maintenance is mostly invisible until it bites.
When you should hire us
The shape of an ideal client engagement on our side:
- AI is going to be load-bearing in the business within the next 90 days — meaning the system, if it breaks, costs real money. Customer support deflection. Sales pipeline triage. Ad-spend management. These are not "nice to have if it works."
- Founder or operator does not enjoy maintenance — they want the system running, they do not want to be the one paged at 2am when it stops. They want to spend their attention on the parts of the business only they can do.
- Volume justifies the relationship — past a certain scale, the cost of mediocre operations exceeds the cost of paying someone to operate it well. Roughly: if any single AI-touched workflow handles more than a few thousand events a month, the math is on the service side.
- The business has stable enough shape that the playbook compounds — early-stage chaos is fine; we work with founders. But there has to be enough operational stability that what we tune this month is still relevant next quarter.
What the agency category is becoming
My honest read of the next two years: the agency-and-services category is splitting into two halves. The half that was selling "we build it for you" gets compressed hard. The half that is selling "we run it for you, and the build is one cheap line item inside the run" stays healthy. We are deliberately positioned in the second half.
This is not a defensive crouch about AI tooling. The shift is mostly good. Builds being cheap means more clients can afford to deploy more AI in more places, which means there is more for the operations layer to operate. The agencies that try to hold the old "we charge for the build" pricing are going to get out-priced by Claude Code; the ones that reframe around the operational layer have more demand than they did three years ago, not less.
If you read this and decide DIY is right for you — good. The free advice in our knowledge content is real, and we publish it on purpose. If you read this and the 2am question lands differently, book a call. We do not need to convince you that the build is hard. We need to be honest that the build is the cheap part. Everything else is the work.
