An AI agent isn't a chatbot with extra steps. The difference is whether the model writes a plan and revises it as the world responds — or just generates the next message. That single distinction is the whole article.
Concretely: an AI agent is software that uses an LLM to (1) decide what to do next, (2) take that action via a tool — an API, a database query, an external service — and (3) adapt its plan based on the result. It is named after the same concept in classical AI: a goal-directed system that perceives an environment and acts within it.
The phrase got hijacked by marketing in 2025–2026, so the definition above matters. If a product calls itself an "AI agent" but only generates text — without taking real actions — it's a chatbot.

The three parts of a minimum viable agent
- A model. Usually an LLM (Claude, GPT, Gemini, Llama). The model produces structured output that says "call this tool with these arguments" or "the goal is met, stop".
- A tool registry. A list of functions the model is allowed to call (e.g. send_email, lookup_customer, schedule_meeting). Each tool has a name, a description, and a parameter schema.
- A loop. Code that takes the model's output, executes the requested tool call, feeds the result back to the model, and asks "what next?" Until the goal is met or a stop condition fires.
Strip away any of these three and it stops being an agent. A model with no tools is a chatbot. Tools without a loop are an API gateway. A loop with no model is a workflow.
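The three parts fit in a few dozen lines. A minimal sketch of the model, the tool registry, and the loop — `call_model` here is a scripted stub standing in for a real LLM API call, and the tool, registry shape, and message format are illustrative, not any vendor's schema:

```python
def lookup_pricing(tier):
    """Example tool: return pricing for a tier (hardcoded for the sketch)."""
    return {"tier": tier, "price": 299, "billing": "monthly"}

# The tool registry: name, description, parameter schema, implementation.
TOOL_REGISTRY = {
    "lookup_pricing": {
        "fn": lookup_pricing,
        "description": "Look up pricing for a product tier",
        "parameters": {"tier": "string"},
    },
}

def call_model(history):
    # Stub model: asks for pricing once, then declares the goal met.
    # A real agent would send `history` to an LLM API here.
    if not any(m["role"] == "tool" for m in history):
        return {"action": "call_tool", "tool": "lookup_pricing",
                "args": {"tier": "Pro"}}
    return {"action": "stop", "reason": "goal met"}

def run_agent(goal, max_steps=10):
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):            # stop condition: step budget
        decision = call_model(history)
        if decision["action"] == "stop":  # model says the goal is met
            return history
        tool = TOOL_REGISTRY[decision["tool"]]["fn"]
        result = tool(**decision["args"])
        history.append({"role": "tool", "content": result})  # feed result back
    return history

history = run_agent("What does the Pro tier cost?")
```

Swap the stub for a real model call and real tools, and this is the whole skeleton — everything else in production agents is guardrails around this loop.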
Real-world AI agents we build
Four canonical agent patterns we ship for clients. All share the same shape: narrow scope, 5–15 tools each, a typical loop of 3–8 steps, and a human in the loop on irreversible actions.
| Agent | What it does | Tools | Typical loop |
|---|---|---|---|
| AI Virtual Assistant | Reads inbound email, drafts replies, schedules meetings, routes urgent messages | Gmail/Outlook · Calendar · CRM · Slack | 4–6 steps |
| AI Receptionist (voice) | Answers inbound calls, qualifies, books meetings, sends confirmation | Vapi/Retell · Calendar · CRM | 3–5 steps |
| AI SDR | Researches leads, drafts personalized outreach, runs follow-up cadence, hands warm leads to humans | Enrichment API · Email · Sequencer · LinkedIn | 5–8 steps |
| AI Customer Support | Reads tickets, deflects via KB, routes to right human queue with summary | Helpdesk · Vector search · CRM | 3–6 steps |
How an agent actually runs — a concrete trace
Goal: "Reply to this customer email asking about pricing for the Pro tier."
- Agent: I need the pricing for the Pro tier. → Calls lookup_pricing(tier="Pro").
- Tool returns: { price: 299, billing: "monthly", features: [...] }.
- Agent: I should also check if this customer has any active discounts. → Calls get_customer(email="...").
- Tool returns: { tier: "Free", discount_eligible: true, signup_date: "2025-11-12" }.
- Agent: Customer is on Free tier and discount-eligible. I'll mention the standard Pro price and offer the upgrade discount. → Calls draft_reply(...).
- Tool returns the drafted email. Agent stops, hands off to a human for review.
That's a 3-tool, 5-step agent. Most production agents look exactly like this — short, focused, observable.
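Under the hood, that trace is nothing more than a message list that grows with each turn and is fed back to the model in full. A sketch of the state on the final turn — the field names are illustrative, not any particular vendor's API schema:

```python
# Conversation state the model sees on its last turn of the pricing trace.
trace = [
    {"role": "user", "content": "Reply to this customer email asking about "
                                "pricing for the Pro tier."},
    {"role": "assistant", "tool_call": {"name": "lookup_pricing",
                                        "args": {"tier": "Pro"}}},
    {"role": "tool", "name": "lookup_pricing",
     "content": {"price": 299, "billing": "monthly"}},
    {"role": "assistant", "tool_call": {"name": "get_customer",
                                        "args": {"email": "..."}}},
    {"role": "tool", "name": "get_customer",
     "content": {"tier": "Free", "discount_eligible": True,
                 "signup_date": "2025-11-12"}},
    {"role": "assistant", "tool_call": {"name": "draft_reply",
                                        "args": {"mention_discount": True}}},
]

# Every tool the agent chose to use, recoverable from the trace itself:
tools_used = {m["tool_call"]["name"] for m in trace if "tool_call" in m}
```

This is also why short, focused agents are observable: the full decision history is one small list you can log, replay, and audit.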
What separates real agents from "wrapper" products
Most consumer products labeled "AI agents" in 2026 are one of:
- A chatbot with a single function call (e.g. "ask the AI to schedule a meeting" — the model just calls one calendar API once and stops).
- A workflow that prompts an LLM at one step (e.g. an email tool that uses GPT to draft, then sends — no looping).
- A persona wrapper around a chat UI ("our AI marketing director") with no tool integration at all.
You can spot a real agent by asking: does it loop? Does it correct itself when a tool returns an error? Can it use information from one tool to decide which tool to call next? If the answer is no, it's automation dressed up in agent marketing, not an agent.
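Self-correction, in practice, is mostly plumbing: catch the tool failure and return it as an observation instead of crashing, so the model can see the error and pick a different action. A minimal sketch — the wrapper, the flaky tool, and the retry are all illustrative assumptions, not a specific framework's API:

```python
def execute_tool(registry, name, args):
    """Run a tool; on failure, return the error as data the model can read."""
    try:
        return {"ok": True, "result": registry[name](**args)}
    except Exception as exc:  # surface the failure instead of crashing the loop
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}

def flaky_lookup(tier):
    if tier == "Prro":        # simulate the model passing a bad argument
        raise KeyError("unknown tier 'Prro'")
    return {"price": 299}

registry = {"lookup_pricing": flaky_lookup}

# First attempt fails; the error string goes back into the context.
first = execute_tool(registry, "lookup_pricing", {"tier": "Prro"})
# A real model, seeing the error, would retry with corrected arguments:
second = execute_tool(registry, "lookup_pricing", {"tier": "Pro"})
```

The key design choice: a tool error is just another observation. Products that crash, or silently return nothing, on the first bad tool call fail the "does it correct itself?" test.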
Frameworks for building agents (mid-2026)
- Claude Agent SDK / OpenAI Assistants API — managed, opinionated, fast to ship.
- LangGraph — explicit state graphs, best for multi-step agents with branching.
- CrewAI — multi-agent teams with role-based coordination.
- n8n + LLM nodes — when you want a workflow-bound agent with full visual control.
- Custom (just call the API in a loop) — surprisingly often the right answer for production.
How to evaluate an AI agent
Demos are theater. Real evaluation looks like:
- Define the task and a success criterion (booked meetings, resolved tickets, qualified leads).
- Run the agent on a held-out set of 50–200 real cases.
- Score each run: success / partial / failure.
- Look at the failures specifically. Are they "tool returned weird data" failures or "agent picked the wrong action" failures? The fix is different.
- Track success rate over time as you tune prompts, tools, and the loop.
A production-ready agent for a narrow task usually clears 85%+ success rate. Anything below 70% means the scope is too wide or the tools are too loose.
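The evaluation loop above is small enough to sketch. The `evaluate` harness and case format here are illustrative, not a specific eval framework; each held-out case carries its own success check, and failures are kept aside for the hand inspection step:

```python
from collections import Counter

def evaluate(agent_fn, cases):
    """Score an agent over held-out cases: success / partial / failure."""
    scores = Counter()
    failures = []
    for case in cases:
        outcome = case["check"](agent_fn(case["input"]))
        scores[outcome] += 1
        if outcome == "failure":
            failures.append(case["input"])  # inspect these by hand
    success_rate = scores["success"] / sum(scores.values())
    return success_rate, scores, failures

# Toy agent and cases so the harness runs end-to-end:
toy_agent = lambda x: x * 2
cases = [
    {"input": 1, "check": lambda out: "success" if out == 2 else "failure"},
    {"input": 2, "check": lambda out: "success" if out == 4 else "failure"},
    {"input": 3, "check": lambda out: "success" if out == 7 else "failure"},
]
rate, scores, failures = evaluate(toy_agent, cases)
```

Rerun the same harness after every prompt or tool change and the success-rate trend over time falls out for free.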
What AI agents cannot do (in 2026)
- Open-ended creative or strategic work. Models still can't replace good judgment about what to build or who to hire.
- Tasks requiring real-world physical action without robotics integration.
- Anything where the cost of a wrong action is much higher than the value of a right action (legal filings, large financial transactions, irreversible code deploys without review).
- Long-horizon planning over weeks of work without checkpoints.
Within those constraints, the operational ceiling is high. Most service businesses have 5–15 specific workflows where an agent can cut the time spent by 50–90%.