AGENT

What is an AI agent, actually.

The term has meant four different things in eighteen months. Here is what separates a working agent from a demo.

Mia is our AI employee. Email her — she’ll book your 15-minute call. That’s the demo.

REBOTIFY WRITING · 6 MIN READ

The term “AI agent” has been used to mean four different things in the last eighteen months. An assistant that talks. A copilot that suggests. A workflow automation that runs on a schedule. A loop that plans, acts, recovers, and retries. Vendors use the word to mean whichever shape fits their product. If you are trying to decide whether to hire one, that ambiguity costs you time.

Here is the operational definition that has survived ten years of shipping agents into paying customer stacks: an agent is a loop that reads a queue, plans a sequence of actions, executes against real systems, checks whether the work landed, and retries or escalates if it did not.

That loop is not magic. It is plumbing. And the plumbing is where the demos break and the production systems hold.

Start with the shapes people conflate.

An assistant is a thing you talk to. ChatGPT, Claude, Google Gemini. You ask a question. It reasons and replies. The loop is: you ask, it thinks, it answers. The customer owns the clock. The assistant owns the response. There is no action, no integration, no consequence beyond the conversation.

A copilot is an assistant in a tool you already use. Gmail draft suggestions. GitHub code completions. Figma layout ideas. It sits in the workflow, suggests next steps, and lets you accept or reject. You still own the decision. It owns the suggestion. Action is optional.

A workflow automation is a rule you program once. If a ticket arrives with tag X, create a Slack thread and assign it to team Y on Wednesday at 9am. No reasoning. A decision tree with maybe some intent classification on top. If the rule does not cover the case, a human gets the escalation.
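A minimal sketch of that kind of rule, to show how little reasoning is involved. The tag names, team names, and the `route` function are hypothetical, and the scheduling part (the Wednesday 9am trigger) is left out.

```python
from dataclasses import dataclass

# Hypothetical rule table: ticket tag -> owning team.
RULES = {
    "billing": "team-finance",
    "outage": "team-sre",
}


@dataclass
class Ticket:
    id: str
    tags: frozenset = frozenset()


def route(ticket: Ticket) -> str:
    """Return the owning team, or hand off to a human when no rule matches."""
    for tag, team in RULES.items():
        if tag in ticket.tags:
            return team
    # The rule does not cover this case, so a human gets the escalation.
    return "human-escalation"
```

Everything the automation can do is written in `RULES`. Anything outside the table goes to a person.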

An agent is different. It reads the queue, thinks about what needs to happen, talks to your systems, checks if the work actually landed, and tries again if it did not. The loop is: read context, plan sequence, execute actions, observe outcome, retry or escalate. There is reasoning, action, and recovery. A human does not need to pre-program every edge case. The agent figures out what the edge case is and either handles it or flags it.

That loop looks simple until you try to ship it.

The agent loop in five boring steps.

  • Read: the agent pulls context from somewhere. An email, a ticket queue, a contract on a desk, a database of leads. This is the first place demos work and production breaks. The demo has perfect context. The production queue is noisy, incomplete, contradictory.
  • Plan: the agent thinks about what work needs to happen. What is this email about? What decision follows? What tools do I need to call? What is the sequence? This is cheap and fast until the agent has to reason across contexts it has not seen before, or until the reasoning horizon gets so wide that the model starts to lose the thread.
  • Act: the agent calls a tool. Makes an API call. Logs into a CRM. Drafts an email. Moves a file. Sends a message. Anything that changes the state of the customer’s world. This is where scope matters. The more tools the agent has, the longer the loop takes, and the more ways it can fail.
  • Observe: the agent checks whether the action worked. Did the email send? Did the record save? Did the API reject the payload? Observation is not built in. Most agents skip it. This is why most agents break in production.
  • Retry or escalate: if the action succeeded, the loop closes and the agent moves to the next queue item. If the action failed, the agent either tries again or admits it is stuck and sends it to a human. Most agents do not retry well. They do not know when to give up. They do not know how to escalate so a human can understand what went wrong.
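The five steps above can be sketched as a small loop. This is an illustration under stated assumptions, not a production implementation: `plan`, `act`, and `observe` are hypothetical callables the caller supplies, `Outcome` is an invented record type, and the retry limit of three is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    item: str
    status: str    # "done" or "escalated"
    attempts: int  # total action attempts across all steps
    detail: str    # what a human needs to read on escalation


def run_item(item, plan, act, observe, max_attempts=3):
    """Plan, act, observe, then retry or escalate for one queue item."""
    attempts = 0
    for step in plan(item):                    # Plan: an ordered list of steps
        landed = False
        last_error = "observation failed"
        for _ in range(max_attempts):
            attempts += 1
            try:
                result = act(step)             # Act: change the outside world
                landed = observe(step, result)  # Observe: did it actually land?
                last_error = "observation failed"
            except Exception as exc:           # tool errors count as a failed try
                last_error = f"{type(exc).__name__}: {exc}"
            if landed:
                break
        if not landed:
            # Escalate with enough context for a human to see what went wrong.
            return Outcome(item, "escalated", attempts,
                           f"step {step!r}: {last_error}")
    return Outcome(item, "done", attempts, "all steps verified")


def run_queue(queue, plan, act, observe):
    """Read: take each queue item in turn and run the loop on it."""
    return [run_item(item, plan, act, observe) for item in queue]
```

The design choice worth noting is that a step only counts as done when `observe` says so. An action that returns without an error but cannot be verified is retried, then escalated.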

This loop is where the marketing ends and the honesty begins.

A vendor demo runs this loop once, on clean data, with happy-path tools, with fresh API credentials, with a model that has been prompted and re-prompted until the one thing it does works. The model reasons well. The APIs reply fast. The human review is a nice-to-have.

Production is different. The queue is long. The context is incomplete. The tools have rate limits. The OAuth tokens expire on schedule. The model gets confused on edge cases. The APIs return errors the agent was never trained on. The 1% of cases where the agent gets it wrong is the 1% the customer sees. And if the agent sends something broken without a human checking first, that 1% breaks the relationship.

A working agent needs four things the vendor does not talk about. The first is state. The agent needs to remember what it tried, what worked, what failed, in a vault the customer owns that persists across model changes. The second is an approval gate. Anything that touches a customer, a contract, or a commitment needs a human to review before it ships. Not because the agent is unreliable. Because the things it gets right 99% of the time will, on the 1%, have blast radius. The third is an actual integration into the customer’s stack, not a REST call to an API the vendor controls. And the fourth is operational visibility. When does the agent pause? When does it retry? When does it escalate? If something went wrong three days ago and the customer did not notice, what was supposed to tell them?
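Two of those four, state and the approval gate, are small enough to sketch. This assumes the vault is nothing more than an append-only JSON-lines file on disk the customer controls; the `Vault` class, the `ship` function, and the event fields are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path


class Vault:
    """Append-only event log in a plain file, independent of any model."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, event: dict) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def history(self) -> list:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def ship(draft, touches_customer, approved_by, send, vault):
    """Send a draft only if it is low-risk or a named human approved it."""
    if touches_customer and approved_by is None:
        # Approval gate: hold the draft and remember that it is waiting.
        vault.record({"draft": draft, "status": "held_for_review"})
        return "held_for_review"
    send(draft)
    vault.record({"draft": draft, "status": "sent",
                  "approved_by": approved_by})
    return "sent"
```

Because the log is a plain file, swapping the model leaves the history intact. The gate refuses to call `send` for anything customer-facing until `approved_by` names a reviewer.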

The agents that break in production are the ones that skip one of these four.

The honest version is simpler than the marketing: agents fail when they lose context. When they cannot see what a human saw. When they try an action against a system they do not understand. When the approval queue gets too long and a human starts rubber-stamping. When observability is so thin that a failure on Friday afternoon is not caught until Monday. When the team has no way to teach the agent what it got wrong so it does not do it again.
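Two of those failure modes, the silent Friday failure and the approval queue that backs up, can be caught with a staleness check. A minimal sketch, assuming the agent records a timestamp for its last verified action and for each draft waiting on review; the function name and both thresholds are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_alerts(now, last_success, pending_since,
                 max_silence=timedelta(hours=4),
                 max_wait=timedelta(hours=24)):
    """Return human-readable alerts for silence and for stuck reviews."""
    alerts = []
    silence = now - last_success
    if silence > max_silence:
        alerts.append(f"no verified action for {silence}")
    for item_id, since in sorted(pending_since.items()):
        waited = now - since
        if waited > max_wait:
            alerts.append(f"{item_id} waiting on review for {waited}")
    return alerts


# Example: a Monday 09:00 UTC check after a quiet weekend.
monday = datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)
friday = datetime(2024, 1, 5, 16, 0, tzinfo=timezone.utc)
for alert in stale_alerts(monday, friday, {"draft-7": friday}):
    print(alert)
```

Run on a schedule, a check like this turns "nobody noticed until Monday" into a message someone receives within hours.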

An agent we ran inside an insurance brokerage last year hit a wall on edge cases. A claim was misread because the model missed a policy rider. The reviewer caught it. Three cases later, the team had retrained the agent on what a rider actually signals. The vault made the fix permanent. An agent without a vault would have made the same mistake every week. The difference was not the agent. It was the memory.

Another one, inside a law firm, started sending drafts that were technically correct but toned wrong for the client relationship. Polite. Thorough. Missing the specific voice the partner was known for. The agent was not hallucinating. It was depersonalised. The fix was a vault of two years of the partner’s correspondence: his tone, his emphasis, his shortcuts. The model learned the voice by reading it. Once it did, the drafts matched the person.

The pattern is: agents fail when they lose information. They lose it because the infrastructure is not built to keep it.

This is what it looks like when all four pieces land.

An operations team at a financial services firm has an agent they call Morgan. Morgan lives in their inbox. Every morning, Morgan reads the overnight queue: customer emails, flagged issues, escalations, requests that came in while the team was offline. Morgan reads the subject and the history. Morgan plans: which emails need a reply, which need a file lookup, which need a supervisor. Morgan drafts a response if it is straightforward. Logs into the CRM if there is account history to check. Pulls a template if the category is routine. Then Morgan checks. Did the reply land? Did the lookup come back? Did the rate limit throttle? If something breaks, Morgan retries, or flags it.

Then Morgan waits. The reply sits in a draft queue. A human on Morgan’s team reads it. Two seconds. Most are clean. Once a week, something is flagged wrong: a decision that should have escalated, a tone that missed the mark. The human fixes the draft. Sends it. Updates the vault. Morgan learns.

This is the boring version of the agent loop that vendors skip in the keynote. Read, plan, act, observe, review, learn. Every day. Every week the queue is a little smaller because Morgan understands the voice a little better. After three months Morgan handles 80% of the overnight queue without human intervention. After six months the human on the team is spending 20% of their time on the queue and getting 90% of the credit.

A lot of the work a business cares about does not need an agent loop. A weekly report that always runs the same query does not need reasoning. It needs a cron job. An email that goes to the same three people every time does not need planning. It needs a template. A form that comes in from a customer does not need observation and retry. It needs a webhook.

The agent is the right shape when the work is ad hoc and the context is wide. When you cannot predict the sequence of actions because every queue item is a little different. When the human cost of reading the queue and making a plan is higher than the cost of letting the agent think. When you have time to build the vault, train the approvers, and watch the loop get faster.

The question that matters is not “is this AI agent technology impressive?” It is: what context does this agent need? Where should a human look before this ships? What should fire to tell me it is broken? Answer those and you have something real. Skip them and you have a demo that will break three weeks in.

48-HOUR START

Ready to put an agent to work? Start with one job. We will ship it in forty-eight hours.
