
Human in the Loop, Just Not Like This

Human in the loop, the way people build it today, is an approve button bolted onto a dangerous tool call. That version is broken. A human can absolutely be in the loop, just not as a rigid gate the agent can condition them to rubber-stamp.
Petko D. Petkov, on a break from CISO duties, building cbk.ai

Human in the loop, the way people mean it today, is the idea that an AI agent should only do something once a human approves it. The agent wants to run a tool, and the tool happens to look dangerous, so it stops and asks you to approve before it runs. Right now this gets talked about as if it were the thing that finally makes agents safe to let loose.

This version of it is immature, and at scale it is simply not designed correctly. That is why we have not bolted it onto CBK's agentic core, and why we will keep not adding it until someone shows us a pattern that is actually good enough. To be clear, a human can absolutely be in the loop. Just not like this.

The mechanics are the part where things get stuck. If you are building a client-side app, a chat app, this is cheap and easy. The human is right there watching. You pause, you ask, they click, you carry on. Server-side is a different story. There you have to write the whole thing up as some kind of state machine that can freeze in the middle of a task and pick up later, which is what people mean when they say durable workflow execution. Harder, but doable.
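To make "freeze in the middle of a task and pick up later" concrete, here is a minimal sketch of the server-side shape, assuming the paused state is stored as JSON somewhere durable. The names `PausedTask`, `pause` and `resume` are hypothetical and do not come from any workflow product.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PausedTask:
    """Everything needed to pick the task up later (hypothetical shape)."""
    task_id: str
    step: str           # "awaiting_approval", "done" or "rejected"
    pending_tool: str   # the tool the agent wants to run
    pending_args: dict  # arguments captured at pause time


def pause(task: PausedTask) -> str:
    """Serialise the task so it can survive a process restart."""
    return json.dumps(asdict(task))


def resume(blob: str, approved: bool) -> PausedTask:
    """Rebuild the task from storage and apply the human's decision."""
    task = PausedTask(**json.loads(blob))
    if task.step != "awaiting_approval":
        raise ValueError(f"cannot resume from step {task.step!r}")
    task.step = "done" if approved else "rejected"
    return task
```

Calling `pause(...)` gives you a string you can park in a database for as long as you like, and `resume(blob, approved=True)` carries on from it. Note that the tool name and arguments are frozen at pause time, exactly as the agent saw them.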

The harder question is whether people actually understand what they are approving. There is a brutal lesson here, learned over years in cyber security. The more you ask a person to approve something that gets approved nine times out of ten, the more you condition that same person to approve everything, until the approval stops being worth anything at all. You wanted a safety check but you built a rubber stamp.

So if you are going to ask a human to authorise something, the agent has to push the signal way up. It should do 99% of the work on its own and only come back to ask in the rarest exceptions. Those moments have to be genuinely rare, or they mean nothing. You cannot just slam an approve flag on a tool call and call it safe. A conditioned person clicks approve on autopilot, without ever realising what that action actually does. It is a really dumb idea.

Then there is the question of how this works under the hood. Durable workflows sound like they fit the bill. After all, a durable workflow can be paused indefinitely and continued the moment the approval comes in. But there are problems here too.

There is workflow drift, where the code that drives the workflow changes while the workflow is sitting there paused. There is context drift, where the situation has moved on and nothing in that frozen state knows it. The refund policy that was in force when the agent paused is not the policy when someone finally clicks approve. Then there are the classics: time of check to time of use, credentials that expired while it slept, access and permissions revoked underneath it, upstream systems that now behave differently or whose state has changed. And so on.
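To show how much a resume step would have to re-check, here is a rough sketch that compares a snapshot taken at pause time with the world at approval time. Every field name, the `drift_reasons` helper and the one-hour staleness limit are assumptions made for illustration, not part of any real workflow engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """What the agent believed when it paused (all fields hypothetical)."""
    workflow_version: str
    policy_version: str
    token_expires_at: datetime
    paused_at: datetime


def drift_reasons(snap, workflow_version, policy_version, now,
                  max_age=timedelta(hours=1)):
    """Return every reason the paused state can no longer be trusted."""
    reasons = []
    if snap.workflow_version != workflow_version:
        reasons.append("workflow drift")       # code changed while paused
    if snap.policy_version != policy_version:
        reasons.append("context drift")        # e.g. a new refund policy
    if now >= snap.token_expires_at:
        reasons.append("credentials expired")
    if now - snap.paused_at > max_age:
        reasons.append("approval is stale")    # crude time-of-check guard
    return reasons
```

An empty list means none of the recorded facts have changed. It does not mean nothing has changed: the check only catches the drift somebody thought to snapshot, and upstream state that was never captured goes straight past it.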

If you read the docs from Cloudflare, Vercel and Upstash, you would think durable workflows are the killer feature, the thing that suddenly makes every agent reliable and useful. But notice that they are not running their own businesses on this, and they do not show you a real example of an agent that does. There is a reason for that. In practice the abstraction does not really work. It does not.

So what is the actual solution? It has nothing to do with durable workflows, and nothing to do with bolting approve states onto tool calls. In fact, I would argue that durable workflows are a footgun that is too easy to misuse, and that asking for approval is a trap.

An intelligent system should decide, on its own terms, that it needs to ask for approval. And it should not have the permission or the ability to do the damage in the first place. If you design a system that can do damage and the only thing stopping it is an approval, do not be surprised when it fails.

This is like asking an intern to issue $5M checks, but only after they get a verbal okay from you. Nobody thinks that is a smart idea. What does work in real life is that the intern logs the $5M check somewhere. A completely separate system, with nothing to do with the intern, handles the authorisation, and yet another system handles the execution. The risk stays small because the thing doing the work is not the thing seeking the authority. That is classic engineering, the kind of separation of duties that most AI systems walk straight past, which is why they end up so dumb and rigid. It is the same reason agents need supervisors, not schedulers.
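The intern example can be sketched in a few lines. This is an illustration under stated assumptions, not how CBK does it: the class names are hypothetical, the authoriser's policy is reduced to a made-up hard limit, and in a real system the three roles would sit behind separate credentials and process boundaries rather than share one Python process.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Request:
    request_id: int
    action: str
    amount: int
    status: str = "logged"  # logged -> authorised or rejected -> executed


class Ledger:
    """Shared log. The only thing the agent is allowed to write to."""
    def __init__(self):
        self._ids = count(1)
        self.requests = {}

    def log(self, action, amount):
        req = Request(next(self._ids), action, amount)
        self.requests[req.request_id] = req
        return req


class Agent:
    """Holds a ledger handle and nothing else: no authority, no executor."""
    def __init__(self, ledger):
        self._ledger = ledger

    def request_payment(self, amount):
        return self._ledger.log("issue_check", amount).request_id


class Authoriser:
    """Independent of the agent. The hard limit is an assumed policy."""
    def __init__(self, ledger, limit):
        self._ledger = ledger
        self._limit = limit

    def review(self, request_id):
        req = self._ledger.requests[request_id]
        if req.status == "logged":
            req.status = "authorised" if req.amount <= self._limit else "rejected"
        return req.status


class Executor:
    """Acts only on authorised requests, and only once per request."""
    def __init__(self, ledger):
        self._ledger = ledger
        self.paid = []

    def run(self, request_id):
        req = self._ledger.requests[request_id]
        if req.status != "authorised":
            return False
        req.status = "executed"
        self.paid.append(req.amount)
        return True
```

The agent never holds a reference to the `Executor`, so there is no code path from "the agent decided" to "the money moved". A human can sit inside the `Authoriser` for the rare exceptions without ever being the only thing between the agent and the damage.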

So when you read about durable workflows and human in the loop and how they are about to make agents trustworthy, take it with a huge pinch of salt. It is not working. It will never work in this shape. And it will never make up for not understanding how a business process actually runs in real life, or how to build a system that works.

That is why we have not shipped the rigid version, and why we are in no rush to. The human can stay in the loop, just not as a button the agent can train them to rubber-stamp. When we find a pattern that keeps the dangerous capability out of the agent's hands entirely, we will build it. Until then, the advice is that the agent should not be allowed to execute dangerous actions at all if that is a concern, but it can be allowed to schedule them for a separate system to handle. That is the only way to keep the risk small and the human in the loop meaningful.

An approve button is not a control. It is a trap.