
The Cost of Building a Harness

A lot of teams are rebuilding the same agentic stack from scratch. You can estimate what that costs by decomposing the work, and open projects like Pi, OpenCode, and OpenClaw bracket the range. A harness costs real money to reproduce and keep running, and that cost is only going up.
Petko D. Petkov, on a break from CISO duties, building cbk.ai

Building a harness is expensive. Not a little expensive. A lot. Let me explain why, why teams keep underestimating it, and what leaders should know before they greenlight a build.

We have enough data to say this with some confidence. Some of it is internal, from years of building harnesses. Some of it comes from our partners and customers, where we get to see what these systems actually take to stand up and keep running. And some of it is now public.

You do not have to guess very hard at the number. Its shape is in fact quite simple:

annual cost ≈ people × fully loaded salary + token spend + infrastructure + security, compliance, and audit + monitoring + maintenance

Start with the people, because they usually decide the order of magnitude. A serious harness needs a handful of strong engineers, and a strong engineer fully loaded costs somewhere around $200k to $300k a year. Three of them is $600k to $900k, so close to $1 million a year, before you have paid for a single token or a single server. This is exactly how you can estimate someone else's harness from the outside. Look at the team size and how long they have been at it, and the baseline falls out.

And those people have a lot to build. The agent runtime, the model-provider abstraction, the prompt and memory layer, tool execution, the extension or plugin system, the interface, installers and updates, docs, tests, release automation. On top of that sit the token spend if you iterate with coding agents, the infrastructure, and the security and audit work that a system with this much access demands. Most of those lines are estimable on their own, and once you add them up the number stops being a guess.
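As a rough illustration, the formula can be written out in a few lines of Python. This is a sketch, not a costing tool: the class and field names are made up for this example, and every line other than people is left at zero because the text gives no figures for them. Only the team size of three and the $200k to $300k salary band come from the text above.

```python
from dataclasses import dataclass


@dataclass
class HarnessCostInputs:
    """Annual inputs in US dollars. Names are illustrative, not a standard."""
    engineers: int
    loaded_salary: float                    # fully loaded cost per engineer per year
    token_spend: float = 0.0                # unknown here, left at zero
    infrastructure: float = 0.0             # unknown here, left at zero
    security_compliance_audit: float = 0.0  # unknown here, left at zero
    monitoring: float = 0.0                 # unknown here, left at zero
    maintenance: float = 0.0                # unknown here, left at zero


def annual_cost(inputs: HarnessCostInputs) -> float:
    """Sum the terms of the formula: people first, then every other line."""
    people = inputs.engineers * inputs.loaded_salary
    return (
        people
        + inputs.token_spend
        + inputs.infrastructure
        + inputs.security_compliance_audit
        + inputs.monitoring
        + inputs.maintenance
    )


# People-only baseline for a three-engineer team at both ends of the salary band.
low = annual_cost(HarnessCostInputs(engineers=3, loaded_salary=200_000))
high = annual_cost(HarnessCostInputs(engineers=3, loaded_salary=300_000))
print(f"people only: ${low:,.0f} to ${high:,.0f} per year")
# prints "people only: $600,000 to $900,000 per year"
```

Filling in the other fields with your own estimates moves the total up from that baseline. The fields left at zero are the ones that tend to go missing from real spreadsheets too.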

Two lines resist a clean number, and they are the ones people wave away. Compliance and monitoring are rarely one team's job. They pull in legal, security, data protection, and operations, and the work arrives in two shapes at once: a full-time effort to keep the system observable and inside policy, and the spikes around audits, certifications, and reviews that land on fixed dates every year. You cannot put a tidy figure on either, which is why they fall out of the estimate and why the real bill always comes in higher than the spreadsheet said.

And there is a deeper reason it runs higher. Building a system like this is never a straight line. You do not know everything it needs up front. You discover it by shipping and watching real users break it in ways no plan anticipated, and that feedback is the most valuable input you have and the hardest to replicate. It is why starting from scratch costs so much over time. Every team that does has to rediscover the same hard, expensive lessons from the ground up, one outage and one surprised customer at a time.

The formula gets you to an estimate. Open-source projects let you sanity-check it, because they are the only harnesses whose insides we can actually inspect. One caveat before the numbers: these projects grew up as open-source efforts, so whatever they cost to start is beside the point. What matters is the replacement cost, meaning what it would take another team to reproduce one reliably, and that is a different order of magnitude. With that in mind, here is the range.

Pi is the narrow end. It is the minimal coding-agent harness that reportedly sits under OpenClaw: a small core, a handful of tools, session trees, a unified multi-provider API, and an extension system. Small in scope, and still not cheap. A strong production rebuild lands around $500k to $1.5 million, and more with the package ecosystem behind it. The core concept is small. The cost is in the quality, the provider compatibility, the terminal UX, the release hygiene, and the ecosystem that makes people trust it enough to extend it.

OpenCode sits above that. It outgrew the terminal a while ago. It now ships a desktop build, an IDE extension, a GitHub integration that opens pull requests on its own, a model gateway with paid tiers, and enterprise controls. Matching that much product surface, with the distribution and the community around it, is a $2 million to $8 million job depending on how much of it you want to reproduce.

OpenClaw is the broad end. It is a full personal-assistant gateway: many channels, an agent runtime, integrations, mobile pieces, a release pipeline, a community. It is the biggest of the three, and a serious rebuild runs from around $3 million to $10 million and up once you account for the integrations, the security reviews, and the polish.

There is one public datapoint that puts a number on it. Peter Steinberger's OpenClaw work reportedly burned through $1.3 million in OpenAI tokens in a single month, running around a hundred coding agents with a three-person team. That is the AI bill on its own, before salaries, infrastructure, audits, and support.
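For a sense of scale, the reported figures can simply be divided out. This is back-of-envelope arithmetic that takes the numbers as reported (about $1.3 million, about a hundred agents, three people) and assumes nothing else. The variable names are illustrative.

```python
# Reported figures for a single month, all approximate.
monthly_token_spend = 1_300_000  # USD, as reported
agents = 100                     # "around a hundred" coding agents
team_size = 3                    # people running them

per_agent_month = monthly_token_spend / agents
per_person_month = monthly_token_spend / team_size

print(f"per agent:  ${per_agent_month:,.0f} per month")
print(f"per person: ${per_person_month:,.0f} per month")
# prints "per agent:  $13,000 per month"
# prints "per person: $433,333 per month"
```

The per-person token figure for that one month is larger than a full year of the fully loaded salary used earlier.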

And these are only the projects we can see. Most coding agents are closed. Claude Code is closed source, GitHub Copilot is closed source, and so are most of the others, with a few exceptions like Codex. We rarely get a look at their bills, though details leak. Claude Code is reportedly built by around a dozen engineers, with a product team of PM, design, and data science around them. Its creator, Boris Cherny, and Anthropic have also said they now use agents to extend the tool itself, which points to a token burn we cannot see directly but can guess at from OpenClaw. The teams behind the tools most people actually use are larger than any of these open projects, and we never see the full bill. The open examples are a floor, not a ceiling.

We have also seen this first-hand. In two cases with large corporations, we could estimate the cost of their internal agent harness from the team size and what they actually shipped. For one it came to around a million dollars. For a much larger organization it was somewhere between five and ten million. And in both cases the infrastructure they ended up with was years behind and missing features that matter.

These are not one-time numbers either. They are the spend per year. Salaries, maintenance, and keeping up with the field all recur, every year, for as long as the system is alive. A harness is something you fund continuously, and the bar keeps rising while you do. Once you support several model providers, each with its own quirks, and start adding the built-in functionality people now expect, the surface you have to keep working only grows. Harnesses are expensive today and they will get more expensive.

You might expect a framework to absorb most of this. LangChain, CrewAI, and the rest are genuinely useful, but they solve a narrow slice of the problem. They help you write an agent. They do not give you a system. The users, the credentials, the background jobs and their state, the conversation history, the metering, the compliance and monitoring from earlier: none of that comes in the box, and you are left to build and operate it yourself.

What about the open harnesses themselves, the ones we just priced? You can run Pi or OpenCode, and people do. They were built for one developer on one machine, though. A multi-user product is a different animal, where every customer needs isolation, their own credentials, their own data boundaries, and their own usage limits. Closing that gap is real work. The creator of Pi has talked openly about repurposing it to run in a hosted manner on an upcoming platform, and even they describe that as a substantial undertaking. And you still inherit the baggage that comes with tools moving this fast: the churn, the breaking changes, the assumptions that shift under you between releases.

The money is not even the worst of it. Every month spent on the harness is a month not spent on the product, the thing that is actually supposed to make you money and set you apart. The harness is undifferentiated. Your customers do not care that you wrote your own agent runtime. And if you start today, you are looking at a year, probably longer, before you have something solid enough to build on. That is a year your competitors spend shipping while you are still laying foundations.

This is the part of the problem CBK.AI takes on. We build a harness you can rely on, and one you can customize, so you are not paying to rediscover what it costs every single time, and you can spend your time on the product instead.