
Build Your Harness for the Weakest Model

A harness tuned against a frontier model breaks the moment you swap in a weaker one. Build it against weak open-weight models and it only gets stronger when a capable model sits behind it. We made that switch internally to find the bugs first.
Petko D. Petkov, on a break from CISO duties, building cbk.ai

A harness is all the scaffolding around a model that keeps an agent on track: retries, validation, guardrails, and the prompts that nudge it back when it drifts. How much of it you need depends entirely on the model underneath.
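To make that concrete, here is a minimal sketch of the retry, validation, and nudge pieces in one loop. It is illustrative only, not our actual harness. The function name `run_with_harness`, the `model` callable, and the rule that a valid reply is a JSON object with an `answer` key are all assumptions made up for this example.

```python
import json
from typing import Callable


def run_with_harness(
    model: Callable[[str], str],  # hypothetical: any function mapping prompt -> raw text
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model, validate the reply, and retry with a corrective nudge.

    Assumed contract for this sketch: a valid reply is a JSON object
    that contains an "answer" key.
    """
    current_prompt = prompt
    last_error = "no attempts made"

    for _ in range(max_attempts):
        raw = model(current_prompt)

        # Validation: reject anything that is not the expected JSON shape.
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc.msg}"
        else:
            if isinstance(parsed, dict) and "answer" in parsed:
                return parsed
            last_error = "missing 'answer' key"

        # Nudge: tell the model what went wrong before the next attempt.
        current_prompt = (
            f"{prompt}\n\n"
            f"Your previous reply was rejected ({last_error}). "
            "Reply with a JSON object containing an 'answer' key."
        )

    # Guardrail: stop after a bounded number of attempts instead of retrying forever.
    raise RuntimeError(
        f"model failed validation after {max_attempts} attempts: {last_error}"
    )
```

Against a frontier model, the retry branch in a loop like this almost never runs, which is exactly why it tends to go unwritten or untested.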

Develop your harness against a frontier model and it will feel light, because the model carries you. It rarely loses the plot, so you never write the code that catches it when it does. Point that same harness at a weaker model and it falls apart. The model wanders, the harness has nothing to catch it with, and the whole thing comes undone.

Go the other way and you end up somewhere much better. A harness built against a weak model has to deal with hallucination, inconsistency, garbage output, and the occasional infinite loop, because the model throws all of that at you daily. Put a frontier model behind that same harness later and it runs cleaner than ever, with every safety net still there for the rare moment the strong model slips.

We made this switch internally. Our agents now run on Kimi K, GLM, and a mix of other low-powered open-weight models. This has nothing to do with the cost difference. We did it to surface bugs before our customers ever could.

And we found them. Weaker models hallucinate more, need more guardrails to stay consistent, drift into garbage, and get stuck repeating themselves. Every one of those failures exposed a hole in our harness that a frontier model had been quietly papering over.
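The "stuck repeating themselves" case is the easiest of these to catch mechanically. Below is a rough sketch of one way to do it, by counting repeated word n-grams in the output. The name `is_looping` and the default thresholds are assumptions picked for illustration, not tuned values from our harness, and a production check would likely look at tool calls as well as text.

```python
from collections import Counter


def is_looping(text: str, ngram_size: int = 5, max_repeats: int = 3) -> bool:
    """Return True if any word n-gram appears more than max_repeats times.

    A crude heuristic (assumed thresholds): normal prose rarely repeats
    the same five-word sequence four or more times, looping output does.
    """
    words = text.split()
    if len(words) < ngram_size:
        return False  # too short to contain even one n-gram

    # Count every sliding window of ngram_size consecutive words.
    counts = Counter(
        tuple(words[i : i + ngram_size])
        for i in range(len(words) - ngram_size + 1)
    )
    return max(counts.values()) > max_repeats
```

A harness could run a check like this on each model turn and, when it fires, cut the generation short and re-prompt rather than letting the agent keep going.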

If you want to see this in the wild, watch Gemini in Google AI Studio spiral into deranged, looping text. What you are looking at is a harness that cannot catch the model when it goes off the rails. Even Google, running its own models, has not fully closed that gap.

So build for the weakest model you can stand to work with. The harness you are left with is the one that holds when it matters.