The Rise of Micro Coding Assistants
Coding assistants will keep getting better. Models will improve. Context windows will most likely grow. Most developers expect this trajectory, and they are probably right.
But all this sweet LLM juice is subsidized.
The API costs do not match the fixed monthly plans. A heavy user burns through far more compute than their subscription covers. We know this because companies like Cursor had to build their own models. They based them on Kimi K, fine-tuned for their use case, but still - you do not build your own model because the economics of the existing ones work. You build your own model because they do not.
This leaves three possible futures. Premium model prices come down enough to be sustainable at current subscription rates. Open source models catch up and run on commodity hardware. Or the subsidy continues indefinitely, funded by venture capital that eventually (in the year 3000) wants a return.
Open source is the practical path. It always has been. But open source models today are not going to bootstrap an entire application from a single prompt. They lack the raw capability of the frontier models for complex, multi-step execution across a large codebase.
That is fine. They do not need to.
What open source models are perfectly capable of is executing small, well-defined tasks. And this is where the economics flip. Bounded problems with clear success criteria can be handled by a smaller model reliably.
I call these micro coding assistants: an assistant that does one thing. Not a system of agents coordinating through some orchestration layer. Just one agent, one task, running autonomously on a schedule.
A micro coding assistant for code consistency - fixes comments, naming conventions, coding patterns to make the code readable by mere mortals. Another for UI accessibility checks. Another for dependency hygiene. Another for test coverage gaps in recently changed files. Each one owns its domain. Each one runs independently. None of them need to know the others exist.
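The shape of one of these assistants is simple enough to sketch. Below is a minimal, illustrative skeleton of the "one agent, one task, on a schedule" loop: scope the work to recently changed files, build one bounded prompt, and hand it to a small model. Everything here is an assumption for illustration - the names (`TASK_PROMPT`, `changed_files`, `build_prompt`, `run_assistant`) are hypothetical, and the model call and PR plumbing are deliberately left as comments because they depend on your deployment (a local OpenAI-compatible endpoint, a GitHub App, etc.).

```python
# Hypothetical sketch of a micro coding assistant. Assumes it runs inside a
# git checkout; the model call and PR creation are deployment-specific and
# are only described in comments.
import subprocess

# The entire job description, fixed up front. A bounded task with clear
# success criteria is what lets a small model handle it reliably.
TASK_PROMPT = (
    "You are a code-consistency assistant. Fix comment style, naming "
    "conventions, and coding patterns in the files below. Change nothing else."
)

def changed_files(since: str = "HEAD~1") -> list[str]:
    """Files touched since `since` -- the assistant's entire scope."""
    out = subprocess.run(
        ["git", "diff", "--name-only", since],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith(".py")]

def build_prompt(files: dict[str, str]) -> str:
    """One self-contained prompt: the fixed task plus the files in scope."""
    parts = [TASK_PROMPT]
    for path, source in files.items():
        parts.append(f"--- {path} ---\n{source}")
    return "\n\n".join(parts)

def run_assistant() -> None:
    files = {p: open(p, encoding="utf-8").read() for p in changed_files()}
    if not files:
        return  # nothing in scope: no model call, no cost
    prompt = build_prompt(files)
    # Send `prompt` to a small open source model (e.g. a local
    # OpenAI-compatible endpoint) and open a pull request with the result.

# run_assistant() would be invoked from a scheduler, e.g. a nightly cron job.
```

The point of the shape is that nothing coordinates with anything: the scope, the task, and the schedule are all local to this one script, which is why failure modes stay obvious.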
The best evidence that this works is our own experience at ChatBotKit. We run many autonomous coding agents internally (for science). The most useful one is small. It focuses exclusively on code consistency and quality control - comments, naming, patterns, that kind of thing. It has a 100% acceptance rate on the pull requests it generates. Every PR it opens gets merged. It is genuinely one of the most useful members of the team and it runs fully autonomously. Why not use linters? We do! But linters do not handle coding patterns, they do not understand context, and they do not fix things. This agent does. It is a micro coding assistant that runs on an open source model and it is a net positive on our codebase every week.
Then we also have larger, more ambitious agents. Agents that attempt complex refactors, feature additions, architectural improvements. Their acceptance rate sits around 10%. Most of what they produce gets discarded. They suck and frankly they are about to be retired because the maintenance burden is not worth the value they add. They are expensive to run and expensive to maintain. They are a net negative on our codebase.
The contrast is the point. The agent that does one small thing well has perfect reliability. The agent that attempts everything has almost none. And the small agent runs on far cheaper compute.
Scale this out. Ten micro coding assistants, each running on an open source model, each handling a narrow task on a schedule. The combined impact on codebase quality is enormous. The combined cost is a fraction of a single premium model subscription. And because each agent is small and focused, the failure modes are obvious, the fixes are simple, and the trust builds fast.
The industry is obsessed with building the one agent that does everything. That agent does not exist yet, and when it does, it will be expensive. What exists right now, today, is the ability to build many small agents that each do one thing and do it well.
That is a better architecture.
A practical note: this is one of the things ChatBotKit was designed for. Build a focused agent, point it at a specific task, let it run. No orchestration complexity. No frontier model costs. Just small, reliable agents that compound over time.