Code Smarter, Not Harder

Corinne Riley · May 27, 2025

1. Enhancing Existing Workflows

Today, the vast majority of AI code startups are taking the shape of in-IDE co-pilots or chat interfaces to enhance engineering workflows. While companies like Tabnine have been working on code assistants for many years, the big moment for coding AI tools came with the release of GitHub Copilot in 2021. Since then, we have seen a flurry of startups going after the various jobs to be done by engineers.

Startups finding traction are going after workflows centered on code generation or code testing. This is because:

  • They are core parts of an engineer’s job
  • They can require relatively low context to be sufficiently useful
  • In most cases, they can be bundled within a single platform
  • In a world where reliability is scarce, putting outputs right in front of the user (i.e., in the IDE) lets them take ownership of any corrections required

The elephant in the room is the challenge of going after GitHub Copilot, which already has considerable distribution and mindshare (congratulations to Devin, which just secured its own partnership with Microsoft). Startups are working around this by looking for pockets of differentiation in which to ground their wedge. For example, Codeium is taking an enterprise-first approach, while Codium is starting with code testing and reviewing and expanding from there.

We also believe there is a strong opportunity for tools going after tasks like code refactoring, code review, and software architecting. These tasks can be more complex: they require not only a larger surface area of understanding within the code, but also a graph of knowledge connecting different files, familiarity with external libraries, an understanding of the business context and the end-usage patterns of the software, and complex tool selection.
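
To make one slice of that concrete, below is a rough sketch of what a cross-file "knowledge graph" could look like at its simplest: mapping which modules in a Python project import which others. The helper is purely illustrative; real refactoring and review tools also have to track call sites, types, external dependencies, and business context that no static pass like this can capture.

```python
# Illustrative sketch only: build a minimal cross-file "knowledge graph"
# of intra-project imports using Python's standard ast module.
import ast
from pathlib import Path

def import_graph(root: str) -> dict[str, set[str]]:
    """Map each module in the project to the modules it imports."""
    graph: dict[str, set[str]] = {}
    for path in Path(root).rglob("*.py"):
        module = path.relative_to(root).with_suffix("").as_posix().replace("/", ".")
        try:
            tree = ast.parse(path.read_text(errors="ignore"))
        except SyntaxError:
            continue  # skip files that don't parse; a real tool would do better
        deps: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[module] = deps
    return graph
```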

Regardless of the wedge, one of the recurring challenges we’re seeing at this layer is how to access relevant context to solve wider-reaching tasks within a company’s codebase. Exactly how that’s done is an open question, which we explore in the last section of this post.
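
One common (though by no means settled) approach is embedding-based retrieval: index the codebase, then pull the files most relevant to the task at hand into the model’s context window. The sketch below assumes the sentence-transformers library and a small general-purpose embedding model; production systems layer far more on top, such as smarter chunking, symbol-level indexes, and rerankers.

```python
# A minimal sketch of embedding-based retrieval over a codebase.
# The model choice and file-level chunking are simplifying assumptions.
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def index_codebase(root: str) -> tuple[list[str], np.ndarray]:
    """Embed every Python file so it can be matched against a task later."""
    chunks = [p.read_text(errors="ignore") for p in Path(root).rglob("*.py")]
    return chunks, model.encode(chunks, normalize_embeddings=True)

def relevant_context(task: str, chunks: list[str], vectors: np.ndarray, k: int = 5) -> list[str]:
    """Return the k files most similar to the task description."""
    query = model.encode([task], normalize_embeddings=True)[0]
    scores = vectors @ query  # cosine similarity, since embeddings are unit-normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```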

2. AI Coding Agents

If augmenting engineering workflows is valuable, an even larger opportunity is figuring out what workflows can be completely replaced.

AI coding products that can perform engineering tasks end-to-end – and can work in the background while the human engineer does something else – would create an entirely new level of productivity and innovation. A giant leap beyond AI co-pilots, AI coding agents could take us from a realm of selling tooling to selling labor. In a world where coding agents get very good, you could have a single human supervising multiple “AI engineers” in parallel.

The fundamental capability of an AI agent isn’t just predicting the next word in a line of code. It needs to couple that with the ability to carry out complex tasks that can run to dozens of steps and, like an engineer, think about the product from the user’s perspective. For example, if prompted to fix a bug, it needs to know where the bug lives, what the underlying problem is, how it affects the product, and what downstream changes fixing it might trigger, before it can even take the first action. That context has to come from somewhere: ingesting Jira tickets, larger chunks of the codebase, and other sources of information. Writing detailed code specs and producing accurate code plans will become central to adopting AI engineers.
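
To give a sense of the shape of such a system, here is a deliberately oversimplified plan-then-act loop. The llm, gather_context, and tools arguments are hypothetical placeholders rather than any real product’s API; actual agents add verification, retries, sandboxing, and far richer planning.

```python
# Oversimplified sketch of a plan-then-act coding agent loop.
# llm(), gather_context(), and the tools dict are hypothetical placeholders.
import json

def run_agent(task: str, llm, gather_context, tools: dict, max_steps: int = 20) -> list[dict]:
    context = gather_context(task)  # e.g. ticket text, relevant files, failing tests
    plan = llm(f"Write a step-by-step plan for: {task}\n\nContext:\n{context}")
    history: list[dict] = []
    for _ in range(max_steps):
        # Ask the model for the next action as JSON, e.g. {"tool": "edit_file", "args": {...}}
        action = json.loads(llm(
            f"Task: {task}\nPlan: {plan}\nHistory so far: {history}\n"
            'Reply with JSON: {"tool": ..., "args": {...}, "done": true/false}'
        ))
        if action.get("done"):
            break
        result = tools[action["tool"]](**action["args"])  # e.g. edit a file, run the test suite
        history.append({"action": action, "result": result})
    return history
```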

Companies and projects we have seen in this space include (but are not limited to) Devin, Factory, CodeGen, SWE-Agent, OpenDevin, AutoCodeRover, and Trunk.

The question then is: what needs to be done for agents to be able to complete a larger portion of tasks end to end? This is addressed in my open questions section.

3. Code-Specific Foundation Model Companies

A few founders believe that in order to build long-term differentiation at the code app layer, you need to own a code-specific model that powers it.

It’s not an unreasonable suggestion, but it seems there are a few open questions that have steered other startups away from this capital-intensive approach – primarily that it’s unclear whether a code-specific model will get leapfrogged by improvements at the base model layer. I’ll go into this topic further in the open questions section.

First, let’s recall that most foundation LLMs are not trained exclusively on code, and that many existing code-specific models, like CodeLlama and AlphaCode, are created by taking an LLM base model and fine-tuning it on millions of publicly available code examples to suit programming needs.
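
Mechanically, that fine-tuning step can be quite simple. The sketch below uses the Hugging Face Trainer with a small stand-in base model and a public code corpus; the model and dataset names are illustrative placeholders only, not how CodeLlama, AlphaCode, or any other production model was actually built.

```python
# Illustrative sketch: continue training a small causal LM on a code corpus.
# "gpt2" and the codeparrot dataset are stand-ins, not production choices.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # stand-in for a real base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Any permissively licensed code corpus would do; this one is just an example.
code = load_dataset("codeparrot/codeparrot-clean-valid", split="train[:1%]")
tokenized = code.map(
    lambda batch: tokenizer(batch["content"], truncation=True, max_length=512),
    batched=True, remove_columns=code.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="code-finetune",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```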