There was a lot of speculation over the past years about what the next cycle of AI development could be. Agents? Reasoners? Actual multimodality?
I think it's time to call it: the model is the product.
All current factors in research and market development push in this direction.
This is also an uncomfortable direction. All investors have been betting on the application layer. In the next stage of AI evolution, the application layer is likely to be the first to be automated and disrupted.
Over the past weeks, we have seen two prime examples of this new generation of models as a product: OpenAI's DeepResearch and Claude Sonnet 3.7.
I've read a lot of misunderstandings about DeepResearch, which isn't helped by the multiplication of open and closed clones. OpenAI has not built a wrapper on top of o3. They have trained an entirely new model, able to perform search internally, without any external calls, prompts, or orchestration:
The model learned the core browsing capabilities (searching, clicking, scrolling, interpreting files) (…) and how to reason to synthesize a large number of websites to find specific pieces of information or write comprehensive reports through reinforcement learning training on these browsing tasks.
DeepResearch is not a standard LLM, nor a standard chatbot. It's a new form of research language model, explicitly designed to perform search tasks end to end. The difference is immediately striking to everyone using it seriously: the model generates lengthy reports with a consistent structure and an underlying source-analysis process. In comparison, as Hanchung Lee underlined, all the other DeepSearch products, including the Perplexity and Google variants, are just your usual models with a few twists:
Google’s Gemini and Perplexity’s chat assistants also offer “Deep Research” features, but neither has published any literature on how they optimized their models or systems for the task or any substantial quantitative evaluations (…) We will make an assumption that the fine-tuning work done is non-substantial.
Anthropic has been laying out their vision ever more clearly. In December, they introduced a controversial but, to my mind, correct definition of agent models. Similarly to DeepSearch, an agent has to perform the targeted tasks internally: agents "dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks".
What most agent startups are currently building is not agents but workflows, that is, "systems where LLMs and tools are orchestrated through predefined code paths." Workflows may still bring some value, especially for vertical adaptations. Yet, to anyone currently working in the big labs, it's strikingly obvious that all major progress in autonomous systems will come from redesigning the models in the first place.
We had a very concrete demonstration of this with the release of Claude 3.7, a model primarily trained with complex code use cases in mind. All the workflow adaptations like Devin got a major boost on SWE benchmarks.
To give another example, at a much smaller scale: at Pleias we're currently working on automating RAG. Current RAG systems are a lot of interconnected yet brittle workflows: routing, chunking, reranking, query interpretation, query expansion, source contextualization, search engineering. With the evolving training tech stack, there is real potential to bundle all these processes into two separate yet interconnected models, one for data preparation and the other for search/retrieval/report generation. This requires an elaborate synthetic pipeline and entirely new reward functions for reinforcement learning. Actual training, actual research.
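To make the brittleness concrete, here is a minimal sketch of such a hand-wired RAG chain. All helper names (route_query, expand_query, rerank, contextualize, generate_report) are hypothetical placeholders for the separate components these pipelines typically chain together:

```python
# Hypothetical sketch of a hand-wired RAG workflow: every step is a separate
# heuristic or model, each with its own failure modes to maintain.
def answer(query: str, index) -> str:
    route = route_query(query)                 # routing: pick collection/strategy
    queries = expand_query(query)              # query interpretation and expansion
    candidates = []
    for q in queries:
        candidates += index.search(q, top_k=50, route=route)
    passages = rerank(query, candidates)[:8]   # reranking of retrieved chunks
    passages = contextualize(passages)         # source contextualization
    return generate_report(query, passages)    # final report generation
```

The bet described above is that trained models absorb this orchestration entirely, leaving deployment as little more than a model call.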
What this all means in practice: displacing complexity. Training anticipates a wide range of actions and edge cases, so that deployment becomes much simpler. But in this process, most of the value is now created and, likely in the end, captured by the model trainer. In short, what Claude aims to disrupt and replace is the current workflows, like this basic "agent" system from llama index:
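The original snippet isn't reproduced here, but a representative sketch, assuming llama_index's FunctionTool and ReActAgent abstractions, looks like this:

```python
# A hand-orchestrated "agent": every capability is a tool registered in
# advance, and the control flow lives in predefined code paths.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)
llm = OpenAI(model="gpt-4o")
agent = ReActAgent.from_tools([multiply_tool], llm=llm, verbose=True)
response = agent.chat("What is 121 * 3?")
print(response)
```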
With this:
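That is, in the limit, a single call to a model trained to handle the task internally. A minimal sketch, assuming Anthropic's Messages API:

```python
# One call to a model that has internalized the orchestration: no tool
# registry, no predefined control flow on the caller's side.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 121 * 3?"}],
)
print(response.content[0].text)
```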
To reassert: the big labs are not advancing with a hidden agenda. While they can be opaque at times, they are laying it all out in the open: they will bundle, they will go up the application layer, and they will attempt to capture most of the value there. And the commercial consequences are quite clear. Naveen Rao, the Gen AI VP of Databricks, phrased it quite well:
all closed AI model providers will stop selling APIs in the next 2-3 years. Only open models will be available via APIs (…) Closed model providers are trying to build non-commodity capabilities and they need great UIs to deliver those. It's not just a model anymore, but an app with a UI for a purpose.
So what is happening right now is just a lot of denial. The honeymoon period between model providers and wrappers is over. Things could evolve in either direction.
In short, the dilemma for most successful wrappers is simple: training or being trained on. What they are doing right now is not only free market research for the big labs but, since all outputs are ultimately generated through the model providers, free data design and generation as well.
What will happen afterwards is anyone's guess. Successful wrappers do have the advantage of knowing their vertical well and accumulating a lot of precious user feedback. Yet, in my experience, it's easier to go down from the model layer to the application layer than to build entirely new training capacities from scratch. Wrappers may not have been helped by their investors either. From what I overheard, there is such negative polarization against training that they almost have to hide what is going to be their most critical source of value: neither Cursor small nor Codium are properly documented at this moment.
This brings me to the actually painful part: currently, all AI investments are correlated. Funds are all operating under the same basic assumption: that the value lies in the application layer.
I'm afraid this increasingly looks like an adventurous bet and an actual market failure to accurately price the latest technical developments, especially in RL. In the current economic ecosystem, venture funds are meant to find uncorrelated investments. They will not beat the S&P 500, but that's not what larger institutional investors are looking for: they want to bundle risks, to ensure that in a bad year at least some things will work out. Model training is a textbook example of this: lots of potential for disruption in a context where most Western economies are on course for a recession. And yet model trainers can't raise, or at least not in the usual way. Prime Intellect is one of the few new Western AI training companies with a clear potential to become a frontier lab. Yet, despite their achievements, including the training of the first decentralized LLM, they have struggled to raise more than your usual wrapper.
Beyond that, aside from the big labs, the current training ecosystem is very tiny. You can count all these companies on your hands: Prime Intellect, Moondream, Arcee, Nous, Pleias, Jina, the HuggingFace pretraining team (actually tiny)… Along with a few more academic actors (Allen AI, Eleuther…), they build and support most of the current open infrastructure for training. In Europe, I know that at least 7-8 LLM projects will integrate the Common Corpus and some of the pretraining tools we developed at Pleias; the rest will be fineweb, and likely post-training instruction sets from Nous or Arcee.
There is something deeply wrong with the current funding environment. Even OpenAI senses it now. Lately, there has been some palpable irritation about the lack of "vertical RL" in the current Silicon Valley startup landscape. I believe the message comes straight from Sam Altman and will likely result in some adjustment in the next YC batch, but it points to a larger shift: soon the big labs' select partners won't be API customers but associated contractors involved in the earlier training stages.
If the model is the product, you cannot necessarily build it alone. Search and code are easy low-hanging fruit: they have been major use cases for two years, the market is nearly mature, and you can ship a new Cursor in a few months. But many of the most lucrative AI use cases of the future are not at this advanced stage of development: typically, think about all these rule-based systems that still run most of the world economy… Small dedicated teams with cross-domain expertise and a high level of focus may be best positioned to tackle them, eventually becoming potential acquihires once the initial groundwork is done. We could see the same pipeline on the UI side: some preferred partners getting exclusive API access to closed specialized models, provided they put themselves on the road toward acquisition.
I haven't mentioned DeepSeek, nor the Chinese labs, so far. Simply because DeepSeek is already one step further: not the model as a product, but the model as a universal infrastructure layer. Like OpenAI and Anthropic, Liang Wenfeng lays his plans out in the open:
We believe that the current stage is an explosion of technological innovation, not an explosion of applications (…) If a complete upstream and downstream industrial ecosystem is formed, then there is no need for us to make applications ourselves. Of course, there is no obstacle for us to make applications if needed, but research and technological innovation will always be our first priority.
At this stage, working only on applications is like "fighting the next war with the last war's generals". I'm afraid we're at the point where many in the West are not even aware the last war is over.