It started exactly one year ago: in May 2025, the tech economy split into software and AI.
The AI/software decoupling went unnoticed for months, save for early signs of discomfort. The market only came to a full realization in early 2026: SaaS, cloud services and corporate infra experienced the largest sell-off since the start of the pandemic, while AI labs and associated infrastructure soared. In effect, there are now two completely disconnected ecosystems, with entirely different growth prospects, valuation multiples, investing networks and recruitment strategies. In short, the AI economy decoupled from the tech economy that gave birth to it.
Last year I predicted that the model was becoming the product and was on a trajectory to absorb the application layer built around AI. I didn't fully anticipate that it would extend to the entire category of software and redefine many long-held assumptions. After all, what even is a "product" now? While much has been written or speculated about the consequences of the AI take-off (starting with the Citrini memo), the actual root cause remains surprisingly hidden. What exactly changed with AI, LLMs or agents that made all this possible in 2025-2026, but not before?
This blog post will attempt something different: building a more general understanding of model economics. We assume from the start that the current generation of models is a general-purpose technology and that the actual driver of social and economic change lies at the level of fundamental research and engineering innovation. Yet it has remained a hidden signal until now, as the technological expertise available to economic analysis remained anchored in the software world. In practice, this will be an exercise in hermeneutics: taking seriously the disconnected literature produced by leading labs at a technological or commercial level, and striving to reconcile it with the scarce economic data available to provide a unified picture.
The tech split is exclusively rooted in disruptive innovation. Over the past three years, AI labs developed increasingly sparse mixture-of-experts architectures that simultaneously made inference high-margin and compressed the economics of software production.
Claude Code and Codex finally came to exist because they overcame a hard trade-off: by 2026, leading labs run billions of interleaved agentic sessions with large prefilled contexts. You couldn't do that eighteen months ago, even if you happened to have the same training data infrastructure. For this to happen, you need to simultaneously ensure high performance (to meet the critical accuracy level at long horizons), high throughput (as each session is now fractally expanded at inference time) and long context (as people routinely consume hundreds of thousands of tokens).
The dominant architecture choice right now is a highly sparse mixture of experts with native quantization. Expert routing is, in itself, a form of economic optimization: it works because most tasks are intrinsically modular and, at any given time, only need a relatively bounded search space. In typical bitter-lesson fashion, properly trained models are better parameter allocators than deterministic scaffolding. Similarly, long-context inference has become affordable as models learned to manage their context. Fundamentally, you don't need to hold hundreds of thousands of tokens equally in memory, just to recall the ones that matter.
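To make the routing argument concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration under stated assumptions, not any lab's actual router: the expert count, the value of k and the random router scores are all hypothetical, and real systems add load balancing, batching and custom kernels on top.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs, best expert first.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def active_fraction(num_experts, k):
    """Share of expert parameters actually used for one token."""
    return k / num_experts


# Hypothetical configuration, chosen only for illustration.
NUM_EXPERTS = 64
TOP_K = 4

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_top_k(logits, TOP_K)

print("experts used for this token:", [index for index, _ in selected])
print("active share of expert parameters:", active_fraction(NUM_EXPERTS, TOP_K))
```

In this toy setup each token only touches 4 of 64 experts, so the per-token compute scales with the active share rather than with the full parameter count, which is the allocation effect described above.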
More than benchmarks, economic viability is the primary driver of architecture innovation. Models that fail to meet the demands of inference economics are simply failed products. OpenAI disabled Sora as the model was reportedly losing millions daily in inference costs: the particular mix of diffusion and AR you find in video models is likely still very weakly optimized.
The MoE market is high-margin by design: inference gets cheaper by several orders of magnitude, provided you simultaneously have enough compute and enough demand. Even just at the compute level, there is a high entry ticket now that medium-sized dense models have stopped being competitive with large MoEs. The entry barrier is also technical and intellectual: at a near-frontier level, it requires building system-level intuitions of how highly complex components could work out in practice.
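The dependence on both compute and demand can be sketched with simple amortization arithmetic. The function below is a toy model, not a description of any provider's cost structure: it assumes a fixed hourly cluster cost spread over the tokens actually served, and every figure in the example is hypothetical.

```python
def cost_per_million_tokens(cluster_cost_per_hour,
                            capacity_tokens_per_hour,
                            demand_tokens_per_hour):
    """Unit cost when a fixed hourly cluster cost is spread over served tokens.

    Served tokens are capped by cluster capacity, so idle capacity
    (demand below capacity) directly inflates the cost of each token.
    """
    served = min(capacity_tokens_per_hour, demand_tokens_per_hour)
    if served <= 0:
        raise ValueError("at least some tokens must be served")
    return cluster_cost_per_hour / served * 1_000_000


# Hypothetical figures: same cluster, full demand vs. a tenth of it.
full = cost_per_million_tokens(1_000.0, 1_000_000_000, 1_000_000_000)
starved = cost_per_million_tokens(1_000.0, 1_000_000_000, 100_000_000)

print("unit cost at full utilization:", full)
print("unit cost at 10% utilization:", starved)
```

Under these assumptions the same hardware is ten times more expensive per token when demand only fills a tenth of capacity, which is one way to read the claim that scale economics require compute and demand at the same time.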
The sudden emergence of MoE economies of scale retroactively accounts for the current GPU shortage. Counter-intuitively, the hardware chain did not really anticipate an AI boom. Demand was driven primarily by the high compute needs of platforms (Google, Meta, TikTok) and was even kept intentionally compressed, as intermediaries were still scarred by the fast succession of cycles after the pandemic. MoE economics require a few large labs (and, to a much lesser extent, new infras and neo-labs) to scale up their capacity very fast at the level of large GPU units. There isn't enough fungibility in the current market to absorb that. In turn, the compute shortage accelerates concentration as the cost of entry rises both financially and institutionally (even with capital, you might not easily access the compute market). A lasting consequence might well be to put large labs on a course toward chip autonomy and hardware design. Both Google and Meta already had years-long hardware development programs, and the latest DeepSeek model report essentially included a wishlist for Huawei chip development. Hardware/model co-development would essentially freeze market concentration for years.
A general assumption, as base models started to become commercially viable, was that they were going to commoditize. After all, the training recipe was remarkably unoriginal: take all of the web, scale, profit. By 2023, OpenAI and Anthropic had an early lead in model development but nothing that would prevent
MoE economics already put the commodity thesis under pressure. Major labs are able to set discretionary prices: very high-margin API prices for intermediaries, and subscription prices closer to effective costs for direct customer acquisition. The availability of open weights can even, paradoxically, be a sign of decreased commoditization: following DeepSeek's lead, large labs feel confident enough in the specificity of their internal technical stack and custom kernel design to release advanced models. As of 2026, the ecosystem of third-party inference is unable to support a commoditization thesis in a wide sense. Even where they are competitive on price, inference providers of open-weight models commonly struggle with actual quality of service.
Yet something deeper happened that hurt the commodity thesis at its core: models are no longer trained on the web but through large-scale synthetic pipelines. Anthropic's latest models are rumored to have been trained on up to 150 trillion tokens. That's far beyond the amount of quality text you could expect to collect online.
Synthetic pretraining makes it possible to customize models at a much higher level:
This new model/data complex has intrinsic value, and AI labs, similarly to past innovation cycles, are incentivized to build appropriability mechanisms.
Leading Chinese labs have started to rethink their previous open-license commitment: the latest major releases are either closed (Qwen-Max) or open under conditions (Kimi 2.6, Minimax 2.7). This has been driven by external factors, especially the emergence of a post-training value chain, as major US AI players like Cursor benefited from the availability of strong Chinese base models to focus on post-training, initially without proper credit. Yet I cannot fail to notice the convergence between reuse restrictions and the increasing sophistication of Chinese data pipelines as described in model reports (or even, in the case of Stepfun, a rare post-training data release).
Closed-source US labs are moving toward even more aggressive forms of protection, structured around the fuzzy concept of "model IP" (as shown in job ads). Retraining on synthetic output from other models is now being termed (improperly) "distillation attacks". Early lobbying pushes are being attempted to protect the US AI industry from supposedly unfair competition. Labs find themselves in a difficult position, as they have spent the best part of the last decade arguing that all training data is fair use.
This leads us to the unsettled part: if tokens are now hidden to better protect the model/data IP, what is the economic unit?
Multiple large companies, including Uber and Microsoft, have been putting the brakes on their AI deployment as most of the allocated budget was exhausted in the space of a few months. The fundamental issue is not raw cost but legibility: "Claude Code does not price on a per-seat basis (…) token-based consumption pricing does not behave like the software line items chief financial officers know how to model, and the gap between what engineers consume and what finance teams expect is no longer hypothetical."
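The legibility gap can be illustrated with a small back-of-the-envelope comparison between per-seat and token-based spending. This is a sketch under loud assumptions: the seat price, token price, budget and usage curve below are invented for the example and do not reflect any vendor's actual pricing or any company's actual consumption.

```python
def seat_cost(seats, price_per_seat_per_month, months):
    """Per-seat spend: linear in headcount and time, known in advance."""
    return seats * price_per_seat_per_month * months


def months_until_exhausted(budget, monthly_token_usage, price_per_million_tokens):
    """First month (1-indexed) in which cumulative token spend reaches the budget.

    Returns None if the budget survives the whole usage series.
    """
    spent = 0.0
    for month, tokens in enumerate(monthly_token_usage, start=1):
        spent += tokens / 1_000_000 * price_per_million_tokens
        if spent >= budget:
            return month
    return None


# Hypothetical annual budget sized like a classic software line item:
# 500 seats at 20 per seat per month over 12 months.
annual_budget = seat_cost(seats=500, price_per_seat_per_month=20.0, months=12)

# Hypothetical agentic usage that doubles every month, at 10 per million tokens.
usage = [1_000_000_000 * 2 ** month for month in range(12)]
exhausted_in = months_until_exhausted(annual_budget, usage, 10.0)

print("annual budget:", annual_budget)
print("budget exhausted in month:", exhausted_in)
```

The point of the sketch is the shape of the two curves rather than the numbers: seat spend is fixed by headcount, while token spend follows usage, so a budget sized for twelve months can run out in a few once consumption compounds.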
From the viewpoint of corporations, effective agents do produce actual units of value, either in the form of finished artifacts or continuous workflow management. Yet the relationship between cost and outcomes is not only broken but bypasses corporations almost entirely. Agent management maps better to [a firm model](https://hypersoren.xyz/posts/against-coasean-singularity/), with one intermediary (either an employee or a contractor) staging the initial context and skills and providing continuous feedback.
What we see more broadly are multiple signs of a rupture in the previous social contract between labs and end users. According to OpenAI's terms (largely reproduced under different forms across the field): "you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output" with "you" generally being the paying account/organization. What we see in practice is that a large part of the output (the hidden reasoning traces) is never sent, the billing does not correspond to the tokens actually stored, and, with emerging practices of subagent delegation, there is not even a guarantee that the model being used is the one being billed.
Even frontier AI labs readily admit tokenomics is fundamentally broken. Yet they have so far been reluctant to act on it, as their economic plans are still largely indexed on AI timelines rather than corporate market demand. The coming generation of models, such as Mythos or the unreleased OpenAI model that disproved the Erdős conjecture, is tacitly designed for free open-ended exploration and "actual innovation". In January 2026, OpenAI's CFO, Sarah Friar, announced they were exploring straight economic exploitation and licensing of model outputs, without committing to an actual roadmap:
"As intelligence moves into scientific research, drug discovery, energy systems, and financial modeling, new economic models will emerge. Licensing, IP-based agreements, and outcome-based pricing will share in the value created. That is how the internet evolved. Intelligence will follow the same path."
Many unrelated moves suddenly make sense here. Model IP is not just about "anti-distillation" methods and never was: it lays the groundwork for an expansive proprietary scheme extending throughout the economic cycle of synthetic outputs. Agents are already implementing some preliminary steps: Claude already claims co-author credit on a large part of GitHub. Gated releases, like Mythos, might be the actual institutional space where these deals materialize: corporations get exclusive access to the latest frontier models, provided they figure out how to realize new commercially valuable artifacts or continuous services.
So, in the end, is the model still a "product"? I'm no longer sure this is the right question: the spread of the model layer across the tech economy goes beyond the boundaries of specific services and products, and we might soon come to a point where there is no outside of the model. After all, leading harnesses, including the ones maintained by labs, are generated. Most applications themselves are built on top of generated code, documentation, if not communication, and sometimes even certifications. Even when we don't interact with models, we are now faced with offline model virtualization, which marks an eerie spill of synthetic environments into economic realities.
Yet, despite its indefinite reach, the new model economy cannot completely absolve itself from material constraints. What we have also been stressing repeatedly is that models are thoroughly shaped by economics. The simultaneous generalization of highly sparse MoE, synthetic training and RL made sense as a resolution of market demand. Even if they don’t know it, consumers and companies do buy fast inference, effective long context, attention mechanisms, niche knowledge and task feasibility. And in a classic market process, their cumulative behavior is more informative about actual demand than benchmark aggregates.
Now, the fast AI take-off comes with a set of extreme contradictions that cannot last indefinitely. To some extent, the shift is happening so fast that concentration has been too quickly mistaken for an inevitability, rather than an expected consequence of the lack of familiarity with model training and distribution. This already contrasts with the alternative pathways opened in China. Due to a specific set of incentives (a chronic lack of trust in local SaaS intermediaries) and strengths (strong engineering familiarity with model building), companies are simply building their own models and agents:
"Almost every major Chinese technology company is building their own general purpose LLMs, as we see with the likes of Meituan (delivery service) and Xiaomi (broad consumer technology company) releasing open weight models. The equivalent companies in the U.S. would just buy services."
This would be another form of optimal economic allocation. Even highly sparse MoEs are not necessarily optimal in bounded corporate settings, where gigantic search spaces are largely wasted on knowledge bases and sets of actions that would never make economic sense. The current set of ill-documented frontier practices and technologies does hold the potential to make model production much better aligned with economic demand, but this is not something that will just happen spontaneously. Sometimes, markets have to be built.