The AI Decoupling

Pierre-Carl Langlais, May 24, 2026

It started exactly one year ago: in May 2025, the tech economy split into software and AI.

*Figure: the Bessemer SaaS compounded index vs. the general Nasdaq.*

The AI/software decoupling went unnoticed for months, save for early signs of structural strain. The market came to a full realization only in early 2026: SaaS and cloud services experienced their largest sell-off since the start of the pandemic, while AI labs and associated infrastructure soared. There are now two completely disconnected ecosystems, with different growth prospects, valuation multiples, investing networks and talent attractiveness. In short, the AI economy decoupled from the tech economy that gave birth to it.

Last year I predicted that the model was becoming the product and was on a trajectory to absorb the application layer built around AI. I didn't fully anticipate that it would extend to the entire category of software and redefine many long-held assumptions. After all, what even is a "product" now? While much has been written or speculated about the consequences of the AI take-off (starting with the Citrini memo), the actual root cause remains surprisingly hidden. What exactly changed with AI, LLMs or agents that made all this possible in 2025-2026, but not before?

This blogpost will attempt something different: build a more general understanding of model economics. We assume from the start that the current generation of models is a general purpose technology and that the actual driver of social and economic changes lies at the level of fundamental research and engineering innovations. Yet this driver has remained a hidden signal until now, as the technological expertise behind economic analysis stayed anchored in the software world. In practice, this will be an exercise in hermeneutics, taking seriously the disconnected literature produced by leading labs at a technological or commercial level and striving to reconcile it with the scarce available economic data to provide a unified picture.

High-margin, scalable, concentrated: MoE inference economics

The tech split is exclusively rooted in disruptive innovation. Over the past three years, AI labs developed increasingly sparse mixture-of-experts architectures that simultaneously made inference high-margin and collapsed the margins of software production.

Claude Code and Codex finally came to exist because they overcame a hard trade-off: by 2026, leading labs run billions of interleaved agentic sessions with large prefilled contexts. You couldn't do that eighteen months ago, even if you happened to have the same training data infrastructure. For this to happen, you need to simultaneously ensure high performance (to meet the critical accuracy level at long horizons), high throughput (as each session is now fractally expanded at inference time) and high context (as people routinely consume hundreds of thousands of tokens).

The dominant architecture choice right now is a highly sparse mixture of experts with native quantization. Expert routing is, in itself, a form of economic optimization: it works because most tasks are intrinsically modular and, at any given time, only need a relatively bounded search space. In typical bitter-lesson fashion, properly trained models are better parameter allocators than deterministic scaffolding. Similarly, long context inference has become affordable as models learned to manage their context. Fundamentally, you don't need to hold hundreds of thousands of tokens equally in memory, just to recall the ones that matter.
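To make the routing argument concrete, here is a minimal sketch of top-k expert routing in plain Python. It is not the routing scheme of any specific lab: the expert count (64), the `TOP_K` value (4) and the random router logits are all hypothetical, chosen only to show how a router activates a small slice of the expert parameters for each token.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, with the weights renormalized
    so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


def active_fraction(num_experts, k):
    """Share of expert parameters touched per token.

    Assumes equally sized experts and ignores always-on components
    such as attention layers or shared experts.
    """
    return k / num_experts


# Hypothetical configuration, for illustration only.
NUM_EXPERTS = 64
TOP_K = 4

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
selection = route_top_k(logits, TOP_K)

print(selection)
print(f"active expert share per token: {active_fraction(NUM_EXPERTS, TOP_K):.2%}")
```

Under these assumptions, each token pays for 4 of the 64 experts. The allocation itself is learned by the router, which is the sense in which the model acts as its own parameter allocator.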

More than benchmarks, economic viability is the primary driver of architecture innovation. Models that fail to meet the demands of inference economics are simply failed products. OpenAI disabled Sora as the model was reportedly losing millions daily in inference costs: the particular mix of diffusion and autoregressive (AR) generation found in video models is likely still very weakly optimized.

The MoE market is high margin by design: inference gets cheaper by several orders of magnitude, provided you simultaneously have enough compute and enough demand. Even just at the compute level, there is a high barrier to entry, as medium-sized dense models have stopped being competitive with large MoEs. The barrier is also technical and intellectual: at a near-frontier level, it requires building system-level intuitions of how highly complex components could work out in practice.
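The dependence on both compute and demand can be sketched as a simple amortization: a serving cluster costs the same per hour whether or not it is busy, so the unit cost of a token falls as utilization rises. The function below is an illustrative model, not a measured cost curve, and every number in the example (cluster cost, peak throughput, utilization levels) is a hypothetical placeholder.

```python
def cost_per_million_tokens(cluster_cost_per_hour, peak_tokens_per_second, utilization):
    """Amortized serving cost per million tokens.

    cluster_cost_per_hour: fixed hourly cost of the serving cluster
    peak_tokens_per_second: throughput when the cluster is fully loaded
    utilization: fraction of peak throughput actually filled by demand (0 < u <= 1)
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_second * utilization * 3600
    return cluster_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical numbers, for illustration only.
CLUSTER_COST = 200.0      # dollars per hour
PEAK_THROUGHPUT = 50_000  # tokens per second at full load

low_demand = cost_per_million_tokens(CLUSTER_COST, PEAK_THROUGHPUT, utilization=0.1)
high_demand = cost_per_million_tokens(CLUSTER_COST, PEAK_THROUGHPUT, utilization=1.0)

print(f"10% utilization:  ${low_demand:.2f} per million tokens")
print(f"100% utilization: ${high_demand:.2f} per million tokens")
```

In this toy model the only lever is utilization, so a provider with ten times the demand on the same hardware pays a tenth of the unit cost. Larger MoE deployments add further levers (batching across experts, bigger serving units) that the sketch leaves out.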

The sudden emergence of MoE economies of scale retroactively accounts for the current GPU shortage. Counter-intuitively, the hardware chain did not really anticipate an AI boom. Demand was driven primarily by the high compute needs of platforms (Google, Meta, TikTok), and intermediaries even kept supply intentionally compressed, as they were still scarred by the fast succession of boom/bust cycles after the pandemic. MoE economics require a few large labs (and, to a much lesser extent, new infrastructure providers and neo-labs) to scale up their capacity very fast at the level of large GPU units. There isn't enough fungibility in the current market to absorb such a surge. In turn, the compute shortage accelerates concentration as the cost of entry rises both financially and institutionally (even with capital you might not easily access the compute market). A lasting consequence might well be to put large labs on a course for chip autonomy and hardware design. Both Google and Meta had already been running multi-year hardware development programs, and the latest DeepSeek model report essentially included a wishlist for Huawei chip development. Hardware/model co-development could potentially entrench market concentration for years.

You wouldn't steal a reasoning trace: how models stopped commoditizing

A general assumption, as base models started to become commercially viable, was that they were going to be commoditized. After all, the training recipe was remarkably unoriginal: take all of the web, scale, profit.

MoE economics already put the commodity thesis under pressure. Major labs are able to set discretionary prices: very high margin API prices for intermediaries, and subscription prices closer to effective costs for direct customer acquisition. Availability of open weights can even, paradoxically, be a sign of decreased commoditization: following DeepSeek's lead, large labs feel confident enough in the specificity of their internal technical stack and custom kernel design to release advanced models. As of 2026, the ecosystem of third-party inference is unable to support a commoditization thesis. Even where they are competitive on price, inference providers of open weight models commonly struggle with actual quality of service.

Yet something deeper happened that hurt the commodity thesis at its core: models are no longer trained on the web but through large scale synthetic pipelines. Anthropic's latest models are rumored to have been trained on up to 150 trillion tokens. That's far beyond the amount of quality text you could expect to collect online.

Synthetic pretraining allows models to be customized at a much deeper level:

You can precisely select the content the model should memorize. That's the standard synthetic rephrasing approach, which we also demonstrated at a small scale with SYNTH. From my experience with Opus 4.7, it's pretty clear the model accurately retains a far wider range of niche information, hardly documented online, than any model that came before it.
You can redefine the data shapes. This matters at a critical economic level, as the capability is in the data, and learnable data examples allow models to assimilate rules, norms and processes efficiently. Models that solve tasks faster, better and through a higher degree of abstract intelligence are simply superior products. This also explains why large frontier models have suddenly become more economically successful than more affordable versions: users are ready to pay a premium for Opus or GPT-5.5 as they nail tasks more efficiently and faster.
Finally, and most critically, you can "model" much more than language and code. AI labs have been scraping the archives of dead companies to feed RL and synthetic environments. On the commercial side, they have started to address law and other corporate verticals directly, downstream of synthetic preparation. And there are recursive benefits: as they became agents, language models started to direct their own workflow and allocate their own token consumption, shedding an additional layer of friction.

This new model/data complex has intrinsic value, and AI labs, as in past innovation cycles, are incentivized to build appropriability mechanisms.

Leading Chinese labs have started to rethink their previous open license commitment: the latest major releases are either closed (Qwen-Max) or open under conditions (Kimi 2.6, Minimax 2.7). This has been driven by external factors, especially the emergence of a post-training value chain, as major US AI players like Cursor benefitted from the availability of strong Chinese base models to focus on post-training, initially without proper attribution. Yet I cannot fail to notice the convergence between reuse restrictions and the increasing sophistication of Chinese data pipelines as described in model reports (or even, in the case of Stepfun, a rare post-training data release).

Closed-source US labs are moving toward even more aggressive forms of protection, structured around the fuzzy concept of "Model-IP" (as shown in job ads). Retraining on synthetic output from other models is now being termed (improperly) a "distillation attack". Early lobbying efforts are underway to protect the US AI industry from supposedly unfair competition. Labs find themselves in a difficult position, as they have spent the better part of the last decade arguing that all training data is fair use.

Valuing what you can't see? Moving beyond tokenomics

This brings us to the unsettled part: if tokens are now hidden to better protect the model/data IP, what is the economic unit?

Multiple large companies, including Uber and Microsoft, have been putting the brakes on their AI deployment as most of the allocated budget ran out in the space of a few months. The fundamental issue is not about raw costs but legibility: "Claude Code does not price on a per-seat basis (…) token-based consumption pricing does not behave like the software line items chief financial officers know how to model, and the gap between what engineers consume and what finance teams expect is no longer hypothetical."

From the viewpoint of corporations, effective agents do produce actual units of value, either in the form of finished artifacts or continuous workflow management. Yet the relationship between cost and outcomes is not only broken but bypasses corporations almost entirely. Agent management maps better to a firm model with one intermediary (either an employee or a contractor) staging the initial context and skills and providing continuous feedback.

What we see more broadly is multiple signs of a rupture of the previous social contract between labs and end users. According to OpenAI's terms (largely reproduced under different forms across the field): "you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output", with "you" generally being the paying account/organization. What we see in practice is that a large part of the output (the hidden reasoning traces) is never sent, that the billing does not correspond to the tokens actually stored, and that, with emerging practices of subagent delegation, there is no guarantee that the model being used is the one being billed.
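The legibility gap described above can be put in arithmetic terms. The sketch below computes the share of billed output tokens that the customer never receives. The `SessionUsage` record and its field names are hypothetical and do not correspond to any provider's API or invoice format, and the token counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionUsage:
    """Hypothetical usage record for one agentic session."""
    billed_output_tokens: int     # output tokens charged on the invoice
    delivered_output_tokens: int  # output tokens actually sent to the customer


def undelivered_share(usage):
    """Fraction of billed output tokens never delivered (e.g. hidden reasoning)."""
    if usage.billed_output_tokens <= 0:
        raise ValueError("billed_output_tokens must be positive")
    hidden = usage.billed_output_tokens - usage.delivered_output_tokens
    return max(hidden, 0) / usage.billed_output_tokens


# Invented numbers, for illustration only.
session = SessionUsage(billed_output_tokens=12_000, delivered_output_tokens=3_000)
share = undelivered_share(session)

print(f"billed but never delivered: {share:.0%}")
```

A finance team that sees only the delivered artifacts has no direct way to reconstruct the billed figure, which is the gap between consumption and expectation that the pricing quote points to.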

Even frontier AI labs readily admit tokenomics is fundamentally broken. Yet they have so far been reluctant to act on it, as their economic plans are still largely indexed on AI timelines rather than corporate market demand. The coming generation of models, with Mythos or the unreleased OpenAI model that disproved the Erdős conjecture on unit distance, is implicitly designed for free open-ended exploration and "actual innovation". In January 2026, OpenAI's CFO, Sarah Friar, announced they were exploring direct economic exploitation and licensing of model outputs, without committing to an actual roadmap:

As intelligence moves into scientific research, drug discovery, energy systems, and financial modeling, new economic models will emerge. Licensing, IP-based agreements, and outcome-based pricing will share in the value created. That is how the internet evolved. Intelligence will follow the same path.

Many unrelated moves suddenly make sense here: Model-IP is not just about "anti-distillation" methods and never was. It lays the groundwork for an expansive proprietary regime throughout the economic cycle of synthetic outputs. Agents are already implementing some preliminary steps: Claude now claims co-author credit on a large part of GitHub. Gated releases, like Mythos, might be the actual institutional space where these deals materialize: corporations get exclusive access to the latest frontier models, provided they figure out how to turn them into new commercially valuable artifacts or continuous services.

Models are trained, markets are invented

So, in the end, is the model still a "product"? I'm no longer sure this is the right question: the spread of the model layer across the tech economy goes beyond the boundaries of specific services and products, and we might soon come to a point where there is no outside of the model. After all, leading harnesses, including the ones maintained by labs, are generated. Most applications are themselves built on top of generated code and documentation, if not generated communication and sometimes even certifications. Even when we don't interact with models, we are now faced with offline model virtualization, which marks an eerie spill of synthetic environments into economic realities.

Yet, despite its indefinite reach, the new model economy cannot completely free itself from material constraints. What we've also been stressing repeatedly is that models are thoroughly shaped by economics. The simultaneous generalization of highly sparse MoE, synthetic training and RL made sense as a response to market demand. Even if they don't know it, consumers and companies do buy fast inference, effective long context, attention mechanisms, niche knowledge and task feasibility. And in a classic market process, their cumulative behavior is more informative about actual demand than benchmark aggregates.

Now the fast AI take-off comes with a set of extreme contradictions that cannot last indefinitely. To some extent, the shift is happening so fast that concentration has been mistaken a bit too quickly for an inevitability, rather than an expected consequence of a general lack of familiarity with model training and distribution. This already contrasts with the alternative pathways opened in China. Due to a specific set of incentives (chronic lack of trust in local SaaS intermediaries) and strengths (strong engineering familiarity with model building), companies are simply building their own models and agents:

Almost every major Chinese technology company is building their own general purpose LLMs, as we see with the likes of Meituan (delivery service) and Xiaomi (broad consumer technology company) releasing open weight models. The equivalent companies in the U.S. would just buy services.

This would be another form of optimal economic allocation. Even highly sparse MoEs are not necessarily optimal in bounded corporate settings, where gigantic search spaces are largely wasted on knowledge bases and sets of actions that would never make economic sense. The current set of ill-documented frontier practices and technologies does hold the potential to make model production much better aligned with economic demand, but this is not something that will just happen spontaneously. Sometimes, markets have to be built.