Personal blog of Pierre-Carl Langlais on many things AI: data research, synthetic environments, LLM economics, and the shape of models to come.

Synthetic Pretraining

February 8, 2026

Pretraining data infrastructure used to be the most conservative part of a fast-moving AI world. Since GPT-3 we have mostly been scaling the usual mix of web crawls, peppered with a few more select sources (including, controversially, digitized books). This is finally changing. In 2025, several major releases used extensive syn…

Read more →

Training as we know it will end

February 8, 2026

This post was originally supposed to be about synthetic data. It ended up being about a vibe shift. Over the last few months, evidence has accumulated that models are changing, and synthetic data is roughly at the center of it. Too many developments have suddenly challenged firmly held assumptions: the ris…

Read more →

A Realistic AI Timeline

February 8, 2026

AI timelines can grow old. Reading through the last highly publicized exercise, it's as if we were stuck in the early 2020s: ever larger models unlocking ever more impressive emergent capabilities. GPT-3 steampunk. We are already in a different cycle: pretraining as we know it is ending. Reasoning, reinforcement learning, mid- and post-training are d…

Read more →

Actual LLM agents are coming

March 11, 2025

Agents are everywhere these days. And yet, the most consequential development in agentic LLM research has gone almost unnoticed. In January 2025, OpenAI released DeepResearch, a specialized variant of O3 for web and document search. Thanks to "reinforcement learning training on these browsing tasks", Deep Research has gain…

Read more →

The Model is the Product

March 1, 2025

There was a lot of speculation over the past years about what the next cycle of AI development would be. Agents? Reasoners? Actual multimodality? I think it's time to call it: the model is the product. All current factors in research and market development push in this direction. Generalist scaling is stalling. This was the whole message behind …

Read more →

What's the deal with mid-training?

January 2, 2025

Mid-training is poised to become an AI buzzword in 2025. OpenAI has had a "mid-training" division since July, whose major contributions "include GPT4-Turbo and GPT-4o". xAI is setting up one. Phi 3.5, Yi and, most recently, Olmo devoted a lot of time, resources, and effort to mid-training their latest models. What is mid-training exactly? It's not pre-tr…

Read more →