A blog from Alexander Doria (sometimes also known as Pierre-Carl Langlais).

I train LLMs at Pleias. I write mostly about LLM research, especially with regard to training data of all kinds (synthetic, open, raw, processed, distilled...), although preferably of the tasteful sort.

The domain name vintagedata.org was booked almost ten years ago, when I mostly cared about digital humanities. I feel it's not totally irrelevant to the LLM age: after all, we mostly train on past data.

Newest

What is the deal with mid-training?