If you really want to see fully open training pipelines for modern LLMs, Olmo and, to a lesser extent, Nemotron are what you should look at.
I don't know either well, though I know Olmo better. My impression is that Nemotron is newer -- why is it less applicable? Is it not fully open like Olmo?
After my own very exhaustive survey, I can just say '+1'. It's also worth noting that OLMo has had one independent reproduction (albeit not an open one): https://www.amd.com/en/developer/resources/technical-article...
I often wonder why OLMo and Nemotron aren't more popular -- they are the gold standard and about where the "frontier" was a year ago. With more support behind them, a true open-source AI system that legitimately challenges OpenAI & Anthropic might not be far away!