Related: I've heard about curriculum learning for LLMs quite often, but I couldn't find a library that orders training data by an arbitrary measure such as difficulty, so I made one[0].
What you get is an iterator over the dataset that samples examples based on how far along you are in training.
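Roughly, the idea looks like this. To be clear, this is a minimal sketch and not the API of the library linked above. The hypothetical `curriculum_iter` sorts examples once by a user-supplied `difficulty` function, then at each step samples uniformly from the easiest fraction of the data. That fraction grows from `start_fraction` to 1.0 as training progresses. The square-root pacing schedule is my assumption, borrowed from competence-based curriculum learning. A linear or stepwise schedule would plug in the same way.

```python
import math
import random
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def curriculum_iter(
    data: Sequence[T],
    difficulty: Callable[[T], float],  # any user-defined score; lower = easier
    total_steps: int,
    start_fraction: float = 0.1,
    seed: int = 0,
) -> Iterator[T]:
    """Hypothetical sketch: yield one example per training step, drawn
    uniformly from the easiest `fraction` of the data, where `fraction`
    grows from `start_fraction` to 1.0 over `total_steps`."""
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)
    # Sort once by difficulty so the "easiest k" examples are a prefix.
    ordered = sorted(data, key=difficulty)
    n = len(ordered)
    for step in range(total_steps):
        # Training progress in [0, 1].
        progress = step / max(total_steps - 1, 1)
        # Square-root pacing (an assumed schedule): grows fast early, then levels off.
        fraction = min(1.0, math.sqrt(start_fraction**2 + (1 - start_fraction**2) * progress))
        pool_size = max(1, math.ceil(fraction * n))
        yield ordered[rng.randrange(pool_size)]
```

Usage would be something like `for example in curriculum_iter(texts, difficulty=len, total_steps=10_000): ...`, here using string length as a crude stand-in for difficulty. Sampling with replacement from a growing prefix keeps the iterator stateless apart from the step counter. An epoch-style variant that shuffles within each difficulty band is another option.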