yeah, we do incorporate some of the findings from the paper in our repo! like aggressive regularization and ensembling.
I see you already mention diffusion - iirc there was a result not too long ago that diffusion models keep improving with more epochs for longer than AR models do.