Tangentially related: thermodynamic ideas seem to pop up in diffusion models, VAEs, and neural net training dynamics. Any recommended authors or reading on the links between thermodynamics, information theory, and neural nets?