Hacker News

soupspaces, today at 3:33 AM

Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.