> It seems to me that in 2016 people did (have to) play a lot more tricks with the backpropagation than today
Perhaps, though that may be because there was more experimentation with different neural net architectures and layer/node counts back then?
Nowadays the training problems are better understood, clipping is supported by the frameworks, and it's easy to find training examples online with clipping enabled.
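For anyone curious what the frameworks are doing under the hood: here's a minimal plain-Python sketch of gradient clipping by global norm (no particular framework's API assumed; real libraries expose this as a one-line call and operate on tensors, not flat lists).

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down so their global L2 norm is at most max_norm.

    Toy version operating on a flat list of floats; frameworks do the
    same thing across all parameter tensors at once.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# An exploding gradient gets rescaled; a small one passes through unchanged.
clip_by_global_norm([30.0, 40.0], max_norm=5.0)  # norm 50 -> [3.0, 4.0]
clip_by_global_norm([0.3, 0.4], max_norm=5.0)    # unchanged
```

The key point is that the whole gradient vector is scaled uniformly, so its direction is preserved; only the step size is capped.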
The problem itself didn't actually go away. ReLU (or GELU) is still the default activation for most networks, and training an LLM is apparently something of a black art. Hugging Face just released their "Smol Training Playbook: a distillation of hard earned knowledge to share exactly what it takes to train SOTA LLMs", so evidently even in 2025 training isn't exactly a turn-key affair.
I wonder if it's just that the important neural nets are now trained by large, secretive corporations that aren't interested in sharing their knowledge.