If the bitter lesson were true we'd be getting sota results out of two layer neural networks us...

llm_trw • 02/20/2025 • 1 reply

If the bitter lesson were true, we'd be getting SOTA results out of two-layer neural networks using tanh as the activation function.

It's a lazy blog post that should be thrown out after a minute of thought by anyone in the field.

Replies

sigmoid10 • 02/20/2025

That's not how the economics work. A lot of research has shown that deeper nets are more efficient. So if you spend a ton of compute money on a model, you'll want the best output, even though you could just as well build something shallow that may be state of the art for its depth but can't keep up with the competition on real tasks.
