logoalt Hacker News

llm_trwlast Thursday at 8:20 AM1 replyview on HN

If the bitter lesson were true we'd be getting sota results out of two layer neural networks using tanh as activation functions.

It's a lazy blog post that should be thrown out after a minute of thought by anyone in the field.


Replies

sigmoid10last Thursday at 10:00 PM

That's not how the economics work. There has been a lot of research that showed how deeper nets are more efficient. So if you spend a ton of compute money on a model, you'll want the best output - even though you could just as well build something shallow that may well be state of the art for its depth, but can't hold up with the competition on real tasks.

show 1 reply