Hacker News

bicepjai · last Wednesday at 6:28 AM

Hooker’s argument lands for me because it ties the technical scaling story to institutional incentives: as progress depends more on massive training runs, it becomes capital-intensive, less reproducible and more secretive; so you get a compute divide and less publication.

I’m trying to turn that into something testable with a simple constraint: “one hobbyist GPU, one day.” If meaningful progress is still possible under tight constraints, it supports the idea that we should invest more in efficiency/architecture/data work, not just bigger runs.
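As a rough sanity check on what that constraint buys, here is a back-of-envelope budget (all numbers are assumptions: ~82 TFLOP/s peak fp16 for a single consumer GPU such as an RTX 4090, 30% utilization, and the common C ≈ 6·N·D training-FLOPs heuristic):

```python
# Back-of-envelope compute budget for "one hobbyist GPU, one day".
# Assumed figures: ~82 TFLOP/s peak fp16 (RTX 4090-class card),
# 30% model-FLOPs utilization, and the C ≈ 6·N·D heuristic.
PEAK_FLOPS = 82e12   # assumed peak throughput, FLOP/s
MFU = 0.30           # assumed realistic utilization
SECONDS = 24 * 3600  # one day

budget = PEAK_FLOPS * MFU * SECONDS   # total training FLOPs available
params = 124e6                        # a GPT-2-small-sized model
tokens = budget / (6 * params)        # tokens trainable under C ≈ 6·N·D

print(f"budget ≈ {budget:.2e} FLOPs, "
      f"~{tokens:.2e} tokens for a {params/1e6:.0f}M-param model")
```

Under these assumptions you get on the order of 10^18 FLOPs and a few billion tokens for a GPT-2-small-sized model, so the constraint is tight but not hopeless.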

My favorite line:

> Somewhat humorously, the acceptance that there are emergent properties which appear out of nowhere is another way of saying our scaling laws don’t actually equip us to know what is coming.

Regarding this paragraph (§3.3, "New algorithmic techniques compensate for compute"):

> Progress over the last few years has been as much due to algorithmic improvements as it has been due to compute. This includes extending pre-training with instruction finetuning to teach models instruction following ..., model distillation using synthetic data from larger more performant "teachers" to train highly capable, smaller "students" ..., chain-of-thought reasoning ..., increased context-length ..., retrieval augmented generation ... and preference training to align models with human feedback ...

I would consider algorithmic improvements to be the following: (1) architectural changes like RoPE and MLA, (2) efficiency work such as custom kernels.
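To make the RoPE example concrete, here is a minimal NumPy sketch of rotary position embeddings (an illustration of the idea, not any particular library's implementation; the pairing of dimensions here is one common convention):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Each pair (x[:, i], x[:, i + dim//2]) is rotated by an angle that
    grows linearly with position, at a per-pair frequency base**(-2i/dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)   # (half,)
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

The property that makes this an "algorithmic" win is relative-position invariance: the dot product between a rotated query and a rotated key depends only on the distance between their positions, so the same weights generalize across absolute offsets.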

An error in the paper: "Transformers for language modeling (Vaswani et al., 2023)" — this should be 2017.

Disclosure: my proposed experiments: https://ohgodmodels.xyz/


Replies

bigbadfeline · last Wednesday at 9:34 PM

> as progress depends more on massive training runs, it becomes capital-intensive, less reproducible and more secretive; so you get a compute divide and less publication.

In the area of AI, secrecy and the inability to reproduce or verify results can become a huge systemic and social problem; the possible damage is effectively unbounded.

That's why I like open-source AI, including the training data and process: it solves the above problem as well as the problem of duplicated effort, which wastes resources on a scale that is economically significant nationally and globally.