The hardest part about introducing a new architecture is that even if it is genuinely better than transformers in every way, it's very difficult both to prove a significant improvement at scale and to gain traction. Until Google puts serious resources into training a scaled-up version of this architecture, I believe there's enough low-hanging fruit in improving existing architectures that it will always take the back seat.
Google is large enough, well-funded enough, and the opportunity is great enough for them to run those experiments.
You don't necessarily have to prove it out on large foundation models first. Can it beat a 32B-parameter model, for example?
But it's companies like Google that built tools like JAX and TPUs precisely so that models can be thrown together with cheap, easy scaling; a small sketch of what that looks like is below. The math in their paper is probably harder to put together than an alpha-level prototype, which they need anyway.
So I think they could at least default to doing it for small demonstrators.
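For what it's worth, here's a minimal sketch of what I mean by cheap, easy scaling in JAX. The toy dense-layer "model" and all the names are just illustrative, not anything from the paper; the point is that the same jitted training step runs on one local device or a TPU pod, with the parallelism decided entirely by how the batch is sharded.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Toy stand-in model: a single dense layer. The architecture doesn't matter here;
# the point is that the same train_step runs unchanged on 1 CPU or a TPU pod.
def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # XLA compiles this once; the input shardings decide the parallelism
def train_step(params, x, y, lr=1e-2):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

devices = np.array(jax.devices())          # 1 device locally, thousands on a TPU pod
mesh = Mesh(devices, axis_names=("data",))

params = {"w": jnp.zeros((128, 1)), "b": jnp.zeros((1,))}
x = jnp.ones((8 * len(devices), 128))
y = jnp.ones((8 * len(devices), 1))

# Shard the batch over the "data" axis; params stay replicated on every device.
batch_sharding = NamedSharding(mesh, P("data", None))
x, y = jax.device_put(x, batch_sharding), jax.device_put(y, batch_sharding)

params = train_step(params, x, y)          # data-parallel step, nothing else to change
```

Scaling to more devices means changing the mesh, not the model code, which is why a small demonstrator would be relatively cheap for them to stand up.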
Prove it beats models of different architectures trained under identical limited resources?
Yes. The path dependence for current attention-based LLMs is enormous.
Until Google puts serious resources into training a scaled-up version of this architecture
If Google isn't willing to scale it up, why would anyone else be?