Optimizations, like I said. But they'll never do away with the massive memory requirements, or with pre-training itself. Imagine the memory requirements without the pre-training step... this is just part and parcel of the transformer architecture.
And a lot of these improvements are really just classic automation, or chaining together yet more transformers to patch issues the transformer architecture creates in the first place (hallucinations, limited context).
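To make the "chaining" point concrete, here's a minimal sketch of that pattern: a retrieval step bolted in front of a generation step to work around the fixed context window. Everything here is a toy assumption for illustration, `generate` is a stand-in stub, not any real library's API, and real systems typically do the retrieval step with an embedding model (i.e. yet another transformer).

```python
# Toy illustration of the "fix the transformer with more plumbing" pattern:
# retrieve a few relevant passages, then stuff only those into the model's
# limited context window. `generate` is a placeholder, not a real API.

def generate(prompt: str) -> str:
    """Stand-in for a transformer LLM call (hypothetical, for illustration)."""
    return f"[model output conditioned on {len(prompt)} chars of prompt]"

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Crude keyword-overlap retrieval; production pipelines usually swap in
    an embedding model here, i.e. one more transformer in the chain."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda doc: -len(q & set(doc.lower().split())))
    return scored[:k]

corpus = [
    "The transformer architecture has quadratic attention cost.",
    "Pre-training runs consume enormous amounts of compute and memory.",
    "Retrieval augmentation grounds answers in external documents.",
]

query = "Why do transformers need retrieval augmentation?"
context = "\n".join(retrieve(query, corpus))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(generate(prompt))
```

None of this touches the underlying architecture; it just wraps it in more machinery, which is exactly the pattern being criticized above.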