
AIorNot · yesterday at 11:53 PM

It's Dario's job to hype the product, and he hypes it to get the billions they need. He's a bit more engineering-focused than Altman, but there's no fundamental difference.

A large language model like GPT runs in what you'd call a forward pass. You give it tokens, it pushes them through a giant neural network once, and it predicts the next token. No weights change; it's just matrix multiplications and nonlinearities. So at inference time it does not "learn" in the training sense.
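Roughly what that means in code, as a toy numpy sketch (made-up shapes, a single layer standing in for the full transformer stack):

```python
import numpy as np

# Toy sketch of one forward pass: frozen weights in, next-token
# distribution out. Not GPT's real architecture, just the shape of it.
rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16

# "Trained" weights: fixed at inference time, never updated here.
W_embed = rng.normal(size=(vocab_size, d_model))
W_out = rng.normal(size=(d_model, vocab_size))

def forward(token_ids):
    """One pass: embed tokens, pool, project to vocab logits."""
    h = W_embed[token_ids]           # lookup, no learning
    h = np.tanh(h.mean(axis=0))      # nonlinearity over pooled context
    logits = h @ W_out               # matrix multiply to vocab scores
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()           # next-token probabilities

probs = forward(np.array([3, 17, 42]))
next_token = int(probs.argmax())     # W_embed and W_out are untouched
```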

We need some kind of new architecture to get to next-gen wow stuff, e.g. differentiable memory systems: instead of modifying weights, the model writes to a structured memory that is itself part of the computation graph (toy sketch below). More dynamic or modular architectures, not bigger scaling and spending all our money on data centers.
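A minimal sketch of what "writes to a structured memory inside the computation graph" could look like, assuming a PyTorch-style soft-addressing scheme in the spirit of Neural Turing Machines (slot count, dimensions, and variable names are illustrative, not any particular paper's design):

```python
import torch

# Toy differentiable memory: the memory is a tensor in the graph,
# so gradients flow through both the write and the read.
slots, d = 8, 16
memory = torch.zeros(slots, d)                # external memory
key = torch.randn(d, requires_grad=True)      # would come from the model
write_vec = torch.randn(d, requires_grad=True)

# Soft addressing: similarity between key and each slot -> write weights.
address = torch.softmax(memory @ key, dim=0)  # shape (slots,)

# Write: blend the new vector into memory (no weight update anywhere).
memory = memory + address.unsqueeze(1) * write_vec

# Read back with the same addressing and backprop through the whole op.
read = (torch.softmax(memory @ key, dim=0).unsqueeze(1) * memory).sum(0)
read.sum().backward()                         # grads reach key and write_vec
print(key.grad.shape, write_vec.grad.shape)
```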

Anybody in the ML community have an answer for this? (besides better RL, RLHF, and world models)