Hacker News

sumitkumar, today at 7:18 AM (3 replies)

The weights start as a random manifold. Training takes data and shapes the manifold, weight by weight, over many cycles. Once the training is done, the manifold is fixed.

When a new inference has to be done, the query (q) is projected into the manifold space. This projection is dropped on the manifold, and the gravity of the manifold gives an answer of length q+1. That extended sequence (q+i) is dropped again, n times in all, to output a final response of length n.

The gravity is created by repeated multiplication (of the weights and the input) on the GPU, which works out how the projected embeddings should fall according to the manifold.
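
To make the loop concrete, here is a minimal Python sketch of the "drop it on the manifold, n times" idea. It is a toy under loud assumptions, not how any real model is implemented: the names `VOCAB`, `DIM`, `init_weights`, `next_token` and `generate` are all hypothetical, the weights stay random (no training step is shown), averaging the token embeddings stands in for the real layers, and the next token is picked greedily.

```python
import random

VOCAB = 8  # hypothetical toy vocabulary size
DIM = 4    # hypothetical embedding width


def init_weights(seed=0):
    """Random starting weights; training would normally reshape these."""
    rng = random.Random(seed)
    embed = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
    unembed = [[rng.uniform(-1, 1) for _ in range(VOCAB)] for _ in range(DIM)]
    return embed, unembed


def next_token(tokens, embed, unembed):
    """Project the sequence into embedding space, then score every token."""
    # Assumption: a plain average of embeddings replaces the real layers.
    state = [sum(embed[t][d] for t in tokens) / len(tokens) for d in range(DIM)]
    # The "gravity": multiply the projected state by the fixed weights.
    logits = [sum(state[d] * unembed[d][v] for d in range(DIM))
              for v in range(VOCAB)]
    # Greedy choice: take the highest-scoring token.
    return max(range(VOCAB), key=lambda v: logits[v])


def generate(query, n, embed, unembed):
    """Append one token per step, feeding the grown sequence back in."""
    tokens = list(query)
    for _ in range(n):
        tokens.append(next_token(tokens, embed, unembed))
    return tokens[len(query):]  # the n newly generated tokens


embed, unembed = init_weights()
response = generate([1, 2, 3], 5, embed, unembed)
print(response)
```

The weights never change inside `generate`, which is the "manifold is fixed" part; only the sequence grows, one token per pass, until n tokens have been produced.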


Replies

akie, today at 7:33 AM

That's a very concise and illuminating way to think about what's happening, IF (and only if) you already know how these models work. Thanks for that.

DougBTX, today at 8:05 AM

The weights are code, the prompt is code, the output is code.

Is the meat code?

noduerme, today at 7:30 AM

In what way is that different from any other model of reality that you'd use to winnow a dataset into an answer to a question? The only major difference I see is that beyond a certain number of transformations, people are willing to treat it as some sort of miracle, and too tired to figure out why it came up with the answer it came up with. It's almost like people desperately want to give up their agency and creativity to black boxes, whether those weights produce answers that are right or wrong. Factor in that psychology and it looks a lot less like we have invented something useful, and a lot more like we as a species are choosing to quit life en masse.
