Hacker News

Lerc, today at 8:16 AM

You misunderstand the challenge you face.

I know what current models do, and I don't know of any that use this approach, but I don't need to. I don't need to show that this mechanism works. You claimed the problem is intractable, so the burden is on you to show that it can't work.

I gave this particular example to show one way an LLM architecture could be modified that might address the problem.

>there is only a single array of tensors which get fed into a giant block of linear algebra and multiplied together.

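For starters, that's wrong. Transformers interleave those matrix multiplications with non-linear operations: softmax in attention, activation functions such as GELU or ReLU in the feed-forward layers, and layer normalization. If you don't know why and how to make things non-linear, then you might not have the understanding you think you do.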
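To make the non-linearity point concrete, here is a minimal sketch in plain Python. The weights `W1`, `b1` and `W2` are hand-picked for illustration and do not come from any real model, and the ReLU stands in for whatever activation a given transformer actually uses. Without the activation, the two layers collapse into a single affine map; with it, the same weights compute XOR, which no affine function can.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def relu(v):
    return [max(0.0, x) for x in v]

# Hypothetical hand-picked weights, not taken from any real model.
W1 = [[1.0, 1.0],
      [1.0, 1.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]

def linear_stack(x):
    # No activation: W2 @ (W1 @ x + b1) is still an affine function of x.
    h = [hi + bi for hi, bi in zip(apply(W1, x), b1)]
    return apply(W2, h)[0]

def nonlinear_stack(x):
    # Same weights, but with a ReLU between the two layers.
    h = relu([hi + bi for hi, bi in zip(apply(W1, x), b1)])
    return apply(W2, h)[0]

# The linear stack is exactly one affine layer: W2 @ W1 and W2 @ b1.
W_collapsed = matmul(W2, W1)    # [[-1.0, -1.0]]
b_collapsed = apply(W2, b1)[0]  # 2.0

def collapsed(x):
    return apply(W_collapsed, x)[0] + b_collapsed

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for x in inputs:
    # The third column is XOR of the inputs: 0, 1, 1, 0.
    print(x, linear_stack(list(x)), nonlinear_stack(list(x)))
```

However many purely linear layers you stack, the `collapsed` trick always works, so depth alone buys nothing. One activation between layers is enough to escape that.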

>> You can train a model to not mix things, many models are trained to separate things.

>That is not applicable to this, because segmentation models are not the same thing as LLMs. They have different architectures.

I used that particular example because you said "You cannot separate data that was input by the user and data that is from the system once it is mixed together like that", and that simply is not true. LLMs can do what neural nets do because they contain them, and neural nets can learn to approximate arbitrary functions. If there is any signal distinguishing two things, then there is a function that can separate them.
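Here is a toy sketch of the claim that a distinguishing signal can still be separated after mixing. Everything in it is an assumption for illustration: each token carries a hypothetical role feature (+1 for system, -1 for user, loosely like the segment embeddings some models add), a fixed invertible matrix `M` stands in for the layers that blend features together, and a plain perceptron plays the role of the learned function. After mixing, no single coordinate holds the role, yet the perceptron still finds a separator.

```python
import random

random.seed(0)
DIM = 4

# Hypothetical invertible "mixing" matrix standing in for layers that blend
# features together; after mixing, the role signal is spread across coordinates.
M = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 1],
     [1, 0, 0, 2]]

def mix(x):
    return [sum(M[i][k] * x[k] for k in range(DIM)) for i in range(DIM)]

def make_token(role):
    # Three random content features plus one role feature: +1 = system, -1 = user.
    content = [random.uniform(-1, 1) for _ in range(DIM - 1)]
    return mix(content + [role]), role

data = [make_token(random.choice([1, -1])) for _ in range(200)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_perceptron(samples, max_epochs=100):
    """Classic perceptron: nudge w toward every misclassified sample."""
    w = [0.0] * DIM
    for _ in range(max_epochs):
        mistakes = 0
        for z, y in samples:
            if y * dot(w, z) <= 0:
                w = [wi + y * zi for wi, zi in zip(w, z)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

w = train_perceptron(data)
accuracy = sum((dot(w, z) > 0) == (y > 0) for z, y in data) / len(data)
print(f"separated {accuracy:.0%} of mixed tokens")  # prints: separated 100% of mixed tokens
```

Because `M` is invertible, the mixed data stays linearly separable with a comfortable margin, so the perceptron convergence theorem guarantees training stops with every token classified correctly. This says nothing about how a production LLM would do it; it only shows that mixing by itself does not destroy a signal that some function can read back out.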

Not knowing how to do this does not mean it cannot be done, and an inadequate description of a transformer certainly doesn't show that it can't.