
program_whiz · today at 10:56 AM

The models are differentiable; they are trained with backprop. You can run the same machinery in reverse to find the input that produces near-certainty of the desired output. For a given sequence length, set up a new optimization: take the input sequence, pass it through the (frozen) model, and run gradient steps over the input sequence to reduce a "loss" measured against the desired output. This gives you the optimal sequence of that length for maximizing the probability of seeing the output sequence. Of course, if you're doing this against ChatGPT or another API-only model, you have no gradients and no choice but to hunt around.
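A minimal sketch of that optimization, assuming a PyTorch causal language model that accepts input embeddings directly (e.g. via an `inputs_embeds` argument, as Hugging Face models do) and returns `.logits`. The names `model`, `embedding_matrix`, and `target_ids` are placeholders, not anything from a specific library:

```python
import torch
import torch.nn.functional as F

def invert_output(model, embedding_matrix, target_ids, seq_len, steps=500, lr=0.1):
    """Find a length-`seq_len` sequence of continuous input vectors that
    maximizes the frozen model's probability of emitting `target_ids`."""
    model.eval()
    for p in model.parameters():              # freeze the model; only the input moves
        p.requires_grad_(False)

    d_model = embedding_matrix.shape[1]
    # Free continuous input: seq_len vectors in the embedding space.
    inputs = torch.randn(1, seq_len, d_model, requires_grad=True)
    optimizer = torch.optim.Adam([inputs], lr=lr)

    target = target_ids.unsqueeze(0)          # (1, T)
    for _ in range(steps):
        optimizer.zero_grad()
        # Feed [optimized prefix ; target embeddings] and score the target tokens.
        target_emb = embedding_matrix[target]                 # (1, T, d)
        full = torch.cat([inputs, target_emb], dim=1)         # (1, L+T, d)
        logits = model(inputs_embeds=full).logits             # (1, L+T, V)
        # Logits at positions L-1 .. L+T-2 predict the T target tokens.
        pred = logits[:, seq_len - 1 : seq_len + target.shape[1] - 1, :]
        loss = F.cross_entropy(pred.reshape(-1, pred.shape[-1]), target.reshape(-1))
        loss.backward()
        optimizer.step()
    return inputs.detach()
```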

Of course, the optimal sequence found this way will be a series of word vectors (each with hundreds of dimensions), not discrete tokens. You could snap each one to its closest word in the vocabulary (or enforce that as a constraint during solving), or just use the vectors themselves as the compressed data.
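Continuing the same assumptions, the snapping step could look like this: a nearest-neighbour lookup (here by cosine similarity) from each optimized vector into the vocabulary embedding table. `tokenizer` is again a placeholder:

```python
import torch
import torch.nn.functional as F

def project_to_tokens(inputs, embedding_matrix, tokenizer=None):
    """inputs: (1, L, d) optimized vectors; embedding_matrix: (V, d)."""
    vecs = F.normalize(inputs.squeeze(0), dim=-1)     # (L, d)
    vocab = F.normalize(embedding_matrix, dim=-1)     # (V, d)
    token_ids = (vecs @ vocab.T).argmax(dim=-1)       # nearest vocabulary entry per position
    return tokenizer.decode(token_ids) if tokenizer is not None else token_ids
```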

Ultimately, neural nets of various kinds are used for compression in many contexts. There are examples where Gaussian-splatting-like 3D scenes are created by compressing all the scene data into the weights of a neural net, via a process similar to the one described above, producing a fully explorable 3D color scene that can be rendered from any angle.
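As a toy 2D analogue of that "weights as the compressed data" idea (my own simplification, not the 3D scene case itself): overfit a small coordinate MLP to an image so the pixels can be regenerated from the network alone. `image` is assumed to be an (H, W, 3) tensor of RGB values in [0, 1]:

```python
import torch
import torch.nn as nn

def fit_image(image, steps=2000, lr=1e-3, hidden=256):
    H, W, _ = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)    # (H*W, 2) pixel coordinates
    targets = image.reshape(-1, 3)                           # (H*W, 3) RGB targets

    # Small MLP mapping (x, y) -> (r, g, b).
    net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                        nn.Linear(hidden, hidden), nn.ReLU(),
                        nn.Linear(hidden, 3), nn.Sigmoid())
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(coords) - targets) ** 2).mean()         # overfit the one image
        loss.backward()
        opt.step()
    return net   # the weights now act as a (lossy) compressed copy of the image
```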