efskap • today at 8:34 AM • 1 reply

Would it really be infeasible to take a sample and do a search over an indexed training set? Maybe a Bloom filter could be adapted.
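Here is a minimal sketch of that idea in Python. It assumes the training set is split into lowercase word n-grams and that every n-gram is inserted into a single Bloom filter. The names `BloomFilter`, `ngrams`, `index_corpus` and `overlap_fraction`, and the parameters `m_bits`, `k_hashes` and `n`, are illustrative choices that do not come from the thread. A Bloom filter never gives false negatives, only occasional false positives. A high overlap fraction therefore suggests, but does not prove, that the sample appeared verbatim.

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """Minimal Bloom filter: each item sets k bit positions in an m-bit array."""

    def __init__(self, m_bits: int = 1 << 20, k_hashes: int = 5) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True if all k bits are set: "possibly present". False means "definitely absent".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def ngrams(text: str, n: int = 8) -> list[str]:
    """Lowercased, whitespace-split word n-grams (hypothetical chunking scheme)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def index_corpus(docs: Iterable[str], n: int = 8) -> BloomFilter:
    """Insert every word n-gram of every training document into one filter."""
    bf = BloomFilter()
    for doc in docs:
        for g in ngrams(doc, n):
            bf.add(g)
    return bf


def overlap_fraction(bf: BloomFilter, sample: str, n: int = 8) -> float:
    """Fraction of the sample's n-grams that the filter reports as seen."""
    grams = ngrams(sample, n)
    if not grams:
        return 0.0
    return sum(g in bf for g in grams) / len(grams)


corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
bf = index_corpus(corpus, n=4)
print(overlap_fraction(bf, "the quick brown fox jumps over the lazy dog", n=4))  # 1.0
```

The filter's size grows with the number of distinct n-grams rather than with the raw text. A lookup costs k hash probes per n-gram, which is what would make the approach cheap at training-set scale.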

Replies

hexaga • today at 9:26 AM

It's not the searching that's infeasible. Efficient algorithms for massive-scale full-text search are available.

The infeasibility is searching for the (unknown) set of translations that the LLM would put that data through. Even if you posit that the weights hold only basic symbolic lookup-table (LUT) mappings (they don't), there's no good way to enumerate them anyway. The model might as well be a learned hash function that maintains semantic identity while utterly eradicating literal symbolic equivalence.
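The following is a toy illustration of the point, not a model of what an LLM actually does. A sentence is passed through a hypothetical word-substitution table, a literal LUT. After that, an exact n-gram index no longer finds the sentence, even though the meaning is unchanged and the mapping is trivially invertible. The match only comes back if you already know the inverse table, which is the enumeration problem described above. The names `word_ngrams`, `hit_rate`, `apply_lut` and the table `lut` are made up for this sketch.

```python
def word_ngrams(text: str, n: int = 3) -> set[str]:
    """Exact index of lowercased word n-grams."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def hit_rate(index: set[str], sample: str, n: int = 3) -> float:
    """Fraction of the sample's n-grams found verbatim in the index."""
    grams = word_ngrams(sample, n)
    return len(grams & index) / len(grams) if grams else 0.0


def apply_lut(text: str, lut: dict[str, str]) -> str:
    """Word-level substitution; words missing from the table pass through unchanged."""
    return " ".join(lut.get(w, w) for w in text.lower().split())


training_text = "the quick brown fox jumps over the lazy dog"
index = word_ngrams(training_text)

# Hypothetical table standing in for one "translation" applied to the data.
lut = {"quick": "fast", "brown": "tan", "jumps": "leaps", "lazy": "idle", "dog": "hound"}
inverse = {v: k for k, v in lut.items()}  # valid because the values are unique

rewritten = apply_lut(training_text, lut)
print(rewritten)                                       # the fast tan fox leaps over the idle hound
print(hit_rate(index, training_text))                  # 1.0
print(hit_rate(index, rewritten))                      # 0.0
print(hit_rate(index, apply_lut(rewritten, inverse)))  # 1.0
```

Here there is one known table. The search-side problem is that neither the table nor the number of such tables is known, and real learned representations are not word-for-word substitutions in the first place.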
