Hacker News

efskap, today at 8:34 AM

Would it really be infeasible to take a sample of the output and search for it in an indexed training set? Maybe a Bloom filter could be adapted for this.
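For concreteness, here is a minimal sketch of what such a lookup could look like, assuming the training set is indexed as word n-grams in a Bloom filter. The names `BloomFilter`, `ngrams`, `build_index` and `overlap_fraction`, the filter size, and the choice of 5-word n-grams are all illustrative assumptions, not anything taken from a real pipeline.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions by salting a single hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def ngrams(text: str, n: int = 5):
    """Lowercased word n-grams of the text, in order."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_index(corpus, n: int = 5) -> BloomFilter:
    """Add every n-gram of every document to one Bloom filter."""
    index = BloomFilter()
    for doc in corpus:
        for gram in ngrams(doc, n):
            index.add(gram)
    return index


def overlap_fraction(sample: str, index: BloomFilter, n: int = 5) -> float:
    """Fraction of the sample's n-grams that the index reports as (probably) seen."""
    grams = ngrams(sample, n)
    if not grams:
        return 0.0
    return sum(gram in index for gram in grams) / len(grams)


corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
index = build_index(corpus)
verbatim_score = overlap_fraction("quick brown fox jumps over the lazy dog", index)
```

A lookup like this only detects verbatim (or near-verbatim, after normalization) reuse, and a real training set would need a far larger filter than the toy size used here.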


Replies

hexaga, today at 9:26 AM

It's not the searching that's infeasible. Efficient algorithms for massive-scale full-text search are available.

The infeasible part is searching over the (unknown) set of translations that the LLM would put that data through. Even if you posit that the weights hold only basic symbolic lookup-table (LUT) mappings (they don't), there's no good way to enumerate those mappings. The model might as well be a learned hash function that preserves semantic identity while utterly eradicating literal symbolic equivalence.
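A toy illustration of the point, under loud assumptions: the `LUT` below is a made-up word substitution table standing in for whatever the model does, and `ngrams`, `overlap` and `translate` are hypothetical helpers. Even this trivial mapping drives exact n-gram overlap with the training text to zero, and the match only comes back if you already know the mapping and can invert it.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Set of lowercased word n-grams."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(sample: str, index: set, n: int = 3) -> float:
    """Fraction of the sample's n-grams that appear verbatim in the index."""
    grams = ngrams(sample, n)
    return len(grams & index) / len(grams) if grams else 0.0


def translate(text: str, lut: dict) -> str:
    """Apply a word-level lookup table; unmapped words pass through unchanged."""
    return " ".join(lut.get(word, word) for word in text.lower().split())


# Hypothetical mapping: meaning roughly preserved, surface form changed.
LUT = {"quick": "fast", "fox": "vulpine", "jumps": "leaps", "lazy": "idle", "dog": "hound"}

training_doc = "the quick brown fox jumps over the lazy dog"
index = ngrams(training_doc)

verbatim = overlap(training_doc, index)            # exact copy: full overlap
output = translate(training_doc, LUT)
transformed = overlap(output, index)               # after the mapping: no overlap

# The search works again only if the mapping is known and invertible.
inverse = {v: k for k, v in LUT.items()}
recovered = overlap(translate(output, inverse), index)
```

Here the mapping is five entries and known in advance. The comment's argument is that for a real model the equivalent mapping is neither small nor enumerable, so there is nothing to invert before running the search.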