logoalt Hacker News

llm_trw02/23/20250 repliesview on HN

Because is the bitter lesson were true no one would be wasting their time with convolutions or attention blocks. You'd just replace them with the general tensor that allows every hyper relation possible between all points instead.