logoalt Hacker News

llm_trwtoday at 2:42 AM0 repliesview on HN

Because is the bitter lesson were true no one would be wasting their time with convolutions or attention blocks. You'd just replace them with the general tensor that allows every hyper relation possible between all points instead.