sailingparrot, today at 4:37 PM

Everything can be represented as f(); a full-scale SotA transformer model is also just f(context). That does not mean one layer is sufficient. It all depends on how much expressivity f needs in order to be a good model.
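To make the expressivity point concrete, here is a toy sketch (my own illustration, not anything transformer-specific). XOR is the textbook case: a single linear threshold unit is still "just f(x)" but cannot represent it, while stacking one more layer can. The names `one_layer`, `two_layer`, and `fits_xor`, the hand-picked weights, and the search grid are all assumptions made up for this example.

```python
from itertools import product

# Truth table for XOR: the target function we want f to represent.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(z):
    # Hard threshold activation.
    return 1 if z > 0 else 0

def one_layer(x, w1, w2, b):
    # A single linear threshold unit: still "just f(x)".
    return step(w1 * x[0] + w2 * x[1] + b)

def two_layer(x):
    # Hand-picked weights (an assumption for illustration):
    # hidden units compute OR and AND, and the output unit
    # computes "OR and not AND", which is XOR.
    h_or = step(x[0] + x[1] - 0.5)
    h_and = step(x[0] + x[1] - 1.5)
    return step(h_or - h_and - 0.5)

def fits_xor(f):
    # True if f reproduces the whole XOR truth table.
    return all(f(x) == y for x, y in XOR.items())

# Brute-force a coarse grid of weights for the single unit.
# This is an illustration, not a proof, but XOR is not linearly
# separable, so no setting of (w1, w2, b) can succeed.
grid = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
one_layer_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if fits_xor(lambda x: one_layer(x, w1, w2, b))
]

print("single-unit settings that fit XOR:", len(one_layer_solutions))
print("two-layer network fits XOR:", fits_xor(two_layer))
```

Both are "f(x)" in the trivial sense; only the deeper one has enough expressivity for this particular target.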