Hacker News

lostmsu, yesterday at 1:13 PM

So no comparison?


Replies

tuned, today at 6:24 AM

Comparisons will be run once generation quality is on par with other available models. Performance is useless if the quality is not at least on par.

The paper runs a benchmark (code and results are in the paper) comparing performance against a causal-attention GPT-2 model (nanoGPT): 20% faster at inference, and equivalent at training for T and D above a threshold.