Comparisons will be run once the quality of generation is on par with other available models. Performance is useless if the quality is not at least on par.
The paper includes a benchmark (code and results are in the paper) comparing performance with a causal-attention GPT-2 model (nanoGPT): inference is about 20% faster, and training throughput is equivalent once T and D exceed a threshold.
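A speedup claim like that is easy to sanity-check with a small timing harness. The sketch below is a minimal, hypothetical example (the `baseline_generate` and `candidate_generate` stand-ins are placeholders, not the paper's code): each model is reduced to a callable that generates `n` tokens, and throughput is measured as tokens per second over the best of a few repeats.

```python
import time

def tokens_per_second(generate, n_tokens=1000, warmup=1, repeats=3):
    """Time a generate(n_tokens) callable; return tokens/sec using the best run."""
    for _ in range(warmup):
        generate(n_tokens)  # warm caches / JIT before measuring
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        generate(n_tokens)
        best = min(best, time.perf_counter() - start)
    return n_tokens / best

# Placeholder stand-ins for the two models; swap in real generation calls.
def baseline_generate(n):
    time.sleep(n * 1e-5)  # simulates the causal-attention baseline

def candidate_generate(n):
    time.sleep(n * 1e-5 * 0.8)  # simulates a model ~20% faster, as reported

speedup = tokens_per_second(candidate_generate) / tokens_per_second(baseline_generate)
print(f"inference speedup: {speedup:.2f}x")
```

With real models, `generate` would wrap the actual sampling loop (same prompt, same number of new tokens, same hardware) so the comparison isolates the attention mechanism rather than I/O or tokenization.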