IMO we are either limited by data or reaching the limits of what's possible with a transformer architecture. Hardware will get us efficiency, but I am not sure it will lead to smarter models.