I wouldn't wait. FPGAs weren't designed to serve this model architecture. Yes, they're very power efficient, but the layout/place-and-route overhead, the memory requirements (very few FPGAs on the market have HBM), the slower clock speeds, and the generally unpleasant developer experience make them a hard sell.