This makes me think about how large an FPGA-based system would need to be to do this. Obviously no single FPGA chip can handle this kind of job, but I wonder how many we would need.
Also, what if Cerebras decided to make a wafer-sized FPGA array and turned large language models into lots and lots of logic gates?