You raise a really interesting point. I'm sure some have just escaped my notice, but I'm not familiar with any projects from antediluvian AI that have been resurrected on modern hardware to see where they'd really asymptote if they'd had the compute they deserved.
To be fair, usually those projects would need considerable work to be ported to modern multicore machines, let alone GPUs.
The paper “Were RNNs All We Needed?” explores this hypothesis a bit, finding that some pre-transformer sequence models can match transformers when trained at appropriate scale, though the authors did have to modify the recurrences to unlock more parallelism (rough sketch below the link).
https://arxiv.org/abs/2410.01201
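
If I recall correctly, the core trick is dropping the gates' dependence on the previous hidden state, which turns the recurrence into a linear one that can be evaluated with a parallel scan instead of a sequential loop. Here's a rough numpy sketch of the minGRU-style idea as I understand it (my own illustration, not code from the paper; the weight matrices, sizes, and sigmoid gate are just placeholders):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Toy dimensions; W_z and W_h stand in for learned projections.
    T, D = 6, 4
    rng = np.random.default_rng(0)
    x = rng.normal(size=(T, D))
    W_z = rng.normal(size=(D, D)) * 0.1
    W_h = rng.normal(size=(D, D)) * 0.1

    # The gate z_t and candidate h_tilde_t depend only on x_t,
    # never on h_{t-1} -- that's the modification.
    z = sigmoid(x @ W_z)      # (T, D) gates
    h_tilde = x @ W_h         # (T, D) candidates

    # 1) Classic sequential RNN loop: h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t
    h_seq = np.zeros((T, D))
    h_prev = np.zeros(D)
    for t in range(T):
        h_prev = (1 - z[t]) * h_prev + z[t] * h_tilde[t]
        h_seq[t] = h_prev

    # 2) Same recurrence without the loop. With a_t = 1 - z_t and
    #    b_t = z_t * h_tilde_t known up front, h_t = a_t * h_{t-1} + b_t
    #    unrolls to h_t = A_t * sum_{k<=t} b_k / A_k, where A_t = prod_{j<=t} a_j.
    #    Cumulative products/sums stand in here for a GPU parallel scan;
    #    real implementations use a numerically stable (e.g. log-space) scan.
    a, b = 1 - z, z * h_tilde
    A = np.cumprod(a, axis=0)
    h_scan = A * np.cumsum(b / A, axis=0)

    assert np.allclose(h_seq, h_scan, atol=1e-6)

The point is that once nothing per step depends on h_{t-1}, the whole sequence can be computed in one parallel pass on a GPU rather than step by step, which is what makes training these older architectures at modern scale practical.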