I was a student intern in a parallel computing research group around that first reference point of 1995. My career went other ways, and I ended up working more on distributed systems than on programming language theory or implementation.
But when I encountered OpenCL and CUDA about ten years ago, I was struck by just how much they were delivering the SPMD parallel programming model in finished products. Around 1995, SPMD languages were often C dialects with some wonky compiler that each research group just barely kept together. By 2015, they were just bundled up inside a graphics driver or similarly commoditized runtime environment.
Also, the GPU of 2015 was delivering the throughput we dreamed of in supercomputers back then. A teraFLOP went from a strategic goal to something you could put on your desktop.