Yes, an amazing and detailed post, enjoyed all of it. In AI, it is common to use jit compilers (pytorch, jax, warp, triton, taichi, ...) that compile to cuda (or rocm, cpu, tpu, ...). You could write renderers like that, rasterizers or raytracers.
For example: https://github.com/StafaH/mujoco_warp/blob/render_context/mu...
(A new simple raytracer that compiles to cuda, used for robotics reinforcement learning, renders at up to 1 million fps at low resolution, 64x64, with textures, shadows)