Abrash did this in Quake because those divides are _free_ when interleaved with other code. The Pentium FPU is pipelined: you can issue an FDIV, then FXCH to another value and do something else for a while instead of waiting for the result. The price is hand-tuned assembly that runs fast only on Intel's FPU as of 1996. AMD caught up in 1998-99, finally implementing pipelined FDIV and 0-cycle FXCH.
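For anyone who wants the shape of the trick without reading x86 assembly, here is a rough C sketch. It assumes "those divides" means the perspective divide done once per 16-pixel block of a span; `span_u`, `STEP`, and the parameter names are made up for illustration and are not Quake's code. The divide for the next block is started before the current block's pixel loop, and that loop never reads its result, so a CPU that can overlap a long divide with other work has something to overlap it with. In C the actual scheduling is up to the compiler and the hardware, so this shows only the dependency structure, not the FDIV/FXCH ordering.

```c
#include <stddef.h>

enum { STEP = 16 };  /* pixels between exact perspective divides (assumed) */

/* Hypothetical helper: writes n perspective-corrected u values into out.
 * uz and iz are u/z and 1/z at the span start; duz and diz are their
 * per-pixel steps. u is exact at block edges and linear in between. */
void span_u(float *out, size_t n, float uz, float iz, float duz, float diz)
{
    if (n == 0) return;

    size_t len = n < STEP ? n : STEP;
    float u0 = uz / iz;                  /* exact u at the left edge */
    uz += duz * (float)len;
    iz += diz * (float)len;
    float u1 = uz / iz;                  /* exact u at the end of block 0 */

    size_t x = 0;
    for (;;) {
        size_t left = n - x - len;       /* pixels after the current block */
        size_t next_len = left < STEP ? left : STEP;
        float u2 = u1;

        if (next_len > 0) {
            uz += duz * (float)next_len;
            iz += diz * (float)next_len;
            /* Divide for the NEXT block. Nothing in the pixel loop below
             * depends on u2, so it can complete in the background. */
            u2 = uz / iz;
        }

        /* Full blocks use a constant reciprocal; only a short tail block
         * pays for an ordinary divide here. */
        float inv_len = (len == STEP) ? 1.0f / STEP : 1.0f / (float)len;
        float du = (u1 - u0) * inv_len;

        /* Independent work: linear interpolation across the current block. */
        for (size_t i = 0; i < len; i++)
            out[x + i] = u0 + du * (float)i;

        x += len;
        if (next_len == 0) break;
        u0 = u1;
        u1 = u2;
        len = next_len;
    }
}
```

With a constant 1/z (`diz = 0`) the output is just a straight ramp; with a varying 1/z the values match the true u/z only at block boundaries, which is the accuracy you trade for doing one divide per 16 pixels.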
OMG that link (and its parent) is extremely interesting! Thanks for sharing!