JFYI: Your inverse ray direction calculation is not NaN-safe: if rays are completely axis-parallel in one dimension, so the direction value is 0.0 for that axis, you'll be doing the val / 0.0 which results in a NaN...
Also, as you're using full double/f64-precision all the time, you're leaving a fair bit of performance on the table: transcendentals (sin(), cos(), etc) in particular - can be a lot slower than when using f32, and generally double precision can be special-cased to particular areas of the renderer that need it (curve, sphere intersection, and some situations where volume scattering produces very small distances).
What's the proper way to handle a zero in the direction vector when calculating the reciprocal direction? Should it evaluate to infinity?
> Also, as you're using full double/f64-precision all the time, you're leaving a fair bit of performance on the table
There's another issue that popped up on my quick naive profiling run: std::shared_ptr<Material> in the HitRecord/HittableLightSample is assigned/copied and destroyed a lot, and somehow these refcount operations show up as half of all samples on my profile (presumably because even if there's no hit and the pointer stays nullptr, the smart pointer still must check if there's anything to deallocate).