Most of the logic can be reused, but the FP64 multiplier is up to 4 times larger. Also some shifters are up to 2 times larger (because they need more stages, even if they shift the same number of bits). Small size increases occur in other blocks.
Even so, the multipliers and shifters occupy only a small fraction of the total area, a fraction that is smaller then implied by their number of gates, because they have very regular layouts.
A reduction from the ideal 1:2 FP64/FP32 throughput to 1:4 or in the worst case to 1:8 should be enough to make negligible the additional cost of supporting FP64, while still keeping the throughput of a GPU competitive with a CPU.
The current NVIDIA and AMD GPUs cannot compete in FP64 performance per dollar or per watt with Zen 5 Ryzen 9 CPUs. Only Intel B580 is better in FP64 performance per dollar than any CPU, though its total performance is exceeded by CPUs like 9950X.