An FP64 unit can reuse most of the hardware of two FP32 units.
Only the multiplier is significantly bigger, up to 4 times. Some shifters may also be up to twice as big. The adders are slightly bigger, because of their larger carry-lookahead networks.
So you must count mainly the area occupied by multipliers and shifters, which is likely to be much less than 10%.
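As a rough sanity check on the multiplier figure, here is a minimal sketch that assumes multiplier area grows with the square of the significand width. That is a first-order model of an array multiplier, not a measurement of any real GPU. The widths are the IEEE 754 significand sizes including the implicit bit; the function name is made up for illustration.

```python
# Significand widths including the implicit leading bit (IEEE 754).
FP32_SIGNIFICAND_BITS = 24
FP64_SIGNIFICAND_BITS = 53


def multiplier_area_ratio(wide_bits: int, narrow_bits: int) -> float:
    """Area of a wide multiplier relative to one narrow multiplier.

    Assumption: area grows with the square of the operand width.
    """
    return (wide_bits / narrow_bits) ** 2


# FP64 multiplier compared with a single FP32 multiplier.
vs_one_fp32 = multiplier_area_ratio(FP64_SIGNIFICAND_BITS, FP32_SIGNIFICAND_BITS)
# FP64 multiplier compared with the two FP32 multipliers it would replace.
vs_two_fp32 = vs_one_fp32 / 2

print(f"FP64 multiplier vs one FP32 multiplier:  {vs_one_fp32:.2f}x")   # 4.88x
print(f"FP64 multiplier vs two FP32 multipliers: {vs_two_fp32:.2f}x")   # 2.44x
```

Under that assumption the ratio lands between roughly 2.4x and 4.9x depending on the baseline, which is the same order as the "up to 4 times" estimate.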
There is an area increase, but certainly not 50% (300 mm^2). Even an area increase of 10% (e.g. 60-70 mm^2 for the biggest GPUs) seems incredibly large.
Reducing the FP64/FP32 throughput ratio from 1:2 to 1:4 or at most to 1:8 is guaranteed to make the excess area negligible. I am sure that the cheap Intel Battlemage with 1:8 does not suffer because of this.
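To put numbers on that, here is a small sketch that assumes the excess area scales linearly with the number of FP64-capable units, so halving the FP64 rate halves the excess. The starting point is the 10% figure above, which is already a pessimistic upper bound; the function and constant names are made up for illustration.

```python
def excess_area_fraction(excess_at_half_rate: float, fp32_per_fp64: int) -> float:
    """Excess die area fraction for a 1:fp32_per_fp64 FP64/FP32 ratio.

    Assumption: the excess scales linearly with the number of
    FP64-capable units, starting from a known value at 1:2.
    """
    if fp32_per_fp64 < 2:
        raise ValueError("ratio must be 1:2 or lower")
    return excess_at_half_rate * 2 / fp32_per_fp64


# Pessimistic upper bound for the excess area at a 1:2 ratio.
PESSIMISTIC_EXCESS_AT_1_2 = 0.10

for n in (2, 4, 8):
    print(f"1:{n} -> {excess_area_fraction(PESSIMISTIC_EXCESS_AT_1_2, n):.2%}")
# 1:2 -> 10.00%
# 1:4 -> 5.00%
# 1:8 -> 2.50%
```

Even starting from the pessimistic 10%, a 1:8 ratio would cost about 2.5% of the die under this model.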
Any further reductions, from 1:16 in old GPUs to 1:64 in recent GPUs, cannot have any explanation other than the desire for market segmentation, which removes small businesses and individual users from the pool of customers who can afford the huge prices of the GPUs with FP64 support.