Some valid points, but I hope the authors had developed them more.
On the semantic gap between the original software and its representation in the ITP, program extraction like in Rocq probably deserves some discussion, where the software is written natively in the ITP and you have to prove the extraction itself sound. For example, Meta Rocq did this for Rocq.
For the how far down the stack problem, there are some efforts from https://deepspec.org/, but it's inherently a difficult problem and often gets less love than the lab environment projects.