I can tell you that it's not that he's setting aside speed -- the fact that it's as fast as it is is an achievement. But there is a degree of unavoidable overhead -- IIRC his goal is to get it down to 20-30% for most workloads, but beyond that you're running into the realities of runtime bounds checks, materializing the flight ptrs, etc.
20% to 30% slower would be amazing for all the extra runtime work that is required in my limited understanding. This would be good enough for a whole lot of serious applications.