If most of the glaring problems are addressed (massive unsafe usage), and metrics show improvement (fewer crashes), then did it really go wrong? The fact that the code is not idiomatic is less interesting, because that can be addressed incrementally. Let's wait 3 months and reflect.
I'm thinking regressions and broken tests. Bun is already known to segfault a lot and their existing tests were rather lackluster; the Rust port being just as unsafe would be the least of their problems.