We have been able to automatically inline functions for a few decades now. You can even override inlining decisions manually, though that's usually a bad idea unless you're carefully profiling.
Also, it's pointer indirection in data structures that kills you, because uncached memory is brutally slow. Function calls to functions in the cache are normally a much smaller concern except for tiny functions in very hot loops.
I'm not sure Rust's `async fn` desugaring (which involves a data structure for the state machine) is inlineable. (To be precise: maybe the desugared function can be inlined, but LLVM isn't allowed to change the data structure, so there may be extra setup costs, duplicate `Waker`s, etc.) It's probably true that there is a performance cost. But I agree with the article's point that it's generally insignificant.
For non-async fns, the article already made this point:
> In release mode, with optimizations enabled, the compiler will often inline small extracted functions automatically. The two versions — inline and extracted — can produce identical assembly.