Thread locals do solve the problem. You create a wrapper around the original function. You set a global thread local user data, you pass in a function which calls the function pointer accepting the user data with the global one.
Yep. Thread locals are probably faster than the other solutions shown too.
It’s confusing to me that thread locals are “not the best idea outside small snippets” meanwhile the top solution is templating on recursion depth with a constexpr limit of 11.
Thread locals don't fully solve the problem. They work well if you immediately call the closure, but what if you want to store the closure and call it later?
Because we create `reverse_sort` between creating `normal_sort` and calling it, we end up with a reverse sort despite clearly asking for a normal sort.