Thanks for the reference to in-place merge sort. GitHub is having a lot of problems right now, so I cannot see your code. I will look at it when I get a chance.
I think I ignored Heap sort because it uses O(N) extra RAM, which is precious on a resource-constrained microcontroller.
Heap sort is in-place: it needs only O(1) extra memory and, unlike quicksort, does not even need recursion. But yes, it is not stable, and it is usually slower than even Shell sort (except for very large arrays).
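
To make the in-place point concrete, here is a minimal sketch of an iterative heap sort in C. The names `heap_sort` and `sift_down` and the `int16_t` element type are my own choices for illustration, not taken from any library discussed here. The only extra storage is a few local variables, and there is no recursion.

```c
#include <stddef.h>
#include <stdint.h>

/* Restore the max-heap property for the subtree rooted at `root`,
 * looking only at elements a[0..end-1]. Written as a loop, so it
 * uses no recursion and no call-stack growth. */
static void sift_down(int16_t a[], size_t root, size_t end)
{
    for (;;) {
        size_t child = 2 * root + 1;            /* left child */
        if (child >= end) {
            break;                              /* root is a leaf */
        }
        if (child + 1 < end && a[child] < a[child + 1]) {
            child++;                            /* right child is larger */
        }
        if (a[root] >= a[child]) {
            break;                              /* heap property holds */
        }
        int16_t tmp = a[root];
        a[root] = a[child];
        a[child] = tmp;
        root = child;                           /* continue one level down */
    }
}

/* Sort a[0..n-1] in ascending order, in place. Not stable. */
void heap_sort(int16_t a[], size_t n)
{
    if (n < 2) {
        return;
    }
    /* Build a max-heap, starting from the last parent node. */
    for (size_t i = n / 2; i-- > 0; ) {
        sift_down(a, i, n);
    }
    /* Repeatedly move the current maximum to the end of the array
     * and shrink the heap by one element. */
    for (size_t end = n - 1; end > 0; end--) {
        int16_t tmp = a[0];
        a[0] = a[end];
        a[end] = tmp;
        sift_down(a, 0, end);
    }
}
```

Calling `heap_sort(values, count)` on a plain array sorts it without allocating anything. This says nothing about speed relative to Shell sort on small arrays, which would need to be measured on the target.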