> can avoid or defer a lot of the expected memory allocations of async operations
Is this true in realistic use cases or only in minimal demos? From what I've seen, as soon as your code is complex enough that you need two compilation units, you need some higher level async abstraction, like coroutines.
And as soon as you have coroutines, you need to type-erase both the senders and the scheduler, so you have at least couple of allocations per continuation.
I have only played very little with it and I don't have anything in production yet, so I can't say.
I did manage to co_await senders without additional allocations though (but the code I wrote is write-only so I need to re-understand what I did 6 months ago again).
I recall that I was able to allocate the operation_state as a coroutine-local, and the scheduler is node based, so the operation_state can just be appended to the scheduler queue.
You still do need to allocate the coroutine itself, and I haven't played with coroutine allocators yet.
I can't comment on this particular implementation but few years back I played around with a similar idea, so not quite 1-to-1 mapping, but my conclusion derived from the experiments was the same - it is allocation-heavy. Code was built around similar principles, with type-erasure on top of future-promises (no coroutines back then in C++), and work-stealing queues. Code was quite lean, although not as feature-packed as folly, and there was nothing obvious to optimize apart from lock-contention, and dynamic allocations. I did couple of optimizations on both of those ends back then and it seemed to confirm the hypothesis of heavy allocations.