a heap-free implementation could be a really cool direction to explore. thanks!
i think you might be interested in MLIR/IREE: https://github.com/openxla/iree