Thank you! I realize now that I was thinking about a different aspect of systems research, but failed to say so.
Barrelfish (multikernel) and your username made me think of manycore systems and the scheduling challenges we will surely face as systems become more heterogeneous. I'm in a period of trying to learn more about that. Any and all recommendations are much appreciated.
Jim Keller's Tenstorrent ($1B funding to date) is shipping $1K PCIe manycore accelerators with open-but-immature software: https://www.theregister.com/2024/08/27/tenstorrent_ai_blackh...
> compute.. is handled by 140 of Tenstorrent's Tensix cores, each of which is composed of five "Baby RISC-V" cores, a pair of routers, a compute complex, and some L1 cache.. Tensix cores account for 700 of the 752 so-called baby RISC-V cores on board.. TT-Metalium low-level programming model.. kernels themselves are plain C++ with APIs.. Tenstorrent aims to support running any AI model on its accelerators using commonly used runtimes like PyTorch, ONNX, JAX, TensorFlow, and vLLM.
Legion, from the Stanford research team that led to CUDA: https://legion.stanford.edu/ & https://elliottslaughter.com/2024/02/legion-paper-history
> A novel mapping interface provides explicit programmer controlled placement of data in the memory hierarchy and assignment of tasks to processors in a way that is orthogonal to correctness, thereby enabling easy porting and tuning of Legion applications to new architectures.. Legion is developed as an open source project, with major contributions from LANL, NVIDIA Research, SLAC, and Stanford.
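To make the "orthogonal to correctness" part concrete, here is a toy sketch in Python. It is not the Legion API: `run`, `all_on_cpu`, `squares_on_gpu` and the processor names are all made up for illustration, and nothing is actually dispatched to a device. The only point is that the mapper decides where a task goes and never what it computes, so swapping mappers changes placement but not results.

```python
from typing import Callable, Dict, List, Tuple

Task = Callable[[List[int]], int]
Mapper = Callable[[str], str]  # task name -> processor name (hypothetical)


def run(tasks: Dict[str, Task], data: List[int],
        mapper: Mapper) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Run every task and record where the mapper placed it.

    The placement is bookkeeping only in this toy: a real runtime would
    dispatch each task to the chosen processor, but the task's result
    would still depend only on the task and its input data.
    """
    placement = {name: mapper(name) for name in tasks}
    results = {name: task(data) for name, task in tasks.items()}
    return results, placement


def all_on_cpu(task_name: str) -> str:
    # Simplest possible policy: everything on one CPU.
    return "cpu0"


def squares_on_gpu(task_name: str) -> str:
    # A tuned policy for a different machine: one task moves to a GPU.
    return "gpu0" if task_name == "sum_squares" else "cpu0"


tasks: Dict[str, Task] = {
    "total": lambda xs: sum(xs),
    "sum_squares": lambda xs: sum(x * x for x in xs),
}

results_a, placement_a = run(tasks, [1, 2, 3], all_on_cpu)
results_b, placement_b = run(tasks, [1, 2, 3], squares_on_gpu)
```

Here `results_a` and `results_b` are identical while `placement_a` and `placement_b` differ, which is the property that lets you retune placement for a new architecture without touching the application logic.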