Founder here.
1. Programming GPUs is a problem. The ratio of GPUs to GPU programmers is massively out of whack compared to the ratio of CPUs to CPU programmers, not because GPU programming is less valuable or lucrative, but because GPUs are weird and the tools are weird.
2. We are more interested in leveraging existing libraries than in running existing binaries wholesale (mostly within a warp). But running GPU-unaware code leaves a lot of room for the compiler to move things around and optimize.
3. The compiler changes are not our product; the GPU apps we are building with them are. So it is in our interest to make those apps very fast.
Anyway, skepticism is understandable, and we are well aware that code wins arguments.
Do you foresee this being faster than SIMD for things like cosine similarity? Apologies if I missed that context somewhere.
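For reference (my own sketch, not something from the thread): the SIMD baseline the question refers to is essentially one dot product plus two norms over the same pair of vectors. In NumPy these reductions dispatch to SIMD-vectorized loops, which is roughly what a hand-written CPU kernel would do:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # dot(a, b) / (|a| * |b|) -- each reduction is a SIMD-friendly
    # sequential loop over contiguous memory on the CPU.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
print(cosine_similarity(a, b))  # identical vectors -> 1.0
```

A GPU version would typically parallelize the same reductions across a warp, so the interesting comparison is batch size: for one small pair of vectors the CPU SIMD loop usually wins on latency, while large batches favor the GPU.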
> because GPUs are weird and the tools are weird.
Why is it also that the terminology is so all over the place? Subgroups, wavefronts, warps, etc. all refer to the same concept. That doesn't help either.
> The ratio of CPUs to CPU programmers and GPUs to GPU programmers is massively out of whack.
These days I just ask an LLM to write my optimized GPU routines.
> the GPU apps we are building with them are
I can't help but get the feeling you have a use-case end-goal in mind that's opaque to many of us who are GPU-ignorant.
It could be helpful if there were an example of the type of application that would be nicer to express through your abstractions.
(I think what you've shown so far is super cool btw)