lostmsu • last Friday at 1:56 AM

Does it support flash attention? Use tensor cores? Can I write custom kernels?

UPD: I found no evidence that it supports tensor cores, so it's likely to be many times slower than implementations that do.

mikepapadim • last Friday at 8:32 AM

Yes, when you use the PTX backend it supports Tensor Cores. It also has an implementation of flash attention. You can also write your own kernels; have a look here: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/... https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
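
For anyone unfamiliar with what "flash attention" refers to, here is a rough plain-Java sketch of the online-softmax idea it builds on: attention for a single query is computed in one pass over the keys, keeping a running maximum and a running denominator instead of materializing the full score vector. This is not code from GPULlama3.java and uses no GPU API; the class and method names (`OnlineSoftmaxAttention`, `naive`, `online`) are made up for illustration, and a real kernel would also tile the work across threads and on-chip memory.

```java
public final class OnlineSoftmaxAttention {

    // Dot product of two vectors of equal length.
    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Reference version: compute all scores, softmax them, then mix the values.
    static float[] naive(float[] q, float[][] k, float[][] v) {
        int n = k.length, d = v[0].length;
        float scale = (float) (1.0 / Math.sqrt(q.length));
        float[] s = new float[n];
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            s[i] = dot(q, k[i]) * scale;
            max = Math.max(max, s[i]);
        }
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            s[i] = (float) Math.exp(s[i] - max);
            sum += s[i];
        }
        float[] out = new float[d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) out[j] += (s[i] / sum) * v[i][j];
        return out;
    }

    // Single-pass version: rescale the accumulator whenever the running max grows.
    static float[] online(float[] q, float[][] k, float[][] v) {
        int d = v[0].length;
        float scale = (float) (1.0 / Math.sqrt(q.length));
        float[] acc = new float[d];
        float runningMax = Float.NEGATIVE_INFINITY;
        float denom = 0f;
        for (int i = 0; i < k.length; i++) {
            float score = dot(q, k[i]) * scale;
            float newMax = Math.max(runningMax, score);
            // On the first iteration exp(-infinity) is 0, so the empty state is discarded.
            float correction = (float) Math.exp(runningMax - newMax);
            float p = (float) Math.exp(score - newMax);
            denom = denom * correction + p;
            for (int j = 0; j < d; j++) acc[j] = acc[j] * correction + p * v[i][j];
            runningMax = newMax;
        }
        for (int j = 0; j < d; j++) acc[j] /= denom;
        return acc;
    }

    public static void main(String[] args) {
        float[] q = {1f, 0f};
        float[][] k = {{0.5f, 1f}, {2f, -1f}, {-1f, 0.3f}};
        float[][] v = {{1f, 2f}, {3f, 4f}, {5f, 6f}};
        System.out.println(java.util.Arrays.toString(naive(q, k, v)));
        System.out.println(java.util.Arrays.toString(online(q, k, v)));
    }
}
```

Both methods should agree up to floating-point rounding; the single-pass form is what makes it possible to process keys and values block by block without storing the whole attention matrix.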
