turnsout, today at 6:57 PM

It seems to depend on FlashAttention, so the short answer is no. Hopefully someone does the work of porting the inference code over!