logoalt Hacker News

eknkctoday at 2:25 PM0 repliesview on HN

I find time to first token more important then tok/s generally as these models wait an ungodly amount of time before streaming results. It looks like the claims are true based on M5: https://www.macstories.net/stories/ipad-pro-m5-neural-benchm... so this might work great.