ejpir, yesterday at 10:01 PM

Unfortunately, the bigger models are pretty slow in tokens per second. The memory bandwidth is just not that high.
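A rough back-of-the-envelope sketch of why memory speed caps token speed: during generation, a dense model has to stream roughly all of its weights from memory for every token, so bandwidth divided by model size gives an upper bound on tokens per second. The 256 GB/s figure and the model sizes below are illustrative assumptions, not measurements, and the function name is made up for this example. Real numbers come in lower, and MoE models only read their active experts per token, so they do better than their total size suggests.

```python
def decode_tokens_per_second_upper_bound(model_bytes: float,
                                         bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed for a dense model.

    Assumes every generated token requires reading all weights once
    from memory, and ignores compute, KV cache traffic, and overhead.
    """
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Assumption: theoretical peak bandwidth, used only for illustration.
assumed_bandwidth = 256 * GB

# Hypothetical quantized model sizes in GB (weights only).
for size_gb in (8, 40, 70):
    bound = decode_tokens_per_second_upper_bound(size_gb * GB, assumed_bandwidth)
    print(f"{size_gb:>3} GB of weights: at most ~{bound:.1f} tokens/s")
```

So under these assumptions, going from an 8 GB model to a 70 GB one drops the ceiling by almost 9x before anything else is considered.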

You can check how each model performs on AMD Strix Halo here:

https://kyuz0.github.io/amd-strix-halo-toolboxes/