petercooper • yesterday at 7:09 PM

So prompt goes in 4x as fast but generates tokens slower.

I'd take that tradeoff. On my M3 Ultra, token generation is surprisingly fast, but the prompt processing speed makes it painful to use except as a fallback or for experimentation, especially with agentic coding tools.
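To put rough numbers on why that tradeoff can be worth it for prompt-heavy workloads, here is a back-of-the-envelope sketch. Every figure in it is made up for illustration (the token counts, the tokens-per-second rates, and the 20% generation slowdown are assumptions, not benchmarks of any machine), and `total_seconds` is a hypothetical helper, not part of any tool.

```python
def total_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Time to ingest the prompt plus time to generate the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Hypothetical agentic-coding request: a large prompt and a short reply.
PROMPT_TOKENS = 20_000
OUTPUT_TOKENS = 500

# Assumed baseline: slow prompt processing, decent generation speed.
baseline = total_seconds(PROMPT_TOKENS, OUTPUT_TOKENS,
                         prefill_tps=200, decode_tps=25)

# Assumed alternative: 4x faster prompt processing, 20% slower generation.
faster_prefill = total_seconds(PROMPT_TOKENS, OUTPUT_TOKENS,
                               prefill_tps=800, decode_tps=20)

print(f"baseline:       {baseline:.0f} s")
print(f"faster prefill: {faster_prefill:.0f} s")
```

With these assumed numbers, prompt processing dominates the total, so the faster-prefill case wins comfortably even though each generated token takes longer. The balance flips for short prompts with long replies.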
