Hacker News

aegis_camera today at 4:58 PM

Memory is the limitation, and the M5 has larger memory options, so a large language model could be used.


Replies

bigyabai today at 5:03 PM

Context is your limitation on the M5. The larger your model is, the longer you'll be waiting on token prefill. TTFT (time to first token) with 0 tokens of context isn't a real-world benchmark.
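A rough back-of-envelope sketch of why prefill dominates at long context: prefill is roughly compute-bound, so TTFT grows linearly with context length. The model size, context length, and TFLOPS figures below are illustrative assumptions, not measured M5 benchmarks.

```python
def prefill_seconds(params_b: float, context_tokens: int, tflops: float) -> float:
    """Estimate prefill latency, assuming ~2 FLOPs per parameter per token
    for a dense forward pass and fully compute-bound prefill."""
    flops = 2 * params_b * 1e9 * context_tokens
    return flops / (tflops * 1e12)

# Hypothetical 70B-parameter model with ~30 TFLOPS of usable compute:
# negligible TTFT at 0 tokens of context, but minutes at 32k tokens.
print(round(prefill_seconds(70, 32_000, 30), 1))  # ~149.3 seconds
```

With 0 tokens of context the estimate is near zero, which is why a zero-context TTFT number says little about real workloads.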

That's why most professional inference solutions reach for GPU-heavy hardware like the Jetson. Apple Silicon seems like a strange and overly expensive fit for this use case.
