Context is your limitation on the M5. The larger your model, the longer you'll wait on token prefill. TTFT (time to first token) measured with 0 tokens of context isn't a real-world benchmark.
That's why most professional inference solutions reach for GPU-heavy hardware like the Jetson. Apple Silicon seems like a strange and overly expensive fit for this use case.
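The prefill point can be sketched with back-of-the-envelope arithmetic. The throughput figure below is purely hypothetical, not a measured M5 number; real prefill rates depend on model size, quantization, and hardware:

```python
# Hypothetical rates for illustration only; real prefill throughput varies
# widely by model, quantization, and hardware.
def estimate_ttft(prompt_tokens: int, prefill_tps: float) -> float:
    """Rough TTFT: the whole prompt must be prefilled before decoding starts."""
    return prompt_tokens / prefill_tps

# An empty-context benchmark hides the prefill cost entirely:
print(estimate_ttft(0, 500.0))       # 0.0 — looks instant
# A realistic 32k-token prompt at an assumed 500 tok/s prefill rate:
print(estimate_ttft(32_000, 500.0))  # 64.0 — over a minute before token one
```

Same model, same hardware; only the context length changed, which is why a 0-context TTFT figure tells you almost nothing about long-context use.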