Hacker News

ramses0, today at 3:39 PM

I'd held off from buying a new personal laptop for quite a few years and felt that the M5-128gb was justifiable once I started really seeing payoffs from using AI at work.

Running w/ Cursor and doing some "nights and weekends" type coding / conversations, I was hitting $100-200 of usage within a few weeks. I know there's probably better ways to manage costs, but I was getting enough value out of it to keep bumping my spend limit from $20 => $40 => $80 => $120 (and then I stopped spending! :-)

Messing around with local-llm, I've settled on `omlx` and `gemma` for "conversational", and I think it's `qwen-120b-a3b-6bit` or something for the "heavy hitter". Gemma "gets it" a lot more, whereas that particular `qwen` tends to fall into the "MuSt WrItE CoOooDeee!" behaviour in a lot of cases instead of holding a conversation, and does an awesome job of randomly spitting out ascii-art diagrams or including full-blown bash shell scripts to illustrate different cases.

My POV is: "Local for slightly slower/casual usage", the ~1% of battery usage per minute of LLM is shockingly accurate (eg: 30 minutes == 30% drop!). "Gemma for discussion and emitting DESIGN-... docs", and "Qwen for converting DESIGN-... to PLAN-...", (as well as implementation, but generally from a fresh context loading the relevant PLAN-... or supporting docs)
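
To put that battery rule of thumb into numbers, here is a minimal sketch. It assumes the ~1% per minute drain quoted above holds steadily, which is an observation from one laptop and not a spec; the function names are made up for illustration.

```python
# Assumption from the comment above: roughly 1% of battery per minute of active LLM use.
DRAIN_PERCENT_PER_MINUTE = 1.0


def battery_drop(minutes, drain=DRAIN_PERCENT_PER_MINUTE):
    """Estimated battery drop in percentage points, capped at a full charge."""
    return min(100.0, minutes * drain)


def minutes_available(charge_percent, drain=DRAIN_PERCENT_PER_MINUTE):
    """Minutes of LLM use a given charge level would cover at this drain rate."""
    return charge_percent / drain


print(battery_drop(30))        # 30.0
print(minutes_available(100))  # 100.0
```

So a full charge is on the order of 100 minutes of continuous generation, if the drain rate really is that linear.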

...then supplement that with direct Cursor usage in case I screw up some setting on being able to get the local LLM working, or if I need to include literal web-research or really need access to some SOTA model. Using the pi-coder harness locally, web pages are kind of a difficult conundrum as they can be kind of gigantic and are really worthy of special casing, some sort of sub-harness, etc... but the more "stuff" you put into the agent, the less context window (and memory!) you have available, so it's a real balancing act.

The other biggest problem is that you're limited (locally) to ~20-80tps, and in some cases you have to chew on or "swallow" the whole prompt up to that point if you end up with some sort of cache miss, which hurts time-to-first-token (TTFT). The `omlx` server does a pretty good job (after you tweak some settings and stuff) of allowing MANY prompt continuations to start generating tokens nearly immediately, but sometimes if I have two agents going (eg: Gemma talking shit about Qwen's output or vice versa) in a longer context window, then you'll take that hit.
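
A rough way to reason about those two costs, as a sketch only: generation time scales with output length over tokens per second, and a cache miss means re-processing every uncached prompt token before the first output token appears. The 20-80 tps range is from above; the 500 tok/s prefill rate and the 40k-token context are hypothetical placeholders, not measurements.

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Time to emit a response once generation has started."""
    return output_tokens / tokens_per_second


def ttft_seconds(prompt_tokens, cached_tokens, prefill_tokens_per_second):
    """Time to first token: only the uncached part of the prompt must be re-processed."""
    uncached = max(0, prompt_tokens - cached_tokens)
    return uncached / prefill_tokens_per_second


# A 1000-token answer at the slow and fast ends of the ~20-80 tps range.
print(generation_seconds(1000, 20))  # 50.0
print(generation_seconds(1000, 80))  # 12.5

# Hypothetical 40k-token context with prefill at 500 tok/s (assumed rate).
print(ttft_seconds(40_000, 0, 500))       # 80.0  full cache miss
print(ttft_seconds(40_000, 39_500, 500))  # 1.0   warm cache, 500 new tokens
```

The gap between the last two lines is the "hit" when two agents evict each other's cached prefix in a long context.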

"Other people's compute" is definitely more freeing, but even looking at $200/mo usage that's $2400 vs. the ~$6k for a maxed out MBP. Call it $2500 vs. $7500 and you'd say that "local AI gives you a 3-year amortization window for a slower, worse experience" ... but if you're strategic about your usage, the ability to "talk for free" and occasionally "burst" to an online provider or having some hugging-face tokens to try out different models that you can't quite run locally is really nice. Talking to the AI (locally) to even just do non-coding planning without worrying about data leakage or privacy issues is phenomenal, and you end up owning a really nice laptop!

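The amortization math above, as a quick sketch. The $200/mo and ~$6k figures are from the comment; the `residual_cloud_per_month` parameter (what you'd still spend on occasional "burst" usage) and its $50 value are hypothetical additions for illustration.

```python
def breakeven_months(hardware_cost, cloud_per_month, residual_cloud_per_month=0.0):
    """Months until avoided cloud spend covers the hardware purchase."""
    saved_per_month = cloud_per_month - residual_cloud_per_month
    if saved_per_month <= 0:
        raise ValueError("no monthly savings, so the hardware never pays for itself")
    return hardware_cost / saved_per_month


# $200/mo of usage vs. a ~$6k laptop.
print(breakeven_months(6000, 200))      # 30.0

# Rounded figures from the comment: $7500 hardware vs. $2500/yr, in years.
print(7500 / 2500)                      # 3.0

# Hypothetical: still "bursting" $50/mo to an online provider.
print(breakeven_months(6000, 200, 50))  # 40.0
```

This ignores resale value and the fact that you'd own a laptop either way, both of which shorten the window.
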
In some ways, seeing the "advantage" of having the local 128gb capacity for LLM, I'm semi-wishing I'd have gotten a mac mini instead, but then I can't quite do the 100% offline stuff (eg: coffee-shop) that the maxed out laptop allows.

If it were a mini running locally, I'd feel more comfortable calling it the always-on "AI brain" to process my emails, run crontab summaries, whatever kind of "open-claw-ish" stuff that you could do w/o relying on having to "keep the laptop lid open all the time". I'm sure there are ways to repurpose things, but longer-term, call it even 3-5 years from now... any sort of 128gb machine will be more than capable where you'd want to have one "doing stuff" locally within your home network (IMHO).


Replies

chrisweekly, today at 4:19 PM

Thank you! That was a generous and helpful response, I really appreciate it. Food for thought...

>"...if you're strategic about your usage, the ability to "talk for free" and occasionally "burst" to an online provider or having some hugging-face tokens to try out different models that you can't quite run locally is really nice. Talking to the AI (locally) to even just do non-coding planning without worrying about data leakage or privacy issues is phenomenal, and you end up owning a really nice laptop!"

^ this resonates, loudly.