Can confirm: my experience with “loop engineering” was “this is neat” for 45 minutes, until my daily ration of tokens had evaporated. The quadratic cost trap makes experimentation prohibitively expensive.
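For a rough sense of why the cost feels quadratic, here is a minimal sketch. It assumes the loop re-sends the whole conversation on every turn, that each turn adds a fixed number of tokens, and that there is no prompt caching. The function name and the numbers are made up for illustration, not taken from any real provider.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Total input tokens billed across an agent loop (hypothetical model).

    Assumption: every turn re-sends the full context so far, and each
    turn grows the context by `tokens_per_turn`. No caching is modeled.
    """
    total = 0
    context = system_tokens
    for _ in range(turns):
        context += tokens_per_turn  # context grows linearly per turn
        total += context            # but the whole context is re-sent each time
    return total


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, cumulative_input_tokens(n, 1_000))
```

Under those assumptions, doubling the number of turns roughly quadruples the total tokens processed, which is how a modest-looking loop can eat a daily allowance in under an hour.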
As a local-LLM evangelist, I'm hopeful this will bring more attention to the joys of rolling your own sovereign AI.
Yeah, I'm hoping that gets smoother. I've been experimenting with omlx and opencode on my M5 (64 GB) and keep running into issues with Qwen3.6-35B-A3B-MLX-8bit exceeding its memory limit at the most inopportune times. Playing with 12B gemma4 (8-bit) more today.
Maybe I should be aiming for something that targets 48 GB of memory?
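A back-of-the-envelope way to check that kind of target, as a sketch only: weight memory is roughly parameter count times bits per weight divided by 8. This assumes exactly 8 bits per weight (real quantized formats usually carry some extra overhead for scales), and the `overhead_gb` figure for KV cache, runtime, and everything else is a guess you would have to measure yourself. Both function names are hypothetical.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (decimal), ignoring quantization metadata."""
    # 1e9 params * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         budget_gb: float, overhead_gb: float) -> bool:
    """True if weights plus an assumed overhead stay within the budget.

    `overhead_gb` is a placeholder for KV cache, runtime buffers, and
    other processes; it is an assumption, not a measured value.
    """
    return weight_footprint_gb(params_billions, bits_per_weight) + overhead_gb <= budget_gb


if __name__ == "__main__":
    # Parameter counts are read off the model names in the thread.
    for name, params in (("35B @ 8-bit", 35), ("12B @ 8-bit", 12)):
        print(name, weight_footprint_gb(params, 8), "GB of weights")
    # Hypothetical overhead values, just to show how sensitive the answer is.
    print(fits(35, 8, budget_gb=48, overhead_gb=10))
    print(fits(35, 8, budget_gb=48, overhead_gb=16))
```

By this estimate a 35B model at 8 bits needs about 35 GB for weights alone, so whether it fits a 48 GB target depends almost entirely on how much a long context adds on top.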