I'm looking forward to running a Gemma 4 turboquant on my 24GB GPU. The perf looks impressive for how compact it is.
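Rough back-of-envelope for whether a quant fits: weights at the quantized bit width plus some headroom for KV cache and CUDA context. The ~27B parameter count and 2 GB overhead below are my assumptions, not anything official about that release:

```python
def quantized_vram_gb(n_params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: quantized weights plus a flat allowance
    for KV cache, activations, and CUDA context (all assumed)."""
    weights_gb = n_params_b * bits_per_weight / 8  # billions of params * bytes per param
    return weights_gb + overhead_gb

# Hypothetical 27B model at 4-bit: ~13.5 GB of weights, ~15.5 GB total,
# so it should clear 24 GB with room for a decent context window.
print(quantized_vram_gb(27, 4))
```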
Processing on my local hardware is often around 10x more cost-effective for me.
I still reach for frontier models for coding, but the hosted models on OpenRouter are good enough for simple work.
Feels like we are jumping to warp on flops. My cores are throttled and the fiber is lit.