I tried Gemma 4 A4B and was surprised how hard it is to use for agentic stuff on an RTX 4090 with 24 GB of VRAM.
Balancing the KV cache against context length is the hard part: the cache grows with every token of context and eats VRAM super fast.
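
For a rough feel of why this happens, here is a back-of-the-envelope sketch of KV cache size for a plain transformer with full attention on every layer. The layer count, KV head count, and head dimension below are made-up placeholder values, not Gemma's actual config, and the function name `kv_cache_bytes` is just for illustration. Models that use sliding-window attention or a quantized cache will need less than this estimate.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV cache size in bytes, assuming full attention on every layer.

    The leading factor of 2 covers keys plus values.
    bytes_per_elem=2 corresponds to fp16/bf16 cache entries.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Placeholder architecture values (assumptions, NOT a real model config).
    layers, kv_heads, head_dim = 32, 8, 128
    for tokens in (8_192, 32_768, 131_072):
        size = kv_cache_bytes(layers, kv_heads, head_dim, tokens)
        print(f"{tokens:>7} tokens: {to_gib(size):.1f} GiB of KV cache")
```

With these placeholder numbers, the cache costs 128 KiB per token, so 32k tokens come to 4 GiB and 128k tokens to 16 GiB, all on top of the model weights. That is how a 24 GB card runs out so quickly once an agent starts stuffing tool output into the context.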