Metrics and toy examples can be gamed. Rather than these silly examples, how does it feel?
Can you replace Claude Code Opus or Codex with this?
Does it feel >80% as good on "real world" tasks you do on a day to day basis.