Hacker News

kittikitti · yesterday at 5:53 PM

This is why I run my own models. All the inference providers do sneaky things behind the scenes: they will limit output tokens, turn off attention layers, lower the reasoning effort, or just serve a completely different model. I'm actually surprised that Claude Code was affected, since it's where I've experienced this the least among APIs and coding agents.
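
For reference, a minimal sketch of what "running my own models" can look like, assuming a Hugging Face transformers setup; the model name and prompt are placeholder examples, not anything the commenter specified. The point is that decoding settings such as the output-token limit stay under your control rather than a provider's.

    # Self-hosted inference sketch (assumes `pip install transformers torch`).
    # Model name below is only an example; substitute whatever you run locally.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-Coder-7B-Instruct",  # example local model
        device_map="auto",                       # place weights on available GPU/CPU
    )

    out = generator(
        "Write a Python function that reverses a string.",
        max_new_tokens=256,   # you choose the output-token limit, not a provider
        do_sample=False,      # deterministic decoding for reproducible results
    )
    print(out[0]["generated_text"])

Running the stack yourself means any change in model weights, quantization, or decoding behavior is a change you made, which is the whole appeal.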