Hacker News

prmph, yesterday at 10:29 PM

They all are. And once the context has rotted or been poisoned enough, it is unsalvageable.

I daresay Claude is now actually one of the better ones at instruction following.


Replies

XCSme, yesterday at 10:40 PM

In my tests it's worst with adding extra formatting or output: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

For example, sometimes it outputs in markdown, without being asked to (e.g. "**13**" instead of "13"), even when asked to respond with a number only.

This might be fine in a chat environment, but not in a workflow, an agentic use case, or tool usage.
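To make the "number only" case concrete, here is a minimal sketch of how a workflow might guard against it. The helper name `parse_number_only` and its `lenient` flag are hypothetical, not part of any library, and the sketch assumes integer answers only. Strict mode rejects anything that is not a bare number; lenient mode additionally unwraps a single markdown emphasis wrapper such as `**13**`.

```python
import re

# Assumption: the expected answer is a (possibly negative) integer.
_BARE_INT = re.compile(r"-?\d+")
# One markdown wrapper around the number: **13**, __13__, *13*, _13_ or `13`.
_WRAPPED_INT = re.compile(r"(\*\*|__|\*|_|`)(-?\d+)\1")


def parse_number_only(reply: str, lenient: bool = False) -> int:
    """Parse a model reply that was supposed to contain a number only."""
    text = reply.strip()
    if _BARE_INT.fullmatch(text):
        return int(text)
    if lenient:
        match = _WRAPPED_INT.fullmatch(text)
        if match:
            # Keep the digits, drop the markdown wrapper.
            return int(match.group(2))
    raise ValueError(f"expected a bare number, got {reply!r}")
```

In strict mode a reply of `**13**` raises `ValueError`, which a pipeline could turn into a retry; with `lenient=True` the same reply parses as 13. Which behavior is preferable probably depends on whether silently repairing the output hides a problem you would rather measure.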

Yes, this can be enforced via structured output, but a string field inside a structured output might still need a specific natural-language response format, which can't be defined by a schema.
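As a rough illustration of that gap, the sketch below checks a string field after the JSON has already parsed. Everything in it is an assumption for the example: the field name `answer`, the function `check_answer_field`, and the format rules themselves (single line, no markdown markers, a word limit, a closing period). The marker check is a crude heuristic, not a markdown parser.

```python
import json

# Crude heuristic: substrings that usually indicate markdown markup.
MARKDOWN_MARKERS = ("**", "__", "`")


def check_answer_field(raw_json: str, max_words: int = 20) -> list[str]:
    """Return the format problems found in the hypothetical 'answer' field."""
    data = json.loads(raw_json)
    answer = data["answer"]  # the schema is assumed to guarantee a string here
    problems = []
    if "\n" in answer.strip():
        problems.append("must be a single line")
    if any(marker in answer for marker in MARKDOWN_MARKERS):
        problems.append("must not contain markdown markup")
    if len(answer.split()) > max_words:
        problems.append(f"must be at most {max_words} words")
    if not answer.rstrip().endswith("."):
        problems.append("must end with a period")
    return problems
```

An empty list means the field passed; otherwise the messages could be fed back to the model in a retry prompt. The point is only that this kind of check has to live in code after parsing, since the schema here is assumed to guarantee nothing beyond "this field is a string".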