fellowniusmonk • last Thursday at 6:30 PM • 1 reply

In all my unpublished tests, which focus on (1) unique logic puzzles that are intentionally adjacent to existing puzzles and (2) implementing a specific, unique CRDT algorithm that is not particularly common but has an official reference implementation on GitHub (so the models have definitely been trained on it), I find that 5.2 overfits to the more common implementation and will actively break working code and puzzles.

I find that it pattern-matches incorrectly, with a very narrow focus, and ignores real, documented differences even when they are explicitly highlighted in the prompt text ("this is X CRDT algo, not Y CRDT algo").

I've canceled my subscription. The idea that on any larger edit it will just start wrecking nuance, and then refuse to accept prompts that point this out, is an extremely dangerous form of target fixation.

Replies

pillefitz • last Thursday at 7:06 PM

How does Claude perform?
