Hacker News

creamyhorror yesterday at 7:48 PM (5 replies)

I've only used 5.4 for 1 prompt (edit: 3@high now) so far (reasoning: extra high, took really long), and it was to analyse my codebase and write an evaluation on a topic. But I found its writing and analysis thoughtful, precise, and surprisingly clearly written, unlike 5.3-Codex. It feels very lucid and uses human phrasing.

It might be my AGENTS.md requiring clearer, simpler language, but at least 5.4's doing a good job of following the guidelines. 5.3-Codex wasn't so great at simple, clear writing.


Replies

torginus yesterday at 11:23 PM

Honestly, while I'd like to believe you, there's always a post about how $MODEL+1 delivered powerful insights about the very nature of the universe in precise Hegelian dialectic, while $MODEL's output was indistinguishable from a pack of screeching sexually frustrated bonobos

dana321 yesterday at 11:43 PM

5.4 at very high reasoning didn't notice a glaring issue in my codebase that drops all data being sent around the network.

sampton yesterday at 9:52 PM

That's been my experience as well switching from Opus to Codex. Reasoning takes longer but answers are precise. Claude is sloppy in comparison.

irishcoffee yesterday at 9:19 PM

> It might be my AGENTS.md requiring clearer, simpler language

If you gave the exact same markdown file to me and I posted the exact same prompts as you, would I get the same results?

pembrook yesterday at 10:40 PM

The latest research these days is that including an AGENTS.md file only makes outcomes worse with frontier models.
