Hacker News

Agent Reading Test

42 points by kaycebasques yesterday at 6:56 PM | 10 comments

https://dacharycarey.com/2026/04/06/designing-agent-reading-...


Comments

throwatdem12311 yesterday at 11:56 PM

What a great target for someone to hack and add some secret prompt injections into.

theyCallMeSwift yesterday at 8:48 PM

I love this idea, but my hypothesis is that 90% of the agents people actually use today would fail this test inadvertently (a false negative).

The current industry best practice and standard implementation for most agents is to do web browsing and fetching via subagents. Their output is summarized by a cheaper model and then passed back to the parent. Unless the actual content the subagents see is preserved, it's very unlikely that the `CANARY-` strings would show up in the output.

Any thoughts on how you'd change the test structure with this in mind?
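
One possible direction, as a rough sketch only: have the subagent return verbatim tokens alongside its lossy summary, so the parent can still report any `CANARY-` strings. Everything here is an assumption for illustration: the token pattern is inferred from the two canaries quoted elsewhere in this thread, and `subagent_report` and `lossy_summary` are hypothetical names, not part of the test or of any agent framework.

```python
import re

# Assumed token shape: "CANARY-" followed by hyphen-separated alphanumeric parts.
CANARY_PATTERN = re.compile(r"CANARY-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def extract_canaries(raw_text: str) -> list[str]:
    """Return unique canary tokens in first-seen order."""
    seen: dict[str, None] = {}
    for token in CANARY_PATTERN.findall(raw_text):
        seen.setdefault(token, None)
    return list(seen)


def subagent_report(raw_text: str, summarize) -> dict:
    """Hypothetical subagent result: a lossy summary plus verbatim tokens."""
    return {
        "summary": summarize(raw_text),
        "verbatim_tokens": extract_canaries(raw_text),
    }


def lossy_summary(text: str) -> str:
    # Stand-in for the cheaper summarizing model; it drops the canaries.
    return "The page describes a documentation site."


page = (
    "Intro text. CANARY-SPA-JSONLY-prism appears here. "
    "Later: CANARY-CONNEG-MD-sigma."
)
report = subagent_report(page, lossy_summary)
print(report["verbatim_tokens"])
```

The summary alone would produce the false negative described above; the `verbatim_tokens` field is what lets the parent agent pass the check without seeing the full page.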

numeri yesterday at 11:36 PM

11/20 for qwen/qwen3.5-flash-02-23 in Claude Code, with effort set to low.

dostick yesterday at 8:04 PM

The tests should carry negative weights based on how often each issue is encountered and how much impact it has. Test 2 (SPI) should be worth something like 8 negative points out of 10, as the most common blocker. And the whole test should use an inverse score.
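
A minimal sketch of what such a weighting could look like, reading "inverse score" as "sum of penalties, lower is better" (that reading is an assumption). The test names and all weights except the 8 for test 2 are hypothetical placeholders.

```python
def weighted_penalty(results: dict[str, bool], weights: dict[str, int]) -> int:
    """Sum the penalty weights of failed tests; 0 means a perfect run."""
    return sum(weights[name] for name, passed in results.items() if not passed)


# Hypothetical weights on a 0-10 scale; only the 8 for test 2 comes from the comment.
weights = {"test_1": 3, "test_2": 8, "test_3": 2}

# Hypothetical run: test 1 passed, tests 2 and 3 failed.
results = {"test_1": True, "test_2": False, "test_3": False}

penalty = weighted_penalty(results, weights)
print(penalty)  # 8 + 2 = 10
```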

massimoto yesterday at 8:36 PM

Would love to see some results for different providers. The tests look super logically thought out, but could use a TL;DR (too lazy; didn't run) output.

Claude Web Opus 4.6 Extended: 14 / 20 points

x:CANARY-SPA-JSONLY-prism x:CANARY-CONNEG-MD-sigma

MeetRickAI today at 12:13 AM

[dead]