Agent Reading Test

42 points • by kaycebasques • yesterday at 6:56 PM • 10 comments

https://dacharycarey.com/2026/04/06/designing-agent-reading-...

Comments

throwatdem12311 • yesterday at 11:56 PM

What a great target for someone to hack and add some secret prompt injections into.


theyCallMeSwift • yesterday at 8:48 PM

I love this idea, but have a hypothesis that 90% of agents that people actually use today would fail this test inadvertently (false negative).

Industry best practice and the standard implementation for most agents right now is to do web browsing and fetching via subagents. Their output is summarized by a cheaper model and then passed back to the parent. Unless the actual content the subagents see is preserved, it's very unlikely that the `CANARY-` strings would show up in the output.

Any thoughts on how you'd change the test structure with this in mind?
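
To make the false-negative concern concrete, here is a minimal sketch of checking which canary tokens survive a summarization step. The token pattern is an assumption inferred from the strings quoted elsewhere in this thread, and `find_canaries`, `lost_in_summary`, and the sample texts are hypothetical names and data, not part of the actual test.

```python
import re

# Assumed token shape, inferred from strings quoted in this thread
# (e.g. CANARY-SPA-JSONLY-prism); the real test may use a different format.
CANARY_RE = re.compile(r"CANARY-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def find_canaries(text: str) -> set[str]:
    """Return every CANARY-style token present in the text."""
    return set(CANARY_RE.findall(text))


def lost_in_summary(raw_page: str, summary: str) -> set[str]:
    """Tokens present in the raw page that did not survive summarization."""
    return find_canaries(raw_page) - find_canaries(summary)


# Hypothetical page content and subagent summary, for illustration only.
raw_page = "Docs intro. CANARY-SPA-JSONLY-prism appears in the rendered body."
summary = "The page is a docs intro with a rendered body."

print(lost_in_summary(raw_page, summary))
```

A non-empty result here would mean the subagent read the content but the parent never saw the token, which is the inadvertent failure described above.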


numeri • yesterday at 11:36 PM

11/20 for qwen/qwen3.5-flash-02-23 in Claude Code, with effort set to low.

dostick • yesterday at 8:04 PM

The tests should have negative weights based on how often each issue is encountered and how much impact it has. Test 2 (SPI) should carry something like 8 negative points out of 10, since it is the most common blocker. And the whole test should use an inverse score.
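
One way to read this proposal is as a penalty total, where each failed test subtracts a weight and a perfect run scores 0. The sketch below assumes that reading. The test names and every weight except the suggested 8 for test 2 are hypothetical placeholders, not values from the actual test.

```python
# Hypothetical penalty weights on a 0-10 scale. Only the 8 for test 2 comes
# from the suggestion above; the other names and values are made up.
PENALTIES = {
    "test_2_spi": 8,
    "test_a": 3,
    "test_b": 5,
}


def penalty_score(failed: set[str], penalties: dict[str, int]) -> int:
    """Inverse score: 0 is a perfect run, and each failed test subtracts its weight."""
    return -sum(penalties[name] for name in failed)


print(penalty_score({"test_2_spi", "test_a"}, PENALTIES))
```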


massimoto • yesterday at 8:36 PM

Would love to see some results for different providers. The tests look super logically thought out, but could use a TL;DR (too lazy; didn't run) output.

Claude Web Opus 4.6 Extended: 14 / 20 points

x:CANARY-SPA-JSONLY-prism x:CANARY-CONNEG-MD-sigma



kaycebasques • yesterday at 6:57 PM

