
dataviz1000 · today at 6:25 PM

Sorry to spam, I'm working on this also from a different angle. Hopefully sharing adds to the conversation.

First, about the loop: Claude's (the coding agent's) context and attention are big enough for self-reflection. Agent Tuning shows a technique that not only demonstrates this but also offers a way to quantify it. [0] The difference is that autoresearch's val_bpb measures what the agent built, while Agent Tuning's p̂ measures the agent itself.
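For readers unfamiliar with the metric: val_bpb is presumably validation loss expressed as bits per byte, a tokenizer-independent way to compare language models. The repos aren't quoted here, so this is only a loose sketch of the standard conversion (mean cross-entropy in nats per token, scaled by the token-to-byte ratio), not the actual code from either project:

```python
import math

def bits_per_byte(mean_nll_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean negative log-likelihood (nats/token) on a validation
    set into bits per byte of the underlying raw text."""
    total_bits = (mean_nll_nats * n_tokens) / math.log(2)  # nats -> bits
    return total_bits / n_bytes  # normalize by raw byte count

# Example: mean NLL of 1.2 nats/token over 1000 tokens of text
# whose raw UTF-8 length is 4000 bytes.
print(round(bits_per_byte(1.2, 1000, 4000), 4))
```

Because the denominator is bytes rather than tokens, two agents using different tokenizers can still be compared on the same validation corpus.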

> Claude's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context.

Second, when doing research, finding academic papers to add to context helps. Here is an example of an implementation that creates trading strategies by reading research and recombining it in creative new ways. [1]

The biggest problem is that coding agents don't "fail fast and loud." They fail deceptively.

[0] https://github.com/adam-s/agent-tuning

[1] https://github.com/adam-s/alphadidactic


Replies

mkagenius · today at 8:01 PM

> The biggest problem is that coding agents don't "fail fast and loud." They fail deceptively.

GPT-2 and GPT-3 used to fail fast (and loud, because we could easily see them lying)
