Hacker News

stemlord, last Friday at 3:24 PM

If true, I wonder what kind of feedback loop arises from training on human behavior that is itself directly influenced by the output of the same model.


Replies

adam_patarino, last Friday at 3:31 PM

We build our fine-tuning and reinforcement pipeline at cortex.build by synthesizing interactions between a user, the agent loops, and a codebase. That is exactly the kind of data they get from users in Claude Code.

That data is critical for improving tool call use, both the correctness of the calls and the agent's judgment about when to use a tool at all. It's also important for the context rewrites Claude does: they rewrite your prompt and continuously manage the back-and-forth with the model. So does Cortex, just more aggressively and with a more powerful context graph.
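
To make the two tool-call signals concrete, here is a rough sketch of how one synthesized interaction might be stored and scored. This is not Cortex's or Claude Code's actual schema; every name below (`ToolCall`, `Step`, `Trajectory`, the `tool_was_needed` label) is hypothetical and only illustrates keeping "was the call correct" separate from "should a tool have been used here".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    name: str          # hypothetical tool name, e.g. "read_file"
    arguments: dict
    succeeded: bool    # did the call execute without error?


@dataclass
class Step:
    user_message: str
    tool_call: Optional[ToolCall]  # None means the agent answered directly
    tool_was_needed: bool          # assumed label: should a tool have been used?


@dataclass
class Trajectory:
    steps: list = field(default_factory=list)

    def tool_choice_accuracy(self) -> float:
        """Fraction of steps where the use/skip decision matched the label."""
        if not self.steps:
            return 0.0
        hits = sum(
            (s.tool_call is not None) == s.tool_was_needed for s in self.steps
        )
        return hits / len(self.steps)

    def call_correctness(self) -> float:
        """Fraction of issued tool calls that succeeded."""
        calls = [s.tool_call for s in self.steps if s.tool_call is not None]
        if not calls:
            return 0.0
        return sum(c.succeeded for c in calls) / len(calls)
```

Keeping the two metrics apart matters because an agent can issue perfectly valid calls that were never needed, or skip a tool it should have used; a single "success rate" would hide both cases.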