logoalt Hacker News

hebejebelusyesterday at 9:13 PM1 replyview on HN

Seems like a postprocess step on the initial output would fix that kind of thing - maybe a small 'thinking' step that transforms the initial output to match style.


Replies

Mirasteyesterday at 9:26 PM

Yeah, that's how it would be implemented after a filter fail, but it's important that the filter itself be separate from the agent, so it can be deterministic. Some problems, like "genuine," are so baked in to the models that they will persist even if instructed not to, so a dumb filter, a la a pre-commit hook, is the only way to stop it consistently.