These papers usually have poor stability to prompting and rerunning. It would be nice if we had some...

arjie • yesterday at 9:32 PM • 1 reply • view on HN

These papers usually have poor stability to prompting and rerunning. It would be nice if we had some kind of meta-evaluation metric where rewriting the prompt conditions or varying the input params could be used to determine how stable a result is.

Regardless, it's definitely true that AI agents have different priorities from us. That's what alignment is about anyway.

Replies

Chu4eeno • yesterday at 10:31 PM

It's probably because they care more about the headline than figuring anything out: https://github.com/kennethpayne01/project_kahn_public/issues...

So you create leading prompts like that, and re-run until you get a publishable session.

alt Hacker News

Replies