logoalt Hacker News

PeterStuertoday at 8:46 AM0 repliesview on HN

Looking at the very first test, it seems the system prompt already emphasizeses the success metric above the constraints, and the user prompt mandates success.

The more correct title would be "Frontier models can value clear success metrics over suggested constraints when instructed to do so (50-70%)"