Looking at the very first test, it seems the system prompt already emphasizeses the success metric a...

PeterStuer • today at 8:46 AM • 0 replies • view on HN

Looking at the very first test, it seems the system prompt already emphasizeses the success metric above the constraints, and the user prompt mandates success.

The more correct title would be "Frontier models can value clear success metrics over suggested constraints when instructed to do so (50-70%)"

alt Hacker News