numeri • yesterday at 8:36 PM • 1 reply

To be fair, it is good to know that it disobeys simple instructions like "don't examine my git history" far more than other models. (It should of course be a different benchmark, so as not to conflate things.)

It's not a great sign for alignment.

Replies

bensyverson • yesterday at 9:02 PM

Agreed, alignment is a separate issue that a vuln-fixing benchmark doesn't need to test.
