
lmeyerov • today at 6:18 PM • 0 replies

It's been fun benchmarking AI investigations at botsbench.com. Part of it is checking for these kinds of issues: we recently started seeing contamination in our first-generation challenge and, less obviously, agent sandbox escapes used for other kinds of cheating. Fun times!
