Humans have goal seeking behavior. LLMs don’t. You could maybe call the combination of LLMs and the RL-based harnesses somewhat “intelligent” in aggregate, but the problem is that it’s not “general” intelligence like these labs want to argue, since it’s by definition only good for the set of problems the RL part has been trained to solve, which is a subset of programming problems.