Fwiw, if you trained an LLM in an RL sandbox that required it to have goals, the resulting LLM probably would "have goals".