It’s amazing how much you get wrong here. As LLM attention layers are stacked goal functions.
What they lack is multi turn long walk goal functions — which is being solved to some degree by agents.