You can fine-tune it so that, given an image and a task description, it generates the corresponding set of actions.
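
To make the input/output shape concrete, here is a minimal sketch of how one fine-tuning example might be laid out: an image and a task description on the input side, and the actions on the target side. Everything in it is an assumption made for illustration: the `ActionExample` and `to_training_pair` names, the `<image>` placeholder, the `"; "` separator, the file path, and the action strings are all hypothetical and do not come from any particular model or dataset.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ActionExample:
    """One supervised example: an image, an instruction, and the actions to predict."""
    image_path: str      # hypothetical path to the input image
    task: str            # natural-language task description
    actions: List[str]   # target actions, in the order they should be emitted


def to_training_pair(example: ActionExample) -> Dict[str, str]:
    """Turn an example into an (input, target) pair for supervised fine-tuning.

    The "<image>" placeholder and the "; " separator are illustrative choices,
    not a format required by any specific model.
    """
    prompt = f"<image>\nTask: {example.task}\nActions:"
    target = "; ".join(example.actions)
    return {"image": example.image_path, "prompt": prompt, "target": target}


# Hypothetical example data, made up for this sketch.
example = ActionExample(
    image_path="frames/kitchen_0001.png",
    task="pick up the red mug",
    actions=["move_to(mug)", "close_gripper()", "lift(0.1)"],
)
pair = to_training_pair(example)
print(pair["prompt"])
print(pair["target"])
```

During fine-tuning, the model would be trained to produce the `target` string when shown the image and the `prompt`; how the image is actually encoded and fed to the model depends on the model being used.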