I came back to try the Hassaan one. It was much more realistic, although he still denied wearing a hat. I think it would work better if you ran a still image of the character’s appearance through a multimodal LLM and had it generate a description to include in the conversation’s prompt.
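Roughly what I have in mind, as a sketch only. I don't know how the prompt is actually assembled, so `build_conversation_prompt` and `describe_image` are made-up names, and `fake_describer` just stands in for whatever multimodal model call you would use:

```python
from typing import Callable


def build_conversation_prompt(
    base_prompt: str,
    still_image: bytes,
    describe_image: Callable[[bytes], str],
) -> str:
    """Append a generated appearance description to the character's prompt.

    `describe_image` is a hypothetical hook: any function that sends the
    image to a multimodal LLM and returns a short text description.
    """
    description = describe_image(still_image).strip()
    if not description:
        # Nothing usable came back, so leave the prompt unchanged.
        return base_prompt
    return (
        f"{base_prompt}\n\n"
        f"Your on-screen appearance: {description}\n"
        "If asked about how you look, answer consistently with this description."
    )


# Stand-in for a real multimodal model call (illustration only).
def fake_describer(image: bytes) -> str:
    return "A man wearing a hat."


prompt = build_conversation_prompt("You are Hassaan.", b"<image bytes>", fake_describer)
print(prompt)
```

The description would only need to be generated once per character and could be cached, since the still image doesn't change between conversations.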
This is a good suggestion. I’ll work on this!