I tried the Tavus demo; it was pretty cool, but it seemed to nod way too much, basically the entire time. The actual conversation was pretty clearly with a text model, because it had no concept of what it looks like, or even that it has a video avatar at all. It would say things like “I don’t have eyes.”
I came back to try the Hassaan one, which was much more realistic, although he still denied wearing a hat. I think if you ran a still image of the character’s appearance through a multimodal LLM and had it generate a description for the conversation’s prompt, it would work better.
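To sketch what I mean: caption the avatar's still image once with a vision model, then prepend that caption to the system prompt so the text model can answer appearance questions consistently. Everything here is an assumption on my part (the model name, the prompt wording, the `describe_avatar` helper), not how Tavus actually does it.

```python
# Sketch: give a text-only conversational model a description of its
# own avatar. The helper names and prompts are hypothetical.
import base64


def build_system_prompt(base_prompt: str, appearance: str) -> str:
    # Fold the appearance caption into the system prompt so the model
    # can answer things like "are you wearing a hat?" correctly.
    return (
        f"{base_prompt}\n\n"
        f"Your on-screen appearance: {appearance}\n"
        "Answer questions about how you look using this description."
    )


def describe_avatar(image_path: str) -> str:
    # One-time call to a multimodal model (e.g. OpenAI's gpt-4o) to
    # caption the avatar image. Illustrative only; needs an API key.
    from openai import OpenAI
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this person's appearance (clothing, "
                         "hair, accessories) in two sentences."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content


# Stand-in caption, in place of an actual describe_avatar() call:
prompt = build_system_prompt(
    "You are Hassaan, a video avatar in a live conversation.",
    "A man wearing a baseball cap and a dark t-shirt.",
)
print(prompt)
```

The key point is that the caption only has to be generated once per avatar, so the per-turn conversation stays on the fast text-only path.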