This is definitely a good idea! I think the hard part is making it contextual and relevant to the last question/response, in which case the LLM comes into the equation again. Something we're looking at though!
Perhaps use a small, fast LLM to maintain a rolling "disposition" state, and for each of perhaps a handful of dispositions, have a handful of bridging emotes/gestures. You can have the small LLM use the next-to-last/second-most-recent user input to control the disposition async'ly, and in moments where it's not clear just say "That's a good question," "Let me think about that," or "I think that..." etc.
Perhaps use a small, fast LLM to maintain a rolling "disposition" state, and for each of perhaps a handful of dispositions, have a handful of bridging emotes/gestures. You can have the small LLM use the next-to-last/second-most-recent user input to control the disposition async'ly, and in moments where it's not clear just say "That's a good question," "Let me think about that," or "I think that..." etc.