Perhaps use a small, fast LLM to maintain a rolling "disposition" state, and for each of a handful of dispositions, keep a handful of bridging emotes/gestures. The small LLM can update the disposition asynchronously from the next-to-last (second-most-recent) user input, so it never has to block the current turn. In moments where the disposition isn't clear, just fall back to a neutral filler like "That's a good question," "Let me think about that," or "I think that..." etc.
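Here's a rough sketch of what I mean, in Python with asyncio. Everything named here is made up for illustration: the disposition labels, the bridge phrases, `DispositionTracker`, and especially `classify_disposition`, which is just a keyword stub standing in for the small LLM call. The point is only the shape: return a bridge right away from the current state, and let the classifier update the state in the background so it's ready by the next turn.

```python
import asyncio
import random

# Hypothetical dispositions and bridging emotes/gestures (illustrative only).
BRIDGES = {
    "curious": ["Ooh, interesting.", "*leans in*"],
    "concerned": ["Hmm, I see.", "*nods slowly*"],
    "cheerful": ["Ha, nice!", "*smiles*"],
}
# Neutral fillers used when the disposition is unclear.
FALLBACKS = ["That's a good question,", "Let me think about that,", "I think that..."]


async def classify_disposition(text):
    """Stand-in for the small, fast LLM. Returns a disposition, or None if unclear."""
    await asyncio.sleep(0)  # placeholder for the model's latency
    lowered = text.lower()
    if "?" in lowered or "why" in lowered:
        return "curious"
    if "worried" in lowered or "problem" in lowered:
        return "concerned"
    if "great" in lowered or "thanks" in lowered:
        return "cheerful"
    return None


class DispositionTracker:
    def __init__(self):
        self.disposition = None  # rolling state; None means "unclear"
        self._task = None        # keep a reference so the task isn't garbage-collected

    async def _update(self, text):
        self.disposition = await classify_disposition(text)

    def on_user_input(self, text):
        """Pick a bridge immediately, then classify this input in the background.

        The state read here was set from the previous input, so by the time
        the next input arrives, the disposition reflects the next-to-last one.
        Must be called from inside a running event loop.
        """
        bridge = random.choice(BRIDGES.get(self.disposition, FALLBACKS))
        self._task = asyncio.create_task(self._update(text))
        return bridge


async def demo():
    tracker = DispositionTracker()
    first = tracker.on_user_input("Why is the sky blue?")  # nothing known yet
    await asyncio.sleep(0.01)  # give the background classification time to finish
    second = tracker.on_user_input("ok")  # state now reflects the first input
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, the first bridge comes from the neutral fillers because no disposition exists yet, and the second comes from the "curious" set because the first input has been classified by then. With a real model you'd swap the stub for the actual call and probably decide what to do if a classification is still in flight when the next input lands.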