logoalt Hacker News

water-drummeryesterday at 2:05 PM1 replyview on HN

Gemini live api and grok voice api can make tool calls and they're speech to speech models


Replies

d4rkp4tternyesterday at 2:50 PM

Right, turns out Claude and ChatGPT voice can also do web-search. So I guess behind the scenes there is more than a "pure" voice-voice model being used, i.e. there's probably a rudimentary agent loop with tools + tool-exec interposed.