
101008 10/01/2024, 5 replies

I understand the novelty of the Realtime API's voice mode, and the technological achievement it is, but I don't see it from the product point of view. It looks like one of those startups that finds a solution before knowing the problem.

The two examples shown at DevDay are things I don't really want to do in the future. I don't want to talk to anybody, and I don't want to wait for their answer in a human form. That's why I order my food through an app or WhatsApp, and why I prefer to buy my tickets online. In the rare case that I call to order food, it's because I have a weird question or a weird request (Can I pick it up in X minutes? Can you prepare it in a different way?).

I hope we don't start seeing apps using conversations as interfaces, because it would be really horrible (leaving aside the fact that a lot of people don't know how to express themselves, different accents, sound environments, etc.), while clicking or typing works almost the same for everyone (or is at least much more standardized than talking).


Replies

com2kid 10/01/2024

> I understand the Realtime API voice novelty, and the technological achievement it is, but I don't see it from the product point of view. It looks like one of those startups finding a solution before knowing the problem.

The market for realistic voice agents is huge, but also very fragmented. Customer service is the obvious example: large companies employ tens of thousands of customer service phone agents, and a large number of those calls can be handled, at least in part, by a sufficiently smart voice agent.

Sales is another: just calling back leads and checking in on them. Voice-clone the original sales agent, give the AI enough context about previous interactions, and a lot of boring legwork can be handled by AI.

Answering simple questions is another great example. Restaurants get slammed with calls during their busiest hours (seriously, getting ahold of restaurant staff during peak hours can be literally impossible!), so having an AI that can pick up the phone and answer basic questions (what's in certain dishes, what the current wait time is, what the largest group that can be seated together is, etc.) is super useful.

A lot of small businesses with only a single employee can benefit from having a voice AI assistant picking up the phone and answering the easy everyday queries and then handing everything else off to the owner.

The key is that these voice AIs should be seamless: you ask your question, they answer, and ideally you don't even know it is an AI.

epolanski 10/02/2024

I would love a work assistant, some sort of secretary, idk, that I can talk to while I code.

"What are today's most important tasks? Anything I forgot before I log off? Can you write John to check the blocking PR? Let's fix this bug together".

corlinp 10/01/2024

One thing I'm really excited for is having this real-time voice model in video game characters. It would be really cool to be able to have conversations with NPCs, and actually have to pick their brain for information about a quest or something.

ilaksh 10/01/2024

You're right, having a voice conversation for any reason is just so passe these days. They should stop adding microphones to phones and everything. So old-fashioned and inefficient. And who wants to ever have to actually talk to someone or some AI to ask for anything? I'm sure our vocal cords will evolve away soon. They are so primitive. Vestigial organs.

bcherry 10/01/2024

Keep in mind that this is just v1 of the Realtime API. They'll add realtime vision/video down the road, which can also have wide applications beyond synchronous communication.