Hacker News

ssl-3 · last Wednesday at 1:17 AM · 1 reply

Honestly, other than that one single command ("Climate control defrost and floor") I never really use voice for anything else while actively driving. The temperature knob usually does what I want when driving, and I'll be stopped again soon enough if I want to fiddle with something else.

And that one voice command is easy enough to remember, and the resulting manually selected mode is easy enough to cancel with the Auto button (which is the entire middle of the temperature knob -- simple enough).

AI is too easy to get wrong.

For example: At home when my hands are full and I'm headed to/from the basement, I might bark out the command "Alexa! Basement lights!"

This command sometimes results in the lights turning on or off. But sometimes it results in entering a conversation about the basement lights, when all anyone really wants from such simple diction is for the lights to toggle state -- the way a regular light switch just toggles state.

I simply want computers to follow instructions. I am particularly uninterested in ever having a conversation -- a negotiation -- with a computer in my car.

But I can see plenty of merit to adding some context-aware tolerance for ambiguity to the accepted commands. Different people sometimes (quite rightly) use different words to describe the end result they want.

That doesn't take an LLM to accomplish, I don't think. After all, a car has a limited number of functions. It should be mostly a matter of broadening the voice recognition dictionary and expanding the fixed logic to deal with that breadth.
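Roughly the kind of thing I have in mind -- a toy sketch, where the phrase lists and function names are made up for illustration, not anything a real head unit or Alexa actually exposes:

```python
# Hypothetical sketch of "broadened dictionary + fixed logic":
# several phrasings map to one deterministic action, and toggles just toggle.
CAR_COMMANDS = {
    "defrost_and_floor": [
        "climate control defrost and floor",
        "defrost and floor",
        "defog and feet",
    ],
    "toggle_basement_lights": [
        "basement lights",
        "basement light",
        "lights in the basement",
    ],
}

def dispatch(utterance: str, state: dict) -> str:
    """Map a recognized utterance to exactly one action; never start a conversation."""
    text = utterance.lower().strip(" !.?")
    for action, phrases in CAR_COMMANDS.items():
        if text in phrases:
            if action == "toggle_basement_lights":
                # Behave like a physical switch: flip state, no follow-up questions.
                state["basement_lights"] = not state.get("basement_lights", False)
                return f"basement lights {'on' if state['basement_lights'] else 'off'}"
            return action
    return "unrecognized"  # do nothing rather than guess or chat

# Example:
#   state = {}
#   dispatch("Basement lights!", state)  -> "basement lights on"
```

Every accepted phrase maps to exactly one deterministic action, so there's nothing to negotiate with.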

I reckon that this should have happened 5 years ago. :)


Replies

rlpb · yesterday at 12:52 PM

> That doesn't take an LLM to accomplish, I don't think. After all, a car has a limited number of functions. It should be mostly a matter of broadening the voice recognition dictionary and expanding the fixed logic to deal with that breadth.

I think the most effective way to make this accurate and robust is to give an LLM the user's voice prompt and current context, and ask it to convert the request into an API call. The user wouldn't be chatting with the LLM directly.

The point is that it doesn't require a static dictionary that already contains your exact phrasing; it will just work with plain English.
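A rough sketch of what I mean, with call_llm standing in for whatever model endpoint you'd actually use; the command names, JSON schema, and handler functions are all made-up placeholders, not any real vendor's API:

```python
import json

SYSTEM_PROMPT = (
    "Convert the request into one JSON object: "
    '{"function": "<set_climate | toggle_lights | none>", "args": {}}. '
    "Output JSON only, no prose."
)

def call_llm(system_prompt: str, user_prompt: str) -> str:
    # Placeholder: a real implementation would call a completion endpoint here.
    # Canned response so the sketch runs end to end.
    return '{"function": "toggle_lights", "args": {"room": "basement"}}'

def set_climate(mode: str) -> None:
    print(f"climate -> {mode}")        # stand-in for the real vehicle API

def toggle_lights(room: str) -> None:
    print(f"toggling {room} lights")   # stand-in for the real lighting API

HANDLERS = {"set_climate": set_climate, "toggle_lights": toggle_lights}

def handle_voice(utterance: str, context: dict) -> None:
    raw = call_llm(SYSTEM_PROMPT,
                   f"Context: {json.dumps(context)}\nRequest: {utterance}")
    cmd = json.loads(raw)
    handler = HANDLERS.get(cmd.get("function"))
    if handler:
        handler(**cmd.get("args", {}))  # exactly one action, no follow-up chat

# handle_voice("Basement lights!", {"time": "21:40"})  -> toggling basement lights
```

The model's only job is to emit that one JSON object; everything after it is ordinary fixed dispatch, so the user never ends up in a back-and-forth with the model.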