Okay, but why is the Siri team sitting out transformers? I really wanna move past the "Dragon Naturally Speaking" experience with a bolted-on decision tree.
Not sure "sitting out" is the right way to put it. They've been publicly trying to ship a next-gen Siri for years and haven't been able to get something good enough to release. The latest plan is to base it on Gemini, so we should be seeing progress on that next month at WWDC.
Gemini will be replacing the legacy Siri:
https://blog.google/company-news/inside-google/company-annou...
They did create a chatbot version of Siri small enough to run locally, but decided that hallucinations were a big enough issue to push back the release.
The experience of using LLMs as digital assistants so far is not great. Gemini on Android sucks so bad it's hard to describe. It can't tell what its own capabilities are, it can't inspect the states of the apps it manipulates, it hallucinates constantly, and it needs more handholding than the crappy old decision tree to do the right thing. I much more often have to pull over to make sure Google Maps is doing the right thing than I ever used to before, because trusting the LLM to be "smarter" so often fails for me.
Be careful what you wish for.
In my experience, voice recognition on Claude (using the iPhone app) isn't that great -- maybe even worse than Siri (I'm referring to voice transcription specifically, not the inference, of course).
Because the competitor voice models sound good but are dumb under any scrutiny.
ChatGPT's voice model has a great user experience and seems like it is seamlessly integrated into the chat, but it's actually a far smaller and dumber model. @husk.irl on Instagram has videos displaying how dumb and undiscerning it is.
People were wowed by the magic at one point, but it's faded. Apple avoids those things, and the limitations haven't been solved.
I think they could never make it good enough at the right price.
You have to remember all of the AI companies are making cash bonfires. People aren't going to stop buying iPhones because Siri can only do what it does now.
If Apple focuses on hardware and skips the pay-for-inference bubble they'll come out the other side with the best consumer hardware everybody already has for local inference which is going to eat the whole industry's lunch.
Nvidia is going to have a hard time convincing people they need to buy $1000 LLM inference hardware. Apple isn't going to have a hard time convincing people to buy the next generation of phone/tablet/laptop.
I think it's the same reason macOS and iOS have degraded so much in UX over the past decade: Apple's focus shifted toward hardware independence.
The 2010s were marked by Intel's lazy product lineup: year after year it pumped out rehashes of older products, iterating on its 14nm lithography with increasingly minor architectural improvements, until AMD overtook it. In the process, Apple's partnership with Intel became a liability it had to solve, and the push to a unified ARM architecture was no small feat.
If you ask me, degrading the user experience for the sake of that focus wasn't justified. It's a trillion dollar company, and has been for a while. Sure, it could have tackled both, but what do I know.
In any case I think it explains really well why Siri feels so abandoned.
Who's doing it better? I have yet to hear from a Google or Amazon user who has a transformatively better experience, and I think that's why Apple hasn't jumped so far: they have hundreds of millions of users with daily habits that they don't want to lightly disturb.