Well it seems that, those days, instead of SUM(expense1,expense2) you ask an LLM to "make an app that will compute the total of multiple expenses".
If I read most of the news on this very website, this is "way more efficient" and "it saves time" (and those who don’t do it will lose their job)
Then, when it produces wrong output AND it is obvious enough for you to notice, you blame the hardware.
I mean, Apple's LLM also doesn't work on this device, plus the author compared the outputs from each iterative calculation on this device vs. others and they diverge from every other Apple device. That's a pretty big sign that both, something is different about that device, and this same broken behavior carried across multiple OS versions. Is the hardware or the software "responsible" - who knows, there's no smoking gun there, but it does seem like something is genuinely wrong.
I don't get the snark about LLMs overall in this context; this author uses LLM to help write their code, but is also clearly competent enough to dig in and determine why things don't work when the LLM fails, and performed an LLM-out-of-the-loop debugging session once they decided it wasn't trustworthy. What else could you do in this situation?
The author is debugging the tensor operations of the on-device model with a simple prompt. They confirmed the discrepancy with other iPhone models.
It’s no different than someone testing a calculator with 2+2. If it gets that wrong, there’s a hardware issue. That doesn’t mean the only purpose of the calculator is to calculate 2+2. It is for debugging.
You could just as uncharitably complain that “these days no one does arithmetic anymore, they use a calculator for 2+2”.