I feel like I'm going nuts.
There are other commenters saying this is a good practice they've also done for other injuries. You are saying you are an actual radiologist and immediately clock the problems with its advice.
I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do that you are likely to find the output helpful.
This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.
Totally agree. I'm a scientist, and like most scientists I have some specialized skills that most of my colleagues don't. AI has empowered them to learn and build things that they might have otherwise needed me for. But there have been quite a few cases where it led them very far down a wrong path. This has started happening way more often in the last few months.*
We've known since the beginning that AIs confidently say incorrect things. But now that they can speak confidently about very complex topics, and mostly say correct things, we are letting our guard down and lots of subtle falsehoods are slipping through.
*In one case, I was able to put things back on track because the AI suggested my colleague talk to me; somehow it figured out we were co-workers.
I see your argument, but it's not exactly news that an expert found a flaw in a popular tool. You could say the same about Wikipedia--experts have tons of issues with it, but Wikipedia still provides value to non-experts. The most likely alternative to Wikipedia for non-experts is simply not trying to learn anything new.
Similarly with LLMs, you can't just write them off entirely because they sometimes provide misleading or incorrect advice. The utility-maximizing view is to learn when you need to call in an expert. I recently moved into a new house and have used Claude extensively to figure out basic things (e.g., adjusting the garage door height, how to mount a TV). However, when the HVAC suddenly stopped working, I gave Claude a shot for an hour and tried some non-destructive fixes, but then realized I had to call in an HVAC expert.
> Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
Yes, this is exactly so. AI is able to confidently sound plausible enough to convince laypersons or anyone who isn't very familiar with the subject matter, which is a big part of the mass-appeal "magic" of ChatGPT and other similar tools. It's like having a know-it-all friend (who also makes shit up to bridge their own knowledge gaps).
In many non-advanced non-specialized situations, AI is right enough to be at best useful or at worst not harmful (usually landing in the middle somewhere).
But speaking for myself, in areas where I consider myself quite proficient, I can very easily spot the subtle inconsistencies and naive conclusions that AI responses provide, and I have to guide/steer/correct it a lot to get good results when the subject matter is complex enough.
I dunno. I know a lot of software engineering experts. AI isn't always right, but neither are the people, and it's getting better and better.
Software is one domain where it excels because of structured training data and simulation environments, so I'm well aware it's better here than other areas.
Still, there's a balance somewhere between saying every time it's "insufficient or incomplete or outright misleading" and "just trust AI". AI's a useful source of information/reasoning/research, but know that you need to validate its answers for important decisions.
I may be missing something, but I think it's unclear that the parent poster here is necessarily actually contradicting anything the AI said. It may depend on the exact information the OP wrote to Claude and GPT. The full transcripts would be needed. (Though there is definitely a separate point that a doctor would generally better know all the right questions to ask, while current LLMs may be making certain assumptions.)
The LLM may have, from its "perspective", implicitly thought the OP was telling it that he had strong reason to believe there was no calcification and was not considering the bigger picture of possibly receiving an incomplete/poor assessment from the medical staff. In fact, the issue here may be the LLM overly trusting doctors vs. trusting its own expertise.
> no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information
"Be wowed by the convenience and speed", or merely "take advantage of the mere availability"? What most people find to be damning about expert advice is that they simply can't get it anywhere, at any cost that they can afford.
Well that's part of the problem. AI is not accountable - if you take its advice and hurt yourself, who is responsible?
A real doctor is accountable.
They might both "know" a lot of things but implicitly the party who is accountable is going to be more trustworthy.
And I don't see that going away until AI companies must be licensed for application x and can lose their license / be sued if engaging in malpractice.
Seems natural enough. There will always be complexity and nuance that is missed by an AI model or person - the world is just super detailed. The more expertise you have the more you will be aware of that nuance. That doesn't mean the model or person is not useful as a starting point.
> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do that you are likely to find the output helpful.
I always recommend people try asking LLMs a lot of questions on something they know first. Programmers should start by asking LLMs to work on a codebase they’re familiar with first.
You’re overstating the problem, though. Even for an expert the LLM will get a lot of things right and can be helpful under a watchful eye.
The real problem is knowing how to identify when it’s on the right track and when you need to correct it, because both cases are presented with the same tone and confidence.
An expert can better identify when the LLM output doesn’t sound plausible. Someone unfamiliar with the topic will think everything it says looks correct.
You shouldn’t expect frontier models to work on medical imaging. There is much more that goes into building a medical imaging product. First and foremost is data. Medical imaging datasets are not prevalent on the public internet at the scale necessary for good performance on medical imaging tasks, especially MRI. Also, the labels are super noisy.
This is completely different than asking for general medical reasoning which is more derived from papers, public standards and textbooks.
Text exists at the right scale but images don’t.
>I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
Media is awash at the moment with experts chiming in to support AI, saying their fields are being revolutionized, etc.
It seems unsurprising to me that the layman's opinion would follow the loudest media trumpets.
On the flip side of this problem, novel best practices lag the medical standard of care, other human failures like corruption and competing priorities notwithstanding.
For example, we had to advocate for certain practices during the birth of our first child that became routine during our second several years later.
So, neither side is guaranteed correct, doctor or citizen researcher (which did not include LLMs in my case, for the record). The truest answer is also the most useless one, applicable to all fields: it depends.
The real question is: if you embrace being a layman, whom do you trust more: LLMs/the internet or experts, like doctors? I think the answer is pretty clearly experts.
The question is how far off AI is compared to the professionals that we have access to. The world's best experts are not accessible to most of us. :(
You're not. This site was also bullish on using LLMs as therapists, which defeats the very point of them, and reflects a lack of knowledge on what exactly therapists do for people.
More on topic: if the article's author had arrived at a definitively negative result, would this have shown up on HN?
This is true in broader contexts too. A bunch of experts can't agree on something fundamental which is hard to prove/disprove, and they have strong opinions on the topic.
AI is much worse.
No, it is not the case that anytime someone is an actual expert at anything, AI output appears insufficient. That is why experts in various fields use AI.
Then to say "Aha, but all of that is AI psychosis" makes obviously no sense: Why would we trust experts when they offer critique but not when they say "this is helpful"?
Overall: People are not insane. AI makes mistakes and, often, fails completely. AI also helps them do things better, quicker, increasingly so. The jaggedness of AI is confusing and real.
I came here to post this as my experience. AI is magical when I apply it to something I know nothing about. It far exceeds my expectations every single time. I know nothing, but here is a report with animated graphics explaining exactly what I asked it to explain!
In fields where I'm an expert... it makes a lot of silly mistakes that are annoying and I feel like they would just cascade if I didn't correct them early. (I still think it's a net win, but... I watch it and it watches me, and we both do better work. I'd even apply the "magical" adjective when it does stuff I hate but know how to do, like edit Helm charts. What would normally be 20 minutes of me griping about YAML indentation is just a correct diff in seconds. I'll take it!)
So with that in mind, I tend to distrust output that I can't verify. If a doctor was recommending surgery and I thought the plan was too aggressive, I'd get a second opinion. I don't expect Claude Code to have much medical diagnostic ability, as that is really not what the model is trained for, and I know how it performs on work that it's trained and fine-tuned for. That is not to say the output is wrong and that it can't have diagnostic value, just that I personally wouldn't feel safe trusting it. Wrap up the same model with fine-tuning in the domain and a harness that reminds Claude to do a lot of sanity checks, perhaps with a human in the loop to guide it back onto the rails when it gets hyperfixated on something that doesn't matter? That could very much be a useful AI product.
Yes. The PM’s “with AI I know enough to be dangerous, haha” means “I’m actually dangerous and I don’t realize it.”
> Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading.
The term for trusting the press on topics you don't know, even after seeing it "get it wrong" on topics you do, is Gell-Mann Amnesia (https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect).
In that case, when you have personal knowledge of the facts, or know the specific domain area, you can see where the reporter mixed things up.
AI is no different, it's just a bunch of matrix math substituting for "the reporter" regurgitating what it was previously told. So the Gell-Mann Amnesia effect would apply just the same. If you have domain knowledge, you immediately see where the AI got it wrong. When you do not have domain knowledge, you have less chance of seeing where the AI was wrong.
> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
AI assistants are industrializing the Gell-Mann amnesia effect.
AI is an expert in everything you are not.
> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading.
AI isn't even the first instance of this phenomenon; news articles are like this as well.
>AI output appears insufficient or incomplete or outright misleading
It has been like this since the rise of "AI". The only people enthusiastic about it are usually the ones hoping to make a profit in one way or another.
TFA doesn’t actually state where the bit about shockwave therapy came from and it wasn’t the main point of the article. The concern was about being given useless therapies. The homeopathic analgesic is concerning, at least to me.
I.e. nothing this radiologist said was related to the LLM’s advice.
Your instinct is correct, and in a lot of cases it's true. However, I've heard from enough doctors by now (a cardiologist, psychiatrist, and epidemiologist/former physician) that they use medical LLMs and find them extremely helpful, mostly as a way to either bring up knowledge they'd forgotten about or as a way to learn something new and then verify it. I'm extremely skeptical about LLMs in general and the connection to Gell-Mann Amnesia is apt, but I wouldn't necessarily write them off completely like that. There are experts using the models who find them genuinely helpful in their field.
It's like reading news articles. Seems reasonable until you read an article about something you know, then you see how wrong they can be.
We're past the point of Gell-Mann amnesia. This is full blown Gell-Mann psychosis.
An LLM is not necessarily an expert system. Once there are expert systems for law, healthcare, accounting, governance…
This is natural and even logically expected. It's just Gell-Mann amnesia in action. The world has more people spouting on things than it has people knowledgeable in said things.
Apply that to the Internet at large, and realize where LLMs got their training. They're basically ConfidentlyIncorrect personified.
What is happening is that the gap between what the experts and AI know is getting smaller each year. This year, sure, radiologists are mocking AI's ability to interpret MRI results, but the models are a lot better at that this year than last. In five years perhaps radiologists will truly appreciate AI, but I am not holding my breath, because radiologists are notoriously slow to adapt to changes in medical science compared to other specialists like anesthesiologists or surgeons.
> This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.
Welcome to the club? This new awareness you've found of the true quality of LLM-based GenAI output is what "all the haters" have been mad about forever. That the output of LLMs is clearly defective, and has merely found a cute trick towards making humans think it's less defective than it is actually measured to be.
And the corresponding anger and frustration at the push to offload the risks of GenAI output onto others, while also aggressively pushing it as a feature you should be using already. You're behind, don't you know, and whatever other lie I have to tell to trick you into enough FOMO to pay me 200USD/mo so I can sell FOSS back to you.
An LLM can only output the mean next likely token, and then add a bunch of extra noise on top of that so it feels interesting and not repetitive. None of this is new. The problem is, 50% of humans are below the mean, but have no idea. So when an LLM tells them some lie: well, it sounds so helpful! It's impossible for someone who sounds this helpful to lie to me, liars never sound confident! It must be PERFECT! I'm gonna tell everyone how perfect it is. So the bottom 0-33% think LLMs are fantastic tools that make nearly 0 mistakes in comparison to the bottom 33%. The 33-66%-ish aren't sure: sometimes it's great, but it will make that random mistake sometimes, but I can catch most (or all of them, depending on ego). And the 66%+ are angry about how many people are getting tricked by something so obviously low quality, or are lucky enough to not have to care.
This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief.
It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
Don’t get me wrong, I think we all agree capabilities will eventually improve (and farther-future capabilities could reasonably surpass experts), but it really is unclear if the current transformer architectures with their probabilistic/hallucinatory outputs will plateau before they surpass current experts' abilities in all promised fields.