
snackbroken • yesterday at 7:39 PM • 8 replies

The few times I've used LLMs as question answering engines for anything moderately technical, they've given subtly-but-in-important-ways incorrect information such that taking them at face value would've likely lost me hours or days of pursuing something unworkable, even when I ask for references. Whether or not the "references" actually contain the information I'm asking for or merely something tangentially related has been rather hit or miss too.

The one thing they've consistently nailed has been tip-of-my-tongue style "reverse search" where I can describe a concept in sufficient detail that they can tell me the search term to look it up with.

Replies

bdunks • yesterday at 9:37 PM

Absolutely. And I’m finding the same with “agent” coding tools. With the ever-increasing hype around Cursor, I gave it a go this week. The first 5 minutes were impressive, when I sent up a small trial balloon for a simple change.

But when asking for a full feature, I lost a full day trying to get it to stop chasing its tail. I’m still in the “pro” free trial period so it was using a frontier model.

This was for a Phoenix / Elixir project, which I realize is not as well represented in the training data as other languages and frameworks, but it was supposedly consuming the documentation and other reference code I’d linked in, and I’d connected the Tidewave MCP.

Regardless, in the morning with fresh eyes and a fresh cup of coffee, I reverted all the Cursor changes and implemented the code myself in a couple hours.

viccis • yesterday at 10:39 PM

>The one thing they've consistently nailed has been tip-of-my-tongue style "reverse search" where I can describe a concept in sufficient detail that they can tell me the search term to look it up with.

This is basically the only thing I use it for. It's great at it, especially given that Google is so terrible these days that a search describing what you're trying to recall gets nothing. Especially if it involves a phrase heavily associated with other things.

For example "What episode of <X show> did <Y thing> happen?" In the past, Google would usually pull it up (often from reddit discussion), but now it just shows me tons of generic results about the show.

milesvp • today at 1:20 AM

Yes, you have to be very careful when querying LLMs. You have to assume that they are giving you sort of the average answer to a question. I find them very good at telling me how people commonly solve a problem. I'm lucky in that the space I've been working in has had a lot of good forum training data, and the average solution tends to be on the more correct side. But you still have to validate nearly everything it tells you. It's also funny to watch the tokenization "fails" when you ask about things like register names, and you can see it choose nonexistent tokens. Atmel libraries have a lot of things like this in them:

#define PA17_EIC_LINE PIN_PA17A_EIC_EXTINT_NUM
#define PA17_EIC_BIT PORT_PA17A_EIC_EXTINT1
#define PA17_PMUX_INDEX 8 //pa17 17/2
#define PA17_PMUX_TYPE MUX_PA17A_EIC_EXTINT1

And the output will be almost correct code, but instead of an answer being:

PORT_PA17A_EIC_EXTINT1

you'll get:

PORT_PA17A_EIC_EXTINT_NUM

and you can tell that it diverged trying to use similar tokens: since _ sometimes follows EXTINT, it's a "valid" token to try, and once it's EXTINT_, NUM is the most likely thing to follow.
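
One way to catch this kind of near-miss is to compare every macro-style name in generated code against names that appear in code you already trust. The following is only a rough sketch of that idea in Python: `TRUSTED_TEXT` reuses the four defines quoted above as a stand-in for the real vendor headers, and `macro_names`, `check_generated`, and the sample `generated` line are hypothetical names made up for illustration.

```python
import difflib
import re

# Matches ALL_CAPS identifiers such as PORT_PA17A_EIC_EXTINT1.
# Assumption: vendor macros follow this upper-case naming style.
MACRO_RE = re.compile(r"\b[A-Z][A-Z0-9_]{2,}\b")

# Stand-in for trusted source text. In practice this would be the
# contents of the vendor headers, not just these four lines.
TRUSTED_TEXT = """
#define PA17_EIC_LINE PIN_PA17A_EIC_EXTINT_NUM
#define PA17_EIC_BIT PORT_PA17A_EIC_EXTINT1
#define PA17_PMUX_INDEX 8 //pa17 17/2
#define PA17_PMUX_TYPE MUX_PA17A_EIC_EXTINT1
"""


def macro_names(text):
    """Return the set of macro-style identifiers found in text."""
    return set(MACRO_RE.findall(text))


def check_generated(generated, trusted):
    """Map each unknown macro-style name in generated code to the
    closest names that do appear in the trusted text."""
    known = macro_names(trusted)
    report = {}
    for name in sorted(macro_names(generated) - known):
        report[name] = difflib.get_close_matches(
            name, sorted(known), n=3, cutoff=0.6
        )
    return report


# Hypothetical line of LLM output containing the diverged name.
generated = "uint32_t bit = PORT_PA17A_EIC_EXTINT_NUM;"

for name, suggestions in check_generated(generated, TRUSTED_TEXT).items():
    print(f"unknown name: {name}; closest known names: {suggestions}")
```

A check like this only flags names that do not exist anywhere in the trusted text. It cannot tell you whether an existing name is the right one for the job, so it complements reading the datasheet rather than replacing it.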

That said, it's massively sped up the project I'm working on, especially since Microchip effectively shut down the forums that ChatGPT was trained on.

astee • today at 5:36 AM

It's interesting because I use them every day all day for this now.

You have to "gut check" the answers and know when to go deeper.

A lot of answers are low stakes, and it's OK to be a little wrong if it helps go in the right direction.

SeanDav • today at 8:09 AM

I agree. Use with caution. One of my personal pet peeves with LLM answers is their propensity to give authoritative or definite answers, when in fact they are best guesses, and sometimes pure fantasy.

jwr • today at 2:58 AM

Try Perplexity. I found it to be very good at digging up information. It has become a nearly complete replacement for web searches at this point.


cortesoft • today at 4:25 AM

I was trying to set up a Solr cluster on Kubernetes the other day, and was googling how to create a new collection.

Google AI helpfully showed me this awesome CRD that created exactly what I wanted... sadly, there is no such CRD in reality.

smohare • yesterday at 7:51 PM

[dead]
