Hacker News

crystal_revenge, yesterday at 9:25 PM (4 replies)

Definitely mirrors my experience. One heuristic I've often used when providing context to a model is "is this enough information for a human to solve this task?". When I was building text2SQL products in the past, it was very interesting to see how often, when the model failed, a real data analyst would reply with something like "oh yea, that's an older table we don't use any more, the correct table is...". This means the model was likely making a mistake that a real human analyst would also have made without the proper context.

One thing that is missing from this list is: evaluations!

I'm shocked how often I still see large AI projects being run without any regard to evals. Evals are more important for AI projects than test suites are for traditional engineering ones. You don't even need a big eval set, just one that covers your problem surface reasonably well. Without one, however, you're basically just "guessing" rather than iterating on your problem, and you're not even guessing in a way where each guess is an improvement on the last.

edit: To clarify, I ask myself this question. It's frequently the case that we expect LLMs to solve problems without the necessary information for a human to solve them.
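
To make the evals point concrete, here is a minimal sketch of a small eval set for a text2SQL system. Everything in it is an assumption for illustration: `EvalCase`, `run_evals`, `fake_model`, and the table names `orders_v2` and `orders_legacy` are hypothetical and not from any real product or library. The substring check is a deliberately crude stand-in; a real eval would more likely execute the SQL and compare results.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    must_contain: str           # substring the generated SQL should include
    must_not_contain: str = ""  # e.g. a deprecated table name (optional)


def run_evals(generate_sql: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of eval cases that pass (0.0 if there are none)."""
    passed = 0
    for case in cases:
        sql = generate_sql(case.question).lower()
        ok = case.must_contain.lower() in sql
        if case.must_not_contain:
            ok = ok and case.must_not_contain.lower() not in sql
        passed += ok
    return passed / len(cases) if cases else 0.0


def fake_model(question: str) -> str:
    """Hypothetical stand-in for a real model call; always returns the same SQL."""
    return "SELECT SUM(amount) FROM orders_v2"


# A tiny eval set; the table names are invented for this example.
CASES = [
    EvalCase("Total revenue?", must_contain="orders_v2",
             must_not_contain="orders_legacy"),
    EvalCase("How many customers?", must_contain="customers"),
]

score = run_evals(fake_model, CASES)
print(f"pass rate: {score:.0%}")
```

Re-running the same set after each prompt or context change gives a number to compare against, so each iteration is measured against the last instead of being a fresh guess.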


Replies

adiabatichottub, yesterday at 11:15 PM

A classic law of computer programming:

"Make it possible for programmers to write in English and you will find that programmers cannot write in English."

It's meant to be a bit tongue-in-cheek, but there is a certain truth to it. Most human languages fail at being precise in their expression and interpretation. If you can exactly define what you want in English, you probably could have saved yourself the time and written it in a machine-interpretable language.

kevin_thibedeau, yesterday at 9:55 PM

Asking yes/no questions will get you a lie 50% of the time.

adriand, yesterday at 10:13 PM

I have pretty good success with asking the model this question before it starts working as well. I’ll tell it to ask questions about anything it’s unsure of and to ask for examples of code patterns that are in use in the application already that it can use as a template.

hobs, yesterday at 10:14 PM

The thing is, all the people cosplaying as data scientists don't want evaluations, and that's why you see so few of them in fake C-level projects: telling people the emperor has no clothes doesn't pay.

For those actually using the products to make money, well, hey, all of those have evaluations.
