Hacker News

bdbdbdb, today at 12:23 PM (8 replies)

> No human could read all of this in a lifetime. AI consumes it in seconds.

And therefore it's impossible to test the accuracy if it's consuming your own data. AI can hallucinate on any data you feed it, and it's been proven that it doesn't summarize, but rather abridges and abbreviates data.

In the author's example:

> "What patterns emerge from my last 50 one-on-ones?" AI found that performance issues always preceded tool complaints by 2-3 weeks. I'd never connected those dots.

Maybe that's a pattern from 50 one-on-ones. Or maybe it's only in the first two and the last one.

I'd be wary of using AI to summarize like this and expecting accurate insights.
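
One way to tell "a pattern across all 50" from "only the first two and the last one" is to count how many cases actually support the claim. The sketch below is an illustration, not anything from the article: it assumes you already have hand-checked dates for each performance issue and each tool complaint, and the function name and the 14-21 day window are assumptions chosen to match the "2-3 weeks" wording.

```python
from datetime import date


def supported_complaints(
    perf_issues: list[date],
    tool_complaints: list[date],
    min_days: int = 14,
    max_days: int = 21,
) -> tuple[int, int]:
    """Return (supported, total) over all tool complaints.

    A complaint counts as supported when at least one performance issue
    happened between min_days and max_days before it.
    """
    supported = 0
    for complaint in tool_complaints:
        if any(
            min_days <= (complaint - issue).days <= max_days
            for issue in perf_issues
        ):
            supported += 1
    return supported, len(tool_complaints)
```

If the claim really holds "always", the two numbers should match; a result like 3 out of 50 would suggest the pattern comes from a handful of notes.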


Replies

gchamonlive, today at 12:34 PM

> it's been proven that it doesn't summarize, but rather abridges and abbreviates data

Do you have more resources on that? I'd love to read about the methodology.

> And therefore it's impossible to test the accuracy if it's consuming your own data.

Isn't that only true if the result is hard to verify? If a result is hard to produce but easy to verify, a class many problems fall into, you'd just need to look at the synthesized results.

If you ask it, "Given these arbitrary metrics, what is the best business plan for my company?", it'd be really hard to verify the result. It'd be hard to verify that result from anyone, for that matter, even specialists.

So I think it's less about expecting the LLM to do autonomous work and more about using LLMs to more efficiently help you search the latent space for interesting correlations, so that you and not the LLM come up with the insights.

missedthecue, today at 7:03 PM

"AI can hallucinate on any data you feed it, and it's been proven that it doesn't summarize, but rather abridges and abbreviates data."

Have you ever met a human? I think one of the biggest reasons people become bearish on AI is that their measure of whether it's good/useful is that it needs to be absolutely perfect, rather than simply superior to human effort.

kenjackson, today at 12:46 PM

Similar to P/NP, verification can often be faster than solving. For example, you can then ask the AI to give you the list of tool complaints and the performance issues. Then a text search can easily validate the claim.
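
A minimal sketch of that text-search check, assuming the model has been asked to return each complaint or issue as a quote plus the note it came from. The `Claim` type, the `note_id` keys, and the function names are hypothetical; the point is only that confirming a quote against the raw notes is a cheap substring search.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    note_id: str  # which one-on-one the model says the quote came from
    quote: str    # text the model attributes to that note


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps don't cause misses."""
    return " ".join(text.lower().split())


def verify_claims(
    notes: dict[str, str], claims: list[Claim]
) -> list[tuple[Claim, bool]]:
    """Pair each claim with True if its quote appears in the cited note."""
    results = []
    for claim in claims:
        source = normalize(notes.get(claim.note_id, ""))
        quote = normalize(claim.quote)
        # An empty quote would match any note, so treat it as unverified.
        found = bool(quote) and quote in source
        results.append((claim, found))
    return results
```

Any claim that comes back `False` is either paraphrased or invented, and is worth reading against the original note by hand.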

novok, today at 7:22 PM

AI is a new kind of bulk tool; you need to know how to use it well, and context management is a huge part of that. For the 1-1 example, you would run a for loop, either with subagents that each get a fresh context or a literal for loop, to prevent the 'first two and last one' issue. Then you look at those per-1-1 summaries to make the determination.
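
A rough sketch of that loop, under the assumption that each note is tagged by its own independent call. The `tag_note` callable stands in for whatever model or subagent call you use; `keyword_tagger` is a made-up placeholder so the example runs without any API, and the tag names are illustrative.

```python
from collections import Counter
from typing import Callable


def tag_each_note(
    notes: list[str], tag_note: Callable[[str], set[str]]
) -> list[set[str]]:
    """Call tag_note once per note, so each call sees only that one note."""
    return [tag_note(note) for note in notes]


def coverage(tags_per_note: list[set[str]]) -> Counter:
    """Count how many notes carry each tag."""
    counts: Counter = Counter()
    for tags in tags_per_note:
        counts.update(tags)
    return counts


def positions(tags_per_note: list[set[str]], tag: str) -> list[int]:
    """Indices of the notes that carry a tag, to expose clustering."""
    return [i for i, tags in enumerate(tags_per_note) if tag in tags]


def keyword_tagger(note: str) -> set[str]:
    """Placeholder for a per-note model call (assumption, not a real API)."""
    text = note.lower()
    tags = set()
    if "deadline" in text or "missed" in text:
        tags.add("performance_issue")
    if "tool" in text:
        tags.add("tool_complaint")
    return tags
```

Because the aggregation happens in plain code rather than inside one long prompt, `positions` shows directly whether a tag appears throughout the notes or only in the first two and the last one.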

Humanity has gotten amazing results from unreliable stochastic processes; managing humans in organizations is one example. Something new can be incredibly useful without being completely deterministic.

TimByte, today at 5:11 PM

I think as long as you keep a skeptical loop and force the model to cite or surface raw notes, it can still be useful without being blindly trusted.

potsandpans, today at 4:53 PM

> ...and it's been proven that it doesn't summarize, but rather abridges and abbreviates data.

I don't really know what this means, or if the distinction is meaningful for the majority of cases.

xtiansimon, today at 1:08 PM

> “I'd be wary of using AI to summarize like this and expecting accurate insights.”

Sure, but when do you have accurate results when using an iterative process? It can happen at the beginning or at the end when you’re bored, or have exhausted your powers of interrogation. Nevertheless, your reasoning will tell you if the AI result is good, great, acceptable, or trash.

For example, you can ask Chat: "Summarize all 50 with names, dates, 2-3 sentence summaries, and 2-3 pull quotes." That can be enough to jog your memory, and therefore to validate or invalidate Chat's conclusion.

That’s the tool, and its accuracy is still TBD. I for one am not ready to blindly trust our AI overlords, but darn if a talking dog isn’t worth my time if it can make an argument with me.

block_dagger, today at 12:35 PM

Your colleagues using the tech will be far ahead of you soon, if they aren’t already.
