logoalt Hacker News

noosphrtoday at 12:03 AM1 replyview on HN

I've been building Ai apps since gpt 3 so 5 years now.

The pro AI people don't understand what quadratic attention means and the anti-ai people don't understand how much information can be contained in a tb of weights.

At the end of the day both will be hugely disappointed.

>The best asset you can develop is an intuition for what works and what doesn't, and getting that intuition requires months if not years of personal experimentation.

Intuition does not translate between models. Whatever you think dense llms were good at deepseek completely upended it in an afternoon. The difference between major revisions of model families is substantial enough that intuition is a drawback not an asset.


Replies

simonwtoday at 12:17 AM

What does quadratic attention mean?

I've so far found that intuition travels between models of a similar generation remarkably well. The conformance suite trick (find a 9,200 test existing conformance suite and tell an agent to build a fresh implementation that passes all those tests) I first found with GPT-5.2 turned out to work exactly as well against Claude Opus 4.5, for example.

show 1 reply