Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
Why not rank ESP for each HN user, with evidence?
Many people are impressed by this, and I can see why. Still, this much isn't surprising: the Karpathy + LLM combo can deliver quickly. But there are downsides to blazing speed.
If you dig in, there are substantial flaws in the project's analysis and framing: how a prediction is defined, how comments are assessed, overall data quality, and more. Go spelunking through the comments here and you'll see people asking about methodology and checking the results.
Social science research isn't easy; it requires training, effort, and patience. I would be very happy if Karpathy added a Big Flashing Red Sign to this effect. It would raise awareness and focus community attention on what I think are the hardest and most important aspects of this kind of project: methodology, rigor, criticism, feedback, and correction.
On the site itself:
It's great that this was produced in an hour for $60. That's amazing for creating small utilities, exploring your curiosity, etc.
But the site is also quite confusing and messy. OK for a vibe-coded experiment, sure, but it wouldn't be for a final product. And I fear we're gonna see more and more of this: big companies downsizing their tech departments and embracing vibe coding. By analogy with inflation, shrinkflation, skimpflation, and enshittification, will we soon adopt a word for this? AIflation? LLMflation?
And how will this comment score in a couple of years? :)
> Everything we do today might be scrutinized in great detail in the future because it will be "free".
s/"free"/stolen/
The bit about college courses for future prediction was just silly, I'm afraid: it reminds me of how Conan Doyle has Sherlock not knowing the Earth revolves around the Sun. Almost all serious study concerns itself with predicting, modelling, and influencing the future behaviour of some system; the problem is only that people don't fucking listen to the predictions of experts. They aren't going to value refined, academic general-purpose futurology any more than they have in the past; it's not even a new area of study.
I'm delighted to see that one of the users who makes the same negative comments on every Google-related post gets a "D" for saying Waymo was smoke and mirrors. Never change, I guess.
This is a perfect example of both the power and the problems of LLMs.
I took the narcissistic approach of searching for myself. Here's a grade of one of my comments[1]:
>slg: B- (accurate characterization of PH’s “networking & facade” feel, but implicitly underestimates how long that model can persist)
And here's the actual comment I made[2]:
>And maybe it is the cynical contrarian in me, but I think the "real world" aspect of Product Hunt is what turned me off of the site before these issues even came to the forefront. It always seemed like an echo chamber where everyone was putting up a facade. Users seemed more concerned with the people behind products and networking with them than actually offering opinions of what was posted.
>I find the more internet-like communities more natural. Sure, the top comment on a Show HN is often a critique. However I find that more interesting than the usual "Wow, another great product from John Developer. Signing up now." or the "Wow, great product. Here is why you should use the competing product that I work on." that you usually see on Product Hunt.
I did not say or imply anything about "how long that model can persist"; I just said I personally don't like using the site. It's a total hallucination to claim I was implying doom for "that model", and you would only know that if you actually took the time to dig into the details of what was said. But the summary seems plausible enough that most people never would.
The LLM processed and analyzed a huge amount of data in a way that no human could, but the single in-depth look I took at that analysis was somewhere between misleading and flat out wrong. As I said, a perfect example of what LLMs do.
And yes, I do recognize the funny coincidence that I'm now doing the exact thing I described as the typical HN comment a decade ago. I guess there is a reason old me said "I find that more interesting".
[1] - https://karpathy.ai/hncapsule/2015-12-18/index.html#article-...
Now: compared to what? Is there a better source than HN? How does it compare to Reddit or Lobsters?
Compared to what happens next? Does tptacek's commentary become a market signal equivalent to the Fed Chair or the BLS labor and inflation reports?
Cool - now make it analyze all of those and come up with the 10 commandments of commenting factually and insightfully on HN posts...
One of the few use cases for LLMs that I have high hopes for, and feel is still underappreciated, is grading qualitative things. LLMs are the first tech (afaik) that can do top-down analysis of phenomena in a manner similar to humans, which means a lot of important judgement-oriented human use cases can become more standardized, faster, and more readily available.
For instance, one of the aspects of social media that makes it so unsustainable and destructive to modern society is how it exposes us to so many more people and hot takes than we have the ability to adequately judge. We're overwhelmed. This has led to conversation being dominated by really shitty takes and really shitty people, who rarely if ever suffer reputational consequences.
If we build our mediums of discourse with more reputational awareness using approaches like this, we can better explore the frontier of sustainable positive-sum conversation at scale.
Implementation-wise, the key question is: how do we grade the grader and ensure it is predictable and accurate?
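A minimal sketch of what I mean, in Python; everything here, including the `llm_grade` stub and the letter-grade scale, is an assumption for illustration, not anything from the actual project. The idea is to treat the judge as a noisy instrument and measure both its self-consistency across repeated runs and its agreement with a small human-graded spot-check set:

    # Sketch: evaluating an LLM judge for consistency and accuracy.
    import random
    from collections import Counter

    GRADES = ["A", "B", "C", "D", "F"]

    def llm_grade(comment: str) -> str:
        """Hypothetical judge; a real version would call a model API.
        Random choice here just simulates an unreliable grader."""
        return random.choice(GRADES)

    def self_consistency(comment: str, runs: int = 5) -> float:
        """Fraction of repeated gradings agreeing with the modal grade."""
        grades = [llm_grade(comment) for _ in range(runs)]
        return Counter(grades).most_common(1)[0][1] / runs

    def human_agreement(labeled: list[tuple[str, str]]) -> float:
        """Exact-match rate against a human-graded spot-check set."""
        return sum(llm_grade(c) == g for c, g in labeled) / len(labeled)

    spot_check = [
        ("Waymo is smoke and mirrors.", "D"),
        ("Solar will dominate new generation capacity.", "B"),
    ]
    print("self-consistency:", self_consistency(spot_check[0][0]))
    print("human agreement:", human_agreement(spot_check))

If the self-consistency number is low, the individual grades are mostly noise no matter how plausible any one of them reads.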
Do we need more AI slop on the front page?
I am not sure if we need a karma precog analogue.
It does seem better than just upvotes and downvotes though.
> But if intelligence really does become too cheap to meter, it will become possible to do a perfect reconstruction and synthesis of everything. LLMs are watching (or humans using them might be). Best to be good.
I cannot believe this is just put out there without even a moment of "maybe we shouldn't help this happen". This is complete moral abdication. And to be clear, being "good" is no defense. Being good often means being unaligned with the powerful, so being good is often the very thing that puts you in danger.
Dude, please do this for every year up until today. This idea is actually amazing. If you need more money for API credits, I'm sure people here could help donate.
Random Bets for 2035:
* Nvidia GPUs will see heavy competition, with most chat-like use cases switching to cheaper models and inference-specific silicon, but they will still be used at the high end for critical applications and frontier science
* Most software and UIs will be primarily AI-generated. There will be no 'App Stores' as we know them.
* ICE cars will become niche, having been largely replaced by EVs; solar will be widely deployed and will be the dominant source of power
* Climate change will be widely recognized due to escalating consequences, and there will be a lot of effort put into mitigation (e.g., climate engineering, climate-resistant crops, etc.)
Interesting experiment. Using modern LLMs to retroactively grade decade-old HN discussions is a clever way to measure how well our collective predictions age. It’s impressive how little time and compute it now takes to analyze something that would’ve required days of manual reading. My only caution is that hindsight grading can overvalue outcomes instead of reasoning — good reasoning can still lead to wrong predictions. But as a tool for calibrating forecasting and identifying real signal in discussions, this is a very cool direction.
Neat, I got a shout-out. Always happy to share the random stuff I remember exists!