Re: yolo mode
I looked into docker and then realized the problem I'm actually trying to solve was solved in like 1970 with users and permissions.
I just made an agent user limited to its own home folder and added my user to its group. Then I run Claude Code etc. as the agent user.
So it can only read/write /home/agent, and it cannot read or write my files.
I added myself to the agent group so I can read/write the agent's files.
I run into permission issues sometimes, but it's pretty smooth for the most part.
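For anyone wanting to try the same thing, the setup is roughly the following (a sketch, assuming a typical Linux box; user and group names are just what I picked, and `claude` here stands in for whatever agent CLI you run):

```shell
# Create a locked-down "agent" user with its own home directory.
sudo useradd --create-home --shell /bin/bash agent

# Add my own account to the agent's group so I can read/write its files.
sudo usermod -aG agent "$USER"

# Group can read/write /home/agent; everyone else is locked out.
sudo chmod 770 /home/agent

# Run the coding agent as that user (login shell so $HOME is /home/agent).
sudo -u agent -i claude
```

You may need to log out and back in for the new group membership to take effect.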
Oh also I gave it root on a $3 VPS. It's so nice having a sysadmin! :) That part definitely feels a bit deviant though!
This is a good tooling survey of the past year. I have been watching it as a developer re-entering the job market. The job descriptions closely parallel the timeline used in the post. That's bizarre to me because these approaches are changing so fast. I see jobs for "Skill and Langchain experts with production-grade 0>1 experience. Former founders preferred". That is an expertise that is just a few months old and startups are trying to build whole teams overnight with it. I'm sure January and February will have job postings for whatever gets released that week. It's all so many sand castles.
Remember, back in the day, when a year of progress was like, oh, they voted to add some syntactic sugar to Java...
Indeed. I don't understand why Hacker News is so dismissive about the coming of LLMs; maybe HN readers are going through the 5 stages of grief?
But LLMs are certainly a game changer; I can see them delivering an impact bigger than the internet itself. Both required a lot of investment.
OpenSCAD-coding has improved significantly on all models. Now syntax is always right and they understand the concept of negative space.
Only problem is that they don't see the connection between form and function. They may make a teapot perfectly but don't understand that the form is supposed to contain liquid.
I'm not against AI/LLMs (in fact, I'm quite supportive of them). But one of my biggest fears is overusing AI. We may introduce tools that only an AI/LLM can reasonably use (tools with weird, convoluted UI/UX or syntax), and nobody will object because the AI/LLM can handle the interaction.
Then there's genAI. It's becoming more and more difficult to tell which is AI and which is not, and AI is everywhere. I don't know what to think about it. "If you can't tell, does it matter?"
Let's hope 2026 will also have interesting innovations not related to AI or LLMs.
These are excellent every year, thank you for all the wonderful work you do.
> The (only?) year of MCP
I'd like to believe that, but MCP is quickly turning into an enterprise thing, so I think it will stick around for good.
Thank you for your warning about the normalization of deviance. Do you think there will be an AI agent software worm like NotPetya which will cause a lot of economic damage?
Thanks Simon, great writeup.
It has been an amazing year, especially around tooling (search, code analysis, etc.) and surprisingly capable smaller models.
I can’t get over the range of sentiment on LLMs. HN leans snake oil, X leans “we’re all cooked”. Can it possibly be both? How do other folks make sense of this? I’m not asking for a side, rather understanding the range. Does the range lead you to believe X over Y?
What happened to Devin? 2024 it was a leading contender now it isn't even included in the big list of coding agents.
The "pelicans on a bike" challenge is pretty wide spread now. Are we sure it's still not being trained on?
Speaking of asynchronous agents, what do people use? Claude Code for web is extremely limited, because you have no custom tools. Claude Code in GitHub Actions is vastly more useful, due to the custom environment, but awkward to use interactively. Are there any good alternatives?
What about self hosting?
I completely disagree with the idea that 2025 "The (only?) year of MCP." In fact, I believe every year in the foreseeable future will belong to MCP. It is here to stay. MCP was the best (rational, scalable, predictable) thing since LLM madness broke loose.
> The problem is that the big cloud models got better too—including those open weight models that, while freely available, were far too large (100B+) to run on my laptop.
The actual, notable progress will be models that can run reasonably well on commodity, everyday hardware that the average user has. From more accessibility will come greater usefulness. Right now the way I see it, having to upgrade specs on a machine to run local models keeps it in a niche hobbyist bubble.
What an amazing progress in just short time. The future is bright! Happy New Year y'all!
Great summary of the year in LLMs. Is there a predictions (for 2026) blogpost as well?
I'm curious how all of the progress will be seen if it does indeed result in mass unemployment (but not eradication) of professional software engineers.
Speaking of new year and AI: my phone just suggested "Happy Birthday!" as the quick-reply to any "Happy New Year!" notification I got in the last hours.
I'm not too worried about my job just yet.
> The reason I think MCP may be a one-year wonder is the stratospheric growth of coding agents. It appears that the best possible tool for any situation is Bash—if your agent can run arbitrary shell commands, it can do anything that can be done by typing commands into a terminal.
I push back strongly against this. For the solo, one-machine coder this is likely the case, but if you're exposing workflows or fixed tools to customers / colleagues / the web at large via an API or similar, then MCP is still the best way to expose it IMO.
Think about a GitHub or Jira MCP server: with the command line alone, they are sure to make mistakes with REST requests, API schemas, etc. With MCP the proper, known commands are already baked in. Remember always that LLMs will be better with natural language than code.
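To make that concrete, here's a toy sketch in plain Python (not any real MCP SDK, and `create_issue` with its fields is a made-up example, not the actual GitHub server's tool): the tool declares its parameter schema up front, so a malformed call can be rejected before it ever reaches a REST endpoint, instead of the model hand-crafting a request and getting the schema wrong.

```python
# A hypothetical MCP-style tool declaration: the schema is fixed,
# so the model fills in parameters instead of hand-writing REST calls.
CREATE_ISSUE_TOOL = {
    "name": "create_issue",
    "description": "Create an issue in a repository",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string"},
            "title": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["repo", "title"],
    },
}

def validate_call(tool: dict, args: dict) -> list[str]:
    """Return a list of problems with a proposed tool call (empty = OK)."""
    schema = tool["inputSchema"]
    errors = []
    for field in schema["required"]:
        if field not in args:
            errors.append(f"missing required field: {field}")
    for field, value in args.items():
        spec = schema["properties"].get(field)
        if spec is None:
            errors.append(f"unknown field: {field}")
        elif spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{field} must be a string")
    return errors

# A well-formed call passes; a malformed one is caught before any API hit.
print(validate_call(CREATE_ISSUE_TOOL, {"repo": "octo/hello", "title": "Bug"}))  # []
print(validate_call(CREATE_ISSUE_TOOL, {"title": 42}))
```

With Bash alone there is no equivalent checkpoint: whatever string the model emits goes straight to the shell.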
2026: The Year of Robots, note it for next year
Thank you. Enjoyed this read.
AI slop videos will no doubt get longer and "more realistic" in 2026.
I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI" and give us the option to automatically mute these videos from our timeline. That would be the responsible thing to do. But I can't see Alphabet doing that on YT, xAI doing that on X or Meta doing that on FB/Insta as they all have skin in the video gen game.
> Vendor-independent options include GitHub Copilot CLI, Amp, OpenHands CLI, and Pi
...and the best of them all, OpenCode[1] :)
[1]: https://opencode.ai
Let's talk about the societal cost these models have had on us, including their high energy cost and the proliferation of auto-generated slop media used to milk ad revenue, scam people, SEO farm, do propaganda, or automate trolling. What about these big corporations taking on an astronomical amount of debt to hoard DRAM and NAND in a way that has crippled the PC market within weeks? And what are they going to do next, put a few dollars in Trump's pocket so that they can rob/loot the US population through bailouts? Who gets to keep all the hardware, I wonder?
Nvidia, Samsung, SK Hynix and some other vultures I forgot to mention are making serious bank right now.
Not in this review: it was also a record year for intelligent systems aiding and prompting human users into fatal self-harm.
Will 2026 fare better?
I hope 2026 will be the year when software engineers and recruiters will stop the obsession with leetcode and all other forms of competitive programming bullshit
> The year of YOLO and the Normalization of Deviance #
On this topic, including AI agents deleting home folders: I was able to run agents in Firejail by isolating vscode (most of my agents are vscode-based ones, like Kilo Code).
I wrote a little guide on how I did it https://softwareengineeringstandard.com/2025/12/15/ai-agents...
It took a bit of tweaking (vscode crashed a bunch of times because it couldn't read its config files), but I got there in the end. Now it can only write to my projects folder. All of my projects are backed up in git.
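The core of it looks roughly like this (a sketch under my assumptions, not the linked guide's exact setup; the whitelisted paths are the ones vscode usually needs, and yours may differ):

```shell
# Whitelist only the projects folder plus vscode's own config dirs;
# everything else under $HOME is invisible to the sandboxed process.
# Without the config dirs whitelisted, vscode crashes on startup.
firejail \
  --whitelist=~/projects \
  --whitelist=~/.config/Code \
  --whitelist=~/.vscode \
  code
```

Anything the agent does inside that vscode process is then confined to those paths.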
forgot to mention the first murder-suicide instigated by chatgpt
My experience with AI so far: It's still far from "butler" level assistance for anything beyond simple tasks.
I posted about my failures trying to get them to review my bank statements [0], and generally got gaslit about how I was doing it wrong, and that if I trusted them with full access to my disk and terminal, they could do it better.
But I mean, at that point it's still more "manual intelligence" than just telling someone what I want. A human could easily understand it, but AI still takes a lot of wrangling, and you still need to think from the "AI's PoV" to get good results.
[0] https://news.ycombinator.com/item?id=46374935
----
But enough whining. I want AI to get better so I can be lazier. After trying them for a while, one feature I think all natural-language AIs need is the ability to mark certain sentences as "do what I say" (aka Monkey's Paw) versus "do what I mean", like how you wrap phrases in quotes on Google etc. to indicate a verbatim search.
So for example I could say "[[I was in Japan from the 5th to 10th]], identify foreign currency transactions on my statement with "POS" etc. in the description". The part in the [[]] (or whatever other marker) would be literal, exactly as written, but the rest of the text would be up to the AI's interpretation/inference, so it would also search for ATM withdrawals etc.
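As a toy illustration, a hypothetical front-end could split the prompt into verbatim and interpretable spans before handing it to the model (the [[...]] syntax is made up, as above):

```python
import re

def split_prompt(prompt: str) -> list[tuple[str, str]]:
    """Split a prompt into ('literal', ...) and ('interpret', ...) spans.

    Text inside [[...]] is taken verbatim ("do what I say"); the rest
    is left to the model's interpretation ("do what I mean").
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\[\[(.*?)\]\]", prompt):
        if m.start() > pos:
            parts.append(("interpret", prompt[pos:m.start()]))
        parts.append(("literal", m.group(1)))
        pos = m.end()
    if pos < len(prompt):
        parts.append(("interpret", prompt[pos:]))
    return parts

spans = split_prompt("[[I was in Japan from the 5th to 10th]], find foreign transactions")
print(spans)
```

The "literal" spans could then be injected into the model's context with an instruction to treat them as exact constraints rather than paraphrasable hints.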
Ideally, eventually we should be able to have multiple different AI "personas" akin to different members of household staff: your "chef" would know about your dietary preferences, your "maid" would operate your Roomba, take care of your laundry, your "accountant" would do accounty stuff.. and each of them would only learn about that specific domain of your life: the chef would pick up the times when you get hungry, but it won't know about your finances, and so on. The current "Projects" paradigm is not quite that yet.
>I’m still holding hope that slop won’t end up as bad a problem as many people fear.
That's the pure, uncut copium. Meanwhile, in the real world, search on major platforms is so slanted towards slop that people need to specify that they want actual human music:
https://old.reddit.com/r/MusicRecommendations/comments/1pq4f...
You’re absolutely right! You astutely observed that 2025 was a year with many LLMs and this was a selection of waypoints, summarized in a helpful timeline.
That’s what most non-tech-person’s year in LLMs looked like.
Hopefully 2026 will be the year where companies realize that implementing intrusive chatbots can’t make better ::waving hands:: ya know… UX or whatever.
For some reason, they think it's helpful to distractingly pop up chat windows on their site, because their customers need textual kindergarten handholding to … I don’t know… find the ideal pocket comb for their unique pocket/hair situation, or had an unlikely question about that aerosol pan release spray that a chatbot could actually answer. Well, my dog also thinks she’s helping me by attacking the vacuum when I’m trying to clean. Both ideas are equally valid.
And spending a bazillion dollars implementing it doesn’t mean your customers won’t hate it. And forcing your customers into pathways they hate because of your sunk costs mindset means it will never stop costing you more money than it makes.
I just hope companies start being honest with themselves about whether or not these things are good, bad, or absolutely abysmal for the customer experience and cut their losses when it makes sense.
Nothing about the severe impact on the environment, and the hand waviness about water usage hurt to read. The referenced post was missing every single point about the issue by making it global instead of local. And as if data center buildouts are properly planned and dimensioned for existing infrastructure…
Add to this that all the hardware is already old and the amount of waste we’re producing right now is mind boggling, and for what, fun tools for the use of one?
I don’t live in the US, but the amount of tax money being siphoned to a few tech bros should have heads rolling and I really don’t want to see it happening in Europe.
But I guess we got a new version number on a few models and some blown up benchmarks so that’s good, oh and of course the svg images we will never use for anything.
The difference in model performance between 2024 and 2025 has been so stark, and that graph really shows it. There are still many people on these forums who seem to think AIs produce terrible code unless ultra-supervised, and I can’t help but suspect some of them tried it a while ago and just don’t realize how different it is now compared to even quite recently.
All these improvements in a single year, 2025. While this may seem obvious to those who follow along with AI/LLM news, it may be worth pointing out again that ChatGPT was only introduced to us in November 2022.
I still don't believe AGI, ASI or whatever AI will surpass humans in a short period of time, say 10-20 years. But it is hard to argue against the value of current AI, which many of the vocal critics on HN seem to do. People are willing to pay $200 per month, and it already has a $1B runway.
Being more of a hardware person, the most interesting part to me is the funding of all the development of the latest hardware. I know this is another topic HN hates because of the DRAM and NAND pricing issue, but it is exciting to see it from a long-term view where the pricing is short-term pain. Right now the industry is asking: we have, together, over a trillion dollars to spend on capex over the next few years, and will borrow more if need be, so when can you ship us 16A / 14A / 10A and 8A or 5A, LPDDR6, higher-capacity DRAM at lower power usage, better packaging, higher-speed PCIe, or a jump to optical interconnect? Every single part of the hardware stack is being infused with money and demand. The last time we had this was the post-PC / smartphone era, which drove the hardware industry forward for 10-15 years. The current AI wave can push hardware for at least another 5-6 years while pulling forward tech that was initially 8-10 years away.
I so wish I had bought some Nvidia stock. Then again, I guess no one knew AI would be as big as it is today, and it has only just started.