"Because LLMs now not only help me program, I'm starting to rethink my relationship to those machines. I increasingly find it harder not to create parasocial bonds with some of the tools I use. I find this odd and discomforting [...] I have tried to train myself for two years, to think of these models as mere token tumblers, but that reductive view does not work for me any longer."
It's wild to read this bit. Of course, if it quacks like a human, it's hard not to quack back. As the article says, being less reckless with the vocabulary ("agents", "general intelligence", etc.) could be one way to mitigate this.
I appreciate the frank admission that the author struggled with this for two years. Maybe the balance of time spent with machines vs. fellow primates is out of whack. It feels dystopian to see very smart people insidiously driven to sleepwalk into "parasocial bonds" with large language models!
It reminds me of the movie Her[1], where the guy falls "madly in love with his laptop" (as the lead character's ex-wife expresses in anguish). The film was way ahead of its time.
Secondly, if his creations are going to be relied upon, it will be the programmer's primary task to design his artifacts so understandable, that he can take the responsibility for them, and, regardless of the answer to the question how much of his current activity may ultimately be delegated to machines, we should always remember that neither "understanding" nor "being responsible" can properly be classified as activities: they are more like "states of mind" and are intrinsically incapable of being delegated.
EWD 540 - https://www.cs.utexas.edu/~EWD/transcriptions/EWD05xx/EWD540...
I understand the parasocial bit. I actively dislike the idea of gooning, ERP and AI therapists/companions, but I still notice I'm lonelier and more distant on the days when I'm mostly writing/editing content rather than chatting with my agents to build something. It feels enough like interacting with a human to keep me grounded in a strange way.
tacking on to the "New Kind Of" section:
New Kind of QA: One bottleneck I have (as a founder of a B2B SaaS) is testing changes. We have unit tests, we review PRs, etc., but those don't account for taste. I need to know if the feature feels right to the end user.
One example: we recently changed something about our onboarding flow. I needed to create a fresh team and go through the onboarding flow dozens of times. It involves adding third party integrations (e.g. Postgres, a CRM, etc.) and each one can behave a little differently. The full process can take 5 to 10 minutes.
I want an agent to go through the flow hundreds of times, trying different things (i.e. trying to break it) before I do it myself; a rough sketch of what I mean is below. There are some obvious things I catch on the first pass that an agent should easily identify and figure out solutions to.
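To make that concrete, here is a minimal sketch of the kind of loop I have in mind, using Playwright to drive a staging onboarding flow repeatedly and record whatever breaks. The URL, selectors, and integration names are made-up placeholders, not our actual app.

```python
# Minimal sketch (not my actual setup): drive a staging onboarding flow
# repeatedly with Playwright, vary the inputs, and record anything that breaks.
# The URL, selectors, and integration names below are hypothetical placeholders.
import random
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeout

INTEGRATIONS = ["postgres", "crm", "none"]  # hypothetical variations to exercise

def run_once(page, integration):
    page.goto("https://staging.example.com/signup")
    page.fill("#team-name", "qa-team-%d" % random.randint(0, 99999))
    page.click("text=Continue")
    if integration != "none":
        page.click("[data-integration='%s']" % integration)
    page.wait_for_selector("text=Setup complete", timeout=60_000)

failures = []
with sync_playwright() as p:
    browser = p.chromium.launch()
    for i in range(100):
        page = browser.new_page()
        integration = random.choice(INTEGRATIONS)
        try:
            run_once(page, integration)
        except PlaywrightTimeout as exc:
            # keep a screenshot so a human (or another agent) can triage later
            page.screenshot(path="failure-%d-%s.png" % (i, integration))
            failures.append((i, integration, str(exc)))
        finally:
            page.close()
    browser.close()

print("%d failing runs out of 100" % len(failures))
```

The interesting part would be letting an agent vary the inputs and interpret the failures, not just replay a fixed script.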
New Kind of "Note to Self": Many of the voice memos, Loom videos, or notes I make (and later email to myself) are feature ideas. These could be 10x better with agents. If there were a local app recording my screen while I talk thru a problem or feature, agents could be picking up all sorts of context that would improve the final note.
Example: You're recording your screen and say "this drop down menu should have an option to drop the cache". An agent could be listening in, capture a screenshot of the menu, find the frontend files / functions related to caching, and trace to the backend endpoints. That single sentence would become a full spec for how to implement the feature.
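As a rough illustration of the idea (not a real product), even something this naive could turn a transcribed sentence plus a repo search into a draft spec; the keyword extraction and paths are deliberately crude placeholders.

```python
# Rough sketch of the "note to self" idea: turn one transcribed sentence plus a
# repo search into a draft spec. Keyword extraction is deliberately naive; a
# real agent would use the model for this. Requires ripgrep (rg) on PATH.
import re
import subprocess

def draft_spec(transcript_sentence: str, repo_path: str) -> str:
    stopwords = {"this", "should", "have", "option", "menu"}
    keywords = [w for w in re.findall(r"[a-z]{4,}", transcript_sentence.lower())
                if w not in stopwords]
    hits = []
    for kw in keywords:
        out = subprocess.run(["rg", "--files-with-matches", kw, repo_path],
                             capture_output=True, text=True)
        hits.extend(out.stdout.splitlines())
    spec = ["Feature note: " + transcript_sentence, "", "Possibly related files:"]
    spec += ["- " + path for path in sorted(set(hits))[:20]]
    return "\n".join(spec)

print(draft_spec("this drop down menu should have an option to drop the cache", "."))
```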
In the next year, developers need to realize that normal people do not care about the tech stack or the tools used. There are far too many written thoughts and opinions and not enough polished, deployed projects. From an industry standpoint it’s business as usual: acquihires from products that LLMs apparently couldn’t save.
I respect Armin's opinions on the state-of-the-art in programming a lot. I'm wondering if he finds that "vibe coding" (or vibe engineering) is particularly pleasant and effective in Rust compared to, say, Python.
I spoke to a few people outside of IT and Tech recently. They are senior people running large departments at their companies. To my surprise, they do not think AI agents are going to have any impact on their businesses. The only solid use case they have for AI is a chat interface, which they think can be very useful as an assistant helping with text and reports.
So I guess it's just us who are in the techie pit, thinking that everyone else is also in the pit and uses agents etc.
Armin has some interesting thoughts about the current social climate. There was a point where I even considered sending a cold e-mail and asking him to write more about them. So I’m looking forward to his writing for Dark Thoughts—the separate blog he mentions.
> My biggest unexpected finding: we’re hitting limits of traditional tools for sharing code. The pull request model on GitHub doesn’t carry enough information to review AI generated code properly — I wish I could see the prompts that led to changes. It’s not just GitHub, it’s also git that is lacking.
The limits seem to be not just in the pull request model on GitHub, but also in the conventions around how often and what context gets committed to Git by AI. We already have AGENTS.md (or CLAUDE.md, GEMINI.md, .github/copilot-instructions.md) for repository-level context. More frequent commits and commit-level context could aid in reviewing AI-generated code properly; one possible convention is sketched below.
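For illustration only, here is one way such a convention could look: save the prompt transcript alongside the change and point to it from a commit trailer, so a reviewer can see how the diff came to be. The paths and trailer name are invented for this sketch, not an established standard.

```python
# Sketch of a commit-level context convention (invented for illustration):
# store the prompt transcript under .ai/prompts/ and reference it from a
# trailer line in the commit message.
import subprocess
from pathlib import Path

def commit_with_prompt(summary: str, prompt_text: str, session_id: str) -> None:
    log_path = Path(".ai/prompts") / (session_id + ".md")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text(prompt_text)

    subprocess.run(["git", "add", "-A"], check=True)
    # "Prompt-Log" is a hypothetical trailer name, not a git or GitHub standard
    message = "%s\n\nPrompt-Log: %s" % (summary, log_path)
    subprocess.run(["git", "commit", "-m", message], check=True)

commit_with_prompt(
    "Add cache-clearing option to settings menu",
    "User: add a 'drop the cache' option to the dropdown...\nAssistant: ...",
    "session-2025-12-01-001",
)
```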
A really interesting point that keeps coming up in discussions about LLMs is “what trade-offs need to be re-evaluated”.
> I also believe that observability is up for grabs again. We now have both the need and opportunity to take advantage of it on a whole new level. Most people were not in a position where they could build their own eBPF programs, but LLMs can
One of my big predictions for ‘26 is the industry following through with this line of reasoning. It’s now possible to quickly code up OSS projects of much higher utility and depth.
LLMs are already great at Unix-style tools: a small API and codebase that does something interesting (a tiny example of the kind of tool I mean is sketched below).
I think we’ll see an explosion of small tools (and Skills wrapping their use) for more sophisticated roles like DevOps, and meta-Skills for how to build your own skill bundles for your internal systems and architecture.
And perhaps more ambitiously, I think services like Datadog will need to change their APIs or risk being disrupted; in the short term nobody is going to be able to move fast enough inside a walled garden to keep up with the velocity that Claude plus Unix tools will provide.
UI tooling is nice, but it’s not optimized for agents.
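As a concrete taste of the eBPF point above: a minimal BCC sketch that counts execve() calls per process, the kind of small, single-purpose observability tool an LLM can now produce on demand. This is my illustration, not anything from the article; it assumes a Linux box with the bcc Python bindings installed and root privileges.

```python
# Minimal BCC/eBPF sketch (illustration only): count execve() calls per PID.
# Requires Linux, the bcc Python bindings, and root.
import time
from bcc import BPF

program = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u64, u64);

int trace_exec(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val;
    val = counts.lookup_or_try_init(&pid, &zero);
    if (val) { (*val)++; }
    return 0;
}
"""

b = BPF(text=program)
# attach to the execve syscall entry, resolving the per-kernel symbol name
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_exec")

print("Tracing execve() for 10 seconds...")
time.sleep(10)
for pid, count in b["counts"].items():
    print("pid %d: %d execs" % (pid.value, count.value))
```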
"I have seen some people be quite successful with this."
Wait until those people hit a snafu and have to debug something in prod after they mindlessly handed their brains and critical thinking to a water-wasting behemoth and atrophied their minds.
EDIT: typo, and yes I see the irony :D
Sorry, but why would including the prompt in the pull request make any difference? Explain what you DID in the pull request. If you can't summarize it yourself, it means you didn't review it yourself, so why should I have to do it for you?
Here’s something else that just started to really work this year with Opus 4.5: interacting with Ghidra. Nearly every binary is now suddenly transparent; in many cases it can navigate binaries better than source code itself.
There’s even a research team that has been using this approach to generate compilable C++ from binaries and run static analysis on it, finding more vulnerabilities than source-level analysis, without involving dynamic tracing.
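For anyone curious what the plumbing can look like: a minimal Ghidra post-script (Jython) that dumps decompiled C for every function, which is the kind of output an LLM can then read and navigate. This is one possible setup I sketched, not the research team's pipeline.

```python
# Sketch of a Ghidra post-script (Jython) that prints decompiled C for every
# function. One possible setup, not the pipeline mentioned above.
# Run via the headless analyzer, e.g.:
#   analyzeHeadless /tmp/proj demo -import ./target.bin -postScript dump_decompiled.py
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

decomp = DecompInterface()
decomp.openProgram(currentProgram)  # currentProgram is injected by Ghidra

monitor = ConsoleTaskMonitor()
for func in currentProgram.getFunctionManager().getFunctions(True):
    result = decomp.decompileFunction(func, 60, monitor)  # 60s timeout per function
    if result.decompileCompleted():
        print("// ---- %s ----" % func.getName())
        print(result.getDecompiledFunction().getC())
```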
The very first thing I did when vibe-coding was commit my prompts and AI responses. In Cursor that's extremely easy: just 'export' a chat. I stopped because of security concerns, but perhaps something like that is the way.
> The pull request model on GitHub doesn’t carry enough information to review AI generated code properly — I wish I could see the prompts that led to changes. It’s not just GitHub, it’s also git that is lacking.
Yes! Who is building this?
Got distracted: love the "WebGL metaballs" header and footer on the site.
It is nice that he speaks about some of the downsides as well.
In many respects 2025 was a lost year for programming. People speak about tools, setups and prompts instead of algorithms, applications and architecture.
People who are not convinced are forced to speak against the new bureaucratic madness in the same way that they are forced to speak against EU ChatControl.
I think 2025 was less productive, certainly for open source, except that enthusiasts now pay the Anthropic tax (to use the term that was previously used for Windows being preinstalled on machines).
The part that resonated most for me is the mismatch between agentic coding and our existing social/technical contracts (git, PRs, reviews). We’re generating more code than ever but losing visibility into how it came to be: prompts, failures, local agent reviews. That missing context feels like the real bottleneck now, not model quality.
I really feel this bit:
> With agentic coding, part of what makes the models work today is knowing the mistakes. If you steer it back to an earlier state, you want the tool to remember what went wrong. There is, for lack of a better word, value in failures. As humans we might also benefit from knowing the paths that did not lead us anywhere, but for machines this is critical information. You notice this when you are trying to compress the conversation history. Discarding the paths that led you astray means that the model will try the same mistakes again.
I've been trying to find the best ways to record and publish my coding agent sessions so I can link to them in commit messages, because increasingly the work I do IS those agent sessions.
Claude Code defaults to expiring those records after 30 days! Here's how to turn that off: https://simonwillison.net/2025/Oct/22/claude-code-logs/
I share most of my coding agent sessions through copying and pasting my terminal session like this: https://gistpreview.github.io/?9b48fd3f8b99a204ba2180af785c8... - via this tool: https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...
Recently been building new timeline sharing tools that render the session logs directly - here's my Codex CLI one (showing the transcript from when I built it): https://tools.simonwillison.net/codex-timeline?url=https%3A%...
And my similar tool for Claude Code: https://tools.simonwillison.net/claude-code-timeline?url=htt...
What I really want is first-class support for this from the coding agent tools themselves. Give me a "share a link to this session" button!
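In the meantime, here is a quick-and-dirty sketch of what such tooling can do locally: read a Claude Code JSONL transcript and print a plain-text timeline. The log location and field names are assumptions from poking at my own install and may differ between versions.

```python
# Sketch: print a plain-text timeline from a local Claude Code JSONL transcript.
# The log path and field names are assumptions and may vary between versions.
import json
import sys
from pathlib import Path

def print_timeline(jsonl_path: Path) -> None:
    for line in jsonl_path.read_text().splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        message = entry.get("message") or {}
        role = message.get("role") or entry.get("type") or "?"
        content = message.get("content", "")
        if isinstance(content, list):  # content blocks: keep only the text parts
            content = " ".join(c.get("text", "") for c in content if isinstance(c, dict))
        if content:
            print("[%s] %s" % (role, str(content)[:200]))

if __name__ == "__main__":
    # e.g. python timeline.py ~/.claude/projects/<project>/<session-id>.jsonl
    print_timeline(Path(sys.argv[1]).expanduser())
```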