Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.
Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.
Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.
I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?
I bought an Alesis QS8.1 super cheap in perfect condition (was a top grade digital piano/synth in the 90s).
Then I realized that ALL of the software related to it (which I collected from defunct websites and archived on GitHub) was ancient. After getting tired of using WINE every single time, I decided I wanted a modern cross-platform equivalent that did everything several of these different programs did (plus break out some stuff that is now possible with modern computers).
I thought it would be extremely hard because computer-to-synth communication is pretty much only via SysEx commands (and the actual wave file encoding protocol was undocumented).
Claude walked me through examining some of the original software in Ghidra, and I had a working demo that night. Now I'm just playing with adding new features to it.
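For readers who have not dealt with SysEx: a System Exclusive message is a run of bytes framed by 0xF0 and 0xF7, and every byte in between must have its high bit clear, which is why 8-bit sample data has to be repacked before it goes over the wire. Below is a minimal sketch of the framing plus one common 7-bit packing scheme. It is not the QS8.1's actual encoding (that was the undocumented part), the function names are made up for illustration, and 0x7D is the MIDI ID reserved for non-commercial use, standing in for a real manufacturer ID.

```python
SYSEX_START = 0xF0
SYSEX_END = 0xF7

def pack_7bit(data: bytes) -> bytes:
    """Pack 8-bit bytes into 7-bit bytes.

    Each group of up to 7 input bytes becomes one byte holding their
    high bits, followed by the low 7 bits of each input byte.
    (Illustrative scheme only, not the Alesis format.)
    """
    out = bytearray()
    for i in range(0, len(data), 7):
        chunk = data[i:i + 7]
        msbs = 0
        for j, b in enumerate(chunk):
            msbs |= ((b >> 7) & 1) << j
        out.append(msbs)
        out.extend(b & 0x7F for b in chunk)
    return bytes(out)

def unpack_7bit(packed: bytes) -> bytes:
    """Inverse of pack_7bit."""
    out = bytearray()
    for i in range(0, len(packed), 8):
        msbs = packed[i]
        chunk = packed[i + 1:i + 8]
        for j, b in enumerate(chunk):
            out.append(b | (((msbs >> j) & 1) << 7))
    return bytes(out)

def frame_sysex(manufacturer_id: bytes, payload: bytes) -> bytes:
    """Wrap a packed payload in SysEx start/end bytes."""
    body = manufacturer_id + pack_7bit(payload)
    if any(b & 0x80 for b in body):
        raise ValueError("SysEx body bytes must be 7-bit")
    return bytes([SYSEX_START]) + body + bytes([SYSEX_END])

# 0x7D is a placeholder ID, not the synth's real one.
message = frame_sysex(b"\x7d", b"\xff\x00")
print(message.hex(" "))
```

Reverse engineering a real device mostly means working out which packing scheme and header bytes the original software used.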
Opus 3.x building me a productivity system with Obsidian MCP originally.
Next was discovering "create a mathematical model of the problem and derive the solution as a result" type prompts.
But, the real "oh s**" was a longer process of spec'ing a compiler/runtime for real-time DSP (with a lot of novel ideas) and it actually working.
My sequence was: (1) it helps me understand myself, (2) it helps me put together good ideas, (3) it can generate novel ideas given the right inputs, (4) it can build useful tools on my machine, (5) it can compound good ideas into better and better ideas with repeated passes, (6) it can build significant, ambitious machinery that's way beyond my ordinary capacity.
Current frontier: it can compound large codebases into better and better machinery with repeated passes.
The key thing I track is whether I'm running a process that converges and compounds or whether I'm spinning in place / diverging.
For me it was torrenting a 7G ball of weights leaked from Meta and running alpaca.cpp (an early variant of llama.cpp) on my desktop computer in early 2023. I started asking it questions about the Roman empire and it answered me in English! The responses were generally incorrect, but no worse than what your average American college student might guess at, though delivered with much more confidence.
This was my desktop computer responding to questions in English, not some fancy server in a massive Google data center. Who cares if what it says isn't reliable? Being able to converse with my CPU in English is like having a conversation with a dog!
ChatGPT reconstructing idiomatic Python source code from Python bytecode was definitely up there. That is not something humans have written a great deal about online. It requires simulating the Python VM.
I remember also having a massive wtf reaction to realizing that original ChatGPT was pretty good at decoding long random/unique base64 strings.
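To give a sense of what the bytecode task involves, Python's standard `dis` module shows the stack-machine instructions a decompiler has to work backwards from. A small sketch; the `clamp` function is only an illustration, and exact opcode names vary between Python versions:

```python
import dis

def clamp(x, lo, hi):
    return max(lo, min(x, hi))

# Each instruction pushes values onto, or pops them from, the VM's stack.
# Rebuilding the source means replaying that stack in your head.
instructions = list(dis.get_instructions(clamp))
for ins in instructions:
    print(f"{ins.offset:>4} {ins.opname:<22} {ins.argrepr}")
```

Going from that listing back to `max(lo, min(x, hi))` requires tracking which values are on the stack at each call, which is the "simulating the Python VM" part.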
My furnace went out during the 2025 holiday and I couldn't get an appointment with a repair person for 2 days. It was getting very cold in my house so I went into my attic and made several videos of the furnace attempting to start and gave it to gemini. It diagnosed the issue immediately and had me spin one of the components (a small exhaust fan) while the furnace tried to fire. It came on immediately. I had to do that several times, but it worked until the HVAC service showed up.
Actually seems absurdly simple now, but sometime last year I was trying to figure out what I'd need to tow my daughter's car cross country with my truck: what are the trailer/dolly options, what do they cost, can my truck actually tow the combined weight, etc.
I started out prompting ChatGPT kinda how I would with Google, one small prompt at a time, asking about various details. But after one or two of those I just tried "I want to tow a car of make A with my truck model B, from point C to point D, what are my options?" And it wrote me a report with comparison tables and computed towing weights and other details for different options.
At that point, I was like "Oh. This is different. And it's just the beginning."
I could go on and on, but Claude recently decompiled the firmware of my camper van, documented all the CAN interfaces, then programmed an ESP32 module to talk to the van’s integrated systems (power, HVAC, lighting, tanks). That sort of embedded systems integration is completely out of my wheelhouse.
I honestly don’t understand AI naysayers. I use Claude every day both professionally as a Solution Architect and personally in a variety of projects I simply could not have ever approached alone.
Working on Unity games with Codex 5.5, it has no problem rummaging through and hand-editing any kind of game asset file. So many things that would be so tedious to fix by hand are so easy now. It's really made programming and game dev fun again.
Literally just last night I gave Claude Code the following prompt, verbatim:
"Whenever I launch Kodi on my Chromecast 4k, it crashes. I think this is related to a plugin or skin. It goes away for a bit if I clear cache but will eventually come back. Can you connect to the device via adb (I've run adb connect already), and debug exactly where it's crashing? Once you've done that, propose a solution. If this requires downloading, fixing, rebuilding and then uploading the broken extension via adb, don't be shy. I should have Android dev tools (Gradle etc.) on this Mac."
Lo and behold, without human intervention, it pinpointed the crash, downloaded the Kodi source, patched out a bug that had existed since 2016, recompiled it, signed it, then pushed it to my Chromecast all while carefully making sure to keep all my settings intact.
Got it to make a PR too (which is as of this moment unpublished; going to test more over the coming weeks).
I tried to see if an LLM service provider could rewrite some legal docs into a consistent format, with nothing hallucinated, so I could see what might be missing from each document. It could do that.
Next, I wanted to see if this could be done with a local LLM. Gemma-4 handles this fine with an 8GB video card and a large context (128k).
Next, I wanted to see if the model could also OCR these docs and translate them. The same model can handle that quite well.
This was when I realized LLMs should be great for handling work where:
- I already know what I want to do
- I already know how to do it
- I don't think this task will help develop skills I find to be valuable
- If I have to do it manually myself, I will probably cut corners
So now I view LLMs through the lens of, "what work can I send to an LLM that I otherwise would not really care about doing."
I have a large token budget as part of my work. A coworker was scanning some repos for vulnerabilities as a test. He found a scary looking remote exploit in a popular project and shared it with me for a second opinion. I spun up a local instance of the project and ran the POC against it: nothing. Turns out it needed some configuration knobs tweaked to lower some security protections.
So I told the AI what happened, and asked it to fix the POC so that it would work with the default configuration. It chewed away at that for a few minutes until it cheerfully patched the POC into a weaponized version. I ran it. The local instance, which I had just downloaded, compiled myself, and launched with the default config file, immediately crashed.
I got the cold sweats. I've read this novel. I've seen this movie. Wow. I have a blinking cursor on the console of a nuclear information bomb. I tossed and turned all night, got about half an hour of actual sleep, and probably looked like I'd seen a ghost at work the next day.
On the plus side, it gave our team some very clear ethical and moral guidance: we're going to do this, and we're going to share our findings with the relevant authors, because we can. Because I want to live in a world where the good guys are trying to fix problems before the bad guys can find them, I decided to help build that world. It was like, well, I guess this is what I'm doing now.
I guess I've had several of those moments over the last year and a half. But a recent one was that I was working with Claude to create a spiking neural net MNIST classifier in an FPGA for a demo. Claude took it from concept to PyTorch, to training (training a Spiking neural net isn't necessarily straightforward - that's a whole post in itself, but Claude came up with a working solution), and then to implementation in Verilog and through synthesis into the FPGA. I asked Claude to create a drawing app to run on the PC side that would allow the user to draw a digit with a mouse and then click a classify button. The data from the digit drawing app was to be transferred via USB to SPI to the FPGA. I didn't have a SPI adapter yet (it was on order from Adafruit) so I asked claude to let me communicate with the simulated verilog code running in the Verilator simulator, through a virtual SPI interface. Then I went to lunch. I came back to see the digit drawing app displayed on the monitor. I drew a '2' and it classified it as a 2. In another window I could see the Verilator simulator running and the data being passed. Chills.
I don't remember one specific moment, but I was fairly impressed with ChatGPT from the first time I started interacting with it. Was I ready to call it "AGI"? No, absolutely not. But it was clear that it was something new, and it was also intuitively obvious to me that "this AI is as bad today as it will ever be" and that predicting the rate of change would be difficult.
The more I use these things, the more I'm 100% convinced that it makes sense to say they are "intelligent" (for some meaning of "intelligent"). AGI or "human level intelligence"? Still no[1]. But some kind of intelligence. And I'm quite happy to allow that there can be "intelligence" that doesn't work anything at all like human intelligence, so arguments of the form "this isn't real intelligence", etc, etc. carry very (very) little weight with me. I've actually been sitting on a half written blog post on this very topic for a while, titled "The Marquee Sign Says 'Artificial' Intelligence"[2]. Finding time to finish it has been the challenge.
And before somebody says "Use AI to write it for you". Nah. I am generally what you might call "pro AI" and / or an "AI enthusiast" but I still draw lines. I'll use AI for research, for outlining, for brainstorming, etc. sure. But I have a hard-line stance against letting AI fundamentally write for me. I want anything that goes out with my name associated with it to have my genuine voice.
[1]: I like the term "jagged intelligence" that Demis Hassabis has been using. That is to say, the bounds of the intelligence are jagged or spiky: very intelligent in certain areas, much less so in others.
[2]: for any old-skool pro-wrestling fans, yes, that is an intentional nod to "Double A" Arn Anderson and his "The marquee sign says 'wrestling'" catchphrase. :-)
At my previous work, I was collating somewhat random unconfirmed animal sightings. I also had a separate database of animal occurrence probabilities (species distribution maps). I'm not a statistician but that sounded like a clear job for Bayes theorem: given a sighting and the overall probability of that sighting in that area (species distribution map), and some other assumptions about the noise of the sighting, what is the probability that the sighting actually included that species?
Claude asked me three questions and then wrote a beautiful Python implementation that queries the map and spits out a table of adjusted probabilities. Felt immensely powerful - I can do this 'on my own' now, I don't need to wait to find the right people or learn the right thing first.
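For anyone curious what that calculation looks like, here is a minimal sketch of one way to set it up, not the actual implementation. It assumes the distribution map supplies the prior, and that the two observer error rates (`hit_rate`, `false_alarm_rate`) are inputs you have to estimate yourself; all names are illustrative.

```python
def posterior_species_present(prior: float, hit_rate: float,
                              false_alarm_rate: float) -> float:
    """P(species present | sighting reported) via Bayes' theorem.

    prior            -- P(present), e.g. read from the species distribution map
    hit_rate         -- P(reported | present), an assumed observer accuracy
    false_alarm_rate -- P(reported | absent), assumed misidentification noise
    """
    for name, p in (("prior", prior), ("hit_rate", hit_rate),
                    ("false_alarm_rate", false_alarm_rate)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    # Total probability of getting a report at all.
    evidence = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("a report is impossible under these inputs")
    return hit_rate * prior / evidence

# Made-up numbers: a species the map says is rare in this area (5%),
# reported by an observer who is right 90% of the time and
# misidentifies 10% of the time.
print(round(posterior_species_present(0.05, 0.9, 0.1), 3))
```

With those made-up numbers the sighting only raises the probability to roughly one in three, because the low prior from the map still dominates.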
I had a C++ actor model which required an API like the following (std::function):
child->Async(&ChildActor::Method, child, args);
Refactored it to use small buffer optimisation and std::move_only_function:
child<&ChildActor::Method>(args);
And saw a performance jump since no more malloc in std::function.
It also helped me decipher an animation bug in a glTF importer.
Productivity is x4 or higher.
ChatGPT Code Interpreter back in ~March 2023. I uploaded a CSV file (of police incidents in San Francisco) and watched it load that into Pandas, show me some charts, then export the data to a SQLite database file for me to download.
I write software for data journalists and this new thing appeared to be able to do everything I wanted my software to do just as an unplanned side effect of having the ability to run Python against a folder with some uploaded files in it.
With hindsight it was my first exposure to a coding agent, but we hadn't named the category at that point.
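The workflow itself is short enough to sketch. The version below uses only the standard library (`csv` and `sqlite3`) rather than Pandas, and the sample rows are made up, not the real San Francisco dataset; the `csv_to_sqlite` helper is a hypothetical name.

```python
import csv
import io
import sqlite3

def csv_to_sqlite(csv_text: str, conn: sqlite3.Connection, table: str) -> int:
    """Load CSV text into a new table (every column as TEXT).

    Rows whose column count does not match the header are skipped.
    Returns the number of rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so arbitrary column names are safe to use.
    cols = ", ".join('"{}" TEXT'.format(h.replace('"', '""')) for h in header)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute(f"CREATE TABLE {quoted_table} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f"INSERT INTO {quoted_table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)

SAMPLE = "category,district\nTheft,Mission\nVandalism,Richmond\nTheft,Richmond\n"
conn = sqlite3.connect(":memory:")
inserted = csv_to_sqlite(SAMPLE, conn, "incidents")
counts = conn.execute(
    "SELECT category, COUNT(*) FROM incidents GROUP BY category ORDER BY category"
).fetchall()
print(inserted, counts)
```

Swapping `":memory:"` for a file path gives you the downloadable database file.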
I had an old astronomy app I wrote for pre-iPhone app store era Nokia phones (N900 etc.). I decided to get Claude Code to recreate it as an Android app. The old app produced several display pages for things like the positions of the planets. I was having Claude Code recreate the app display page by display page, describing the display that should be produced, with no reference at all to the original app's code (or even its existence). After having it reproduce several pages, it added another one unprompted. The page it added was in the original app, but I had not gotten around to adding it to the Android app. The Nokia app's code is still on GitHub, and somehow Claude must have made a connection between what I was asking it to code (without ever mentioning the Nokia app) and my GitHub repository's Nokia code. It correctly implemented the page without me even mentioning the missing page. My jaw hit the floor.
I have a buddy who's a consultant. His niche area is Netsuite and Oracle (I think). He's an accountant by training and as a consultant his gig was setting up these instances for clients, charging them an arm and two legs. He'd spend a lot of time golfing, and doing these setups was more than enough money for him. In other words, he had cornered that little slice of the market and was making bank.
Shortly after ChatGPT 2.2(?) came out and hit mainstream, I was chatting with him (I was excited af about the possibilities of AI). He tried to pop my bubble by saying "I bet it can't do what I do for my job!".
So I decided to test it out. We went home and I pulled out my laptop. Went to chatgpt.com and then I asked him to enter the specifications of what Netsuite configuration he wanted. So he proceeded to type in the description of what he wanted, the various settings, configurations, etc. i.e., the specs that he typically gets from his clients. And asked it to give him the commands to set it up.
Lo and behold. ChatGPT came back with a series of commands that he needed to run; the options he needed to configure, etc.
He was crestfallen. "Those are the exact commands I run!"
Luckily for him he recovered. He has since settled on a small stable of clients, all privately held companies whose owners he knows and between them he makes enough to keep his golfing hobby fed.
I've had many, but a recent one was when I figured I'd try asking Claude for help with my attempts at learning to draw, specifically anatomy.
I uploaded one of my sketches and asked for feedback, expecting it to not be too useful, but it actually pointed out many issues that no one had ever pointed out to me, and that perfectly explained some of the things that felt off to me. Out of curiosity I then also asked it to label the issues in the sketch. It wrote a Python script with the coordinates to put everything at and labeled the sketch that way.
I'm still used to vLLMs not being that great at vision, so it was pretty surprising to get genuinely useful advice.
I helped train some of the first "magic" models at OpenAI[1] and it was a wild ride. We were a pretty sane + skeptical team and we weren't totally convinced the models were as general as they seemed, but the query that convinced me (and later got included in the paper[2]) was "Why is it important to eat socks after meditating?" (something that almost certainly did not appear on the internet before).
An interesting follow up would be when did you realize GenAI wasn't as good as you thought in that "oh shit" moment
[1] co-author of InstructGPT/RLHF/ChatGPT
I was working on a science experiment (electromagnetics) with my 10-year-old kid that was going to be demonstrated at a science fair in his school. We ran into a hiccup with the experiment that we couldn't debug ourselves. I turned on Gemini live video call to help us root cause the problem. It was able to clearly articulate all the possible issues and eventually was successful in making our apparatus work as expected. Turned out the wire that I was wrapping around the screw had some insulation that was not scraped off well on the side it was connecting to the battery. Gemini was able to capture this detail even though my bare eyes could not. My kid and 2 of his friends were impressed not just by the experiment, but because the live audio/video back and forth we had with the AI was almost magical!
For me it was earlier this year when I started dusting off some old stalled projects and had an agent work on them. In a few days I:
* Built a clone of the Alpha Zero implementation[1] my team built at Oracle
* Ported my hobby NES emulator from JavaScript to Rust[2] (this actually took less than 30 minutes and worked on the first try)
* Implemented all of the lessons from the C++ Grandmasters Challenge (which eventually led to a complete C++ compiler[3])
The thing that flipped the switch was using it to build things that I actually put sweat-equity in to previously. I knew how hard these things were to build, so it landed in a way that other projects had not.
[1]: https://medium.com/oracledevs/lessons-from-implementing-alph...
[2]: https://github.com/vishvananda/popeye
[3]: https://medium.com/@vishvananda/i-spent-2-billion-tokens-wri...
I am the CTO of a small NGO (10 people total, only 1 other junior Dev at the time). We supported two apps that were built by consultants. They were a mess. NextJS, React, about 4 micro services for a site that had 50 users per WEEK.
I configured a devcontainer with the old codebase and an empty repository and asked Claude to rewrite it as an old school server side rendered Django app.
Went to sleep. When I woke up it was 80% done. Spent another couple days prompting and reviewing and reached feature parity.
A bit later did the same with the other app.
Now both are deployed, reduced the server costs, complexity, and are orders of magnitude faster.
Without AI agents we wouldn't be able to do so (as usually is the case with tech debt).
AI is amazing for small organisations!
The big one was definitely ChatGPT upon release in 2022, specifically when people showed how it could role-play as a Linux terminal: you could narrate events like "the data center is now on fire" and "run" nvidia-smi, and it would show high temps on the GPUs, etc. Or you could "explore" the homedir of some famous person. It convinced me that if it could understand so well how terminals work, tool use and agents were around the corner.
Then Opus 4.5 convinced me that this has finally arrived. In 2022 I expected things to arrive faster actually, in 2023-2024. I expected we'd have much more realtime collaborative integrations with AI including GUI computer use. Maybe in 1-2 years.
For images, it was Nano Banana where I realized AI images can truly work, and all these ad hoc issues like hands and limbs, or "it will never do a horse riding an astronaut" were temporary. It's now clear that making feature length films is within reach. Not in one go but with an agent orchestrating, designing a screenplay, characters, shots etc and generating those. Whether the result will be worth watching or a flat story on the high level is another question. But it will be a "film" for sure.
I probably will be burned for this, but with the help of an LLM I wrote a tiny program that captures video from a browser screen (Xbox live online FPS game), passes the video images through a small trained NN that recognizes people forms and presents the video on another screen. That way I can place a green overlay on enemies and they are easier to see on PVP matches.
All that in around 100 lines of code, including the training/fine-tuning of the tiny YOLO nn.
Someone in the house pressed the button to update the printer (Brother DCP-L3550CDW) firmware and the CSV page that was the basis for an existing Prometheus exporter (drum/toner lifespan, page counts, etc) stopped being a thing. Instead there was an HTML page with all of the information buried in various divs/etc.
I'd planned on writing something myself to parse the HTML and write a suitable exporter but I thought I'd give Claude a chance.
In a sandboxed VM I gave Claude a single static HTML file of the status page from the printer. Also in the directory was the equivalent of "hello world" in Go, literally just the minimum needed to do `fmt.Printf("OK\n")`. The directory was called `brother-exporter`. That was it. No other instructions or information. I hadn't told it what it needed to write. I hadn't said what it should do. I hadn't told it what language it was supposed to use.
Just by doing a `/init` in that directory Claude decided that it needed to write a Prometheus exporter in Go that would fetch and parse the HTML file from a printer (defaulting to 192.168.1.1) and then present the associated metrics in a way that they could be scraped by Prometheus.
It did this flawlessly in about 10 minutes.
I could have done it in several hours but this was definitely an "oh shit" moment for me. I think the biggest thing was the fact that it guessed/assumed so much (correctly) from so little information in the beginning.
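For anyone who has not written an exporter, the core of the job is small: pull numbers out of the page and print them as `metric_name value` lines in the Prometheus text format. The sketch below is in Python rather than Go, and the `<dt>`/`<dd>` layout, the sample values, and the `printer_` metric prefix are all invented for illustration; the real Brother status page and the exporter Claude wrote will look different.

```python
import re
from html.parser import HTMLParser

class StatusParser(HTMLParser):
    """Collect (label, value) pairs from <dt>/<dd> elements (assumed layout)."""

    def __init__(self):
        super().__init__()
        self._tag = None
        self._label = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "dt":
            self._label = text
        elif self._tag == "dd" and self._label is not None:
            self.pairs.append((self._label, text))
            self._label = None

def to_prometheus(html: str) -> str:
    """Render numeric fields as Prometheus text-format samples."""
    parser = StatusParser()
    parser.feed(html)
    lines = []
    for label, raw in parser.pairs:
        try:
            value = float(raw.rstrip("%").strip())
        except ValueError:
            continue  # skip non-numeric fields such as the model name
        # Metric names may only contain letters, digits and underscores here.
        name = "printer_" + re.sub(r"[^a-z0-9_]+", "_", label.lower()).strip("_")
        lines.append(f"{name} {value:g}")
    return "\n".join(lines) + "\n"

# Invented sample page, not real printer output.
SAMPLE_HTML = (
    "<dl><dt>Toner Black</dt><dd>73%</dd>"
    "<dt>Page Count</dt><dd>10412</dd>"
    "<dt>Model</dt><dd>DCP</dd></dl>"
)
print(to_prometheus(SAMPLE_HTML), end="")
```

A real exporter would fetch the page over HTTP on each scrape and serve this text on a `/metrics` endpoint.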
Not coding, but reading logs.
I was trying to figure out a nightmare bug that only happened in production, and Claude Code was able to connect to Google Cloud and read the logs in real time.
I recreated the bug in the UI and it was instantly able to see in the logs what the problem was. Then, because it had the context of my whole codebase, it was able to point me to the exact line of code causing the problem.
That was certainly an "oh shit" moment
(1) Watching it do log file analysis in seconds that would have taken me hours (edit: days, in fact), and which I would therefore never have done in the first place.
(2) Helping me with optimizations that I had been putting off for years because they involved learning curves that I never had time to take on.
(3) Tracking down bugs in code, especially race conditions and other concurrency issues, that were otherwise baffling.
(4) Finding information that I had been unable to find using Google searches (e.g. https://news.ycombinator.com/item?id=42653136).
There have been others, but those are what come to mind - perhaps because, in each of these cases, it made something happen that would otherwise never have happened - not because it was impossible, but because the level of effort required was prohibitive.
I got early access to the pre-ChatGPT OpenAI API (actually by pinging someone from OpenAI who posted about it on HN). At work, we were setting up to play a livestreamed JackBox game for a charity event. This would have been in 2019.
In a previous life, I'd been a writer for the original You Don't Know Jack game (the UK variant), where the job was to crank out as many funny quips about a topic as you could, and then use a handful of them in the recording of the game itself. Some of the later JackBox games are like that, but for the players -- you're given a set piece, have to come up with little funny improvisations within a time limit.
As an experiment, I tried the set-up lines with the OpenAI API, to see whether it could come up with some responses. Of course, 90% of them were unfunny or incoherent, but 1/10 were not bad, or even pretty good.
I'm not sure that would have been impressive to anyone else -- but remember, I'd had this as a job, and sat in a writer's room, where everyone did this, for hours. In that environment, you expect a large proportion to be duds: the discipline is keep pumping them out, and not flagging creatively until you find a rich vein. I realised that this was a tool that would have been the perfect complement to that work -- and it was a pretty good JackBox player too.
Two of them:
1. ChatGPT 3.5 wrote me a script to pull some data out of Shopify and write it to a Google Sheet. Nothing remotely impressive by today's standards, but I had just commanded a computer to write code in plain English and it worked!
2. I own a bunch of e-comm brands, and with every new image model I tried to get product photography. Nothing worked until Nano Banana Pro, when suddenly I gave it a crappy iPhone pic of a product and got back a fully usable whitebox photo of it. Then I tried making the sort of infographic-style images you usually see on Amazon, and it nailed those too! In hindsight they weren't perfect, but more than good enough to use. I was about to ship that product to my photographer, and I would've had my designer make the infographic images, so that was the first time AI actually replaced a human contractor for me. Pretty big "Oh shit this is going to seriously impact employment" moment. Wrote about it here: https://theautomatedoperator.substack.com/p/ai-just-took-my-...
Look, not to brag but DALL-E's "armchair in the shape of an avocado" was mine (https://openai.com/index/dall-e/). I remember trying to convey the gravity of this capability to my friends at the time, who I guess were not as impressed as me.
None so far. When I try to use these language models in the primary areas of my expertise like SIMD or GPGPU they fail to do any good. When I ask them to implement some general-purpose stuff, the output is too low quality to be useful in my software.
Still, I find them incredibly useful for code review (despite being unable to write good C++ or C#, they are smart enough to detect issues there), and for dealing with technologies outside of my area of expertise like Python or web stuff.
We were experiencing abnormally high electrical bills and I could not figure out what was happening, so I downloaded the granular usage data (15 min increments) from Duke Energy, explained what we had in our house and when we typically used those items (washer/dryer, EVs, etc), provided a rundown of our energy usage plan, then asked Claude to build me a Streamlit dashboard that would help us understand what was going on and predict what was going to happen over the next months. The dashboard had a few simple toggles and levers. Claude was basically able to one-shot this, knew how to manage the XML from Duke Energy, etc... In about 20 minutes of prompting, I had a very comprehensive dashboard that was extremely helpful not only in diagnosing that specific issue but also in helping us understand how to further lower our electrical bills.
I didn't have a slightly panicked moment, but sometime in the last year my approach to programming changed.
When starting a project, I used to think about how I was going to structure it, how the large pieces would interact, how some of the details would work out, and then I'd work through alternatives and consequences on my own.
Now I don't think about it on my own so much as have a conversation with an LLM about it. And it's great because it can quickly gather information from various sources, I can ask it for links to canonical sources, I can ask it about trade-offs between alternatives that I might not have considered, and through conversation, I end up with a more detailed analysis.
Then as I work through the development, I keep my new agent partner in the loop for discussion, suggestions, and troubleshooting. It can't be trusted completely, but it's certainly reliable enough to be considered a useful tool for my purposes.
I went from thinking it was an interesting toy to play around with, to completely integrating it into my work flow, and that change seems to have happened very quickly.
Working on a Spice compiler to convert schematics for classic guitar pedals into real-time executable code.
I provided a reference to The Spice Manual, 2nd ed., a page number, and an equation number, and asked Claude to implement it (not really expecting it to succeed).
It proceeded to implement not only the equation, but also the calculation of the Lagrangian of the function, another 30 lines below, which required taking symbolic partial derivatives of a not-at-all trivial function and successfully figuring out which variable was which in the resulting matrix. The source material just said "Lagrangian of", and did not provide the partial differential equations. It then added a comment that identified the page number and equation number in the source text for the "Lagrangian of" equation.
From actual use I've not had a "oh shit" panicked moment yet. More like a bunch of "Holy shit" euphoric moments.
So far I feel like I as a developer have gained actual superpowers, and can deliver results that make my stakeholders slackjawed with awe. I love it.
It will last perhaps a few months more, then they'll expect it. Delivering more features faster will be the new normal. But I think system developers, as in people who actually like to deliver new features and systems, will still be the ones doing it.
Fundamentally I think LLM's just change how to make information systems, they don't change who has the inclination to make them.
MBAs making Excel sheets that do more than Excel was ever intended to do have given programmers lots of work over the years. Such solutions identify a need for a properly designed system and free up the budget to hire programmers.
If the same MBAs start vibe coding, I predict we will get even more to do, for similar reasons.
I may be horribly wrong, and if the day comes that I realize that, it will be the "oh shit" panicked moment. So far so good!
I could spot numerous bugs in code written recently and less recently, by me or colleagues. I was not angry but grateful and I knew there was no way back!
I had an old 1st gen Amazon Firestick in a drawer for years, it had updated to the latest software and there were no public root exploits.
I spent a day bouncing between Claude and Codex and they researched, downloaded kernel sources, tried exploits and eventually got root via "FBUF/VCHIQ kernel zero-write primitive to patch live kernel memory". I was able to make the root permanent, debloat the amazon apps, downgrade the firmware etc.
It was amazing to watch and made me excited for the future where more hardware (old and new) will be available for repurposing.
Had an AI plot movies' Rotten Tomatoes scores versus the cost of 2 adult tickets, plus candy and large popcorn prices at the specific theater, and round-trip gas from my cross street, including only movies that would get out in time for me to be home by 10pm, accounting for preview times.
None of that is mind blowing, but that Google or some other site has never offered me this type of analytics, is where I'm floored. It's a trivial query, but perfectly useful for planning a night out with my wife.
So many. First was when I saw GPT-2 create jokes that were original and kinda funny.
Most recent: I use Claude Code and have a convention where I grant various levels of autonomy during a session. I got bored recently and just let it keep running with an empty issues queue, essentially telling it to do whatever it wanted.
It did a bunch of repo cleanup, then it kept suggesting to end the session, but I just kept giving it autonomy prompts.
It started a creative writing public repo and wrote a bunch of stories, essays, and poems. I did not prompt it, at all, to do that. Some of what it wrote is quite good (IMHO).
Running a local LLM in 2023, I heard folks talking about interfacing LLMs to tools. I wrote a system prompt and told the LLM it could call some tools: if it wanted to call a function, it should output func(params...) inside an XML tag. I provided a few examples, none of this JSON soup we get today. Then I told it I'd provide the result in a RESULT XML tag and it should use that to answer. I wrote up a harness around that and I had a local model interacting with the outside world. Oh wow! Everything else today about MCP and agents is an extension of that thought.
Using function calling, I built an agent. I defined a data structure that represents rooms and how they are connected, with each room marked as dirty or clean. Then I would place the agent in a room and it would decide whether to go left, right, down or up into another room. Once it got into a room, it would decide whether to clean it or go to the next room. Repeat until all rooms are clean. The basic CS101 AI vacuum agent toy. It worked!
So: getting real-world input/output to the model, having the model make decisions in a loop, and doing it all locally. I have been screaming like a madman ever since.
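For anyone who hasn't seen this pattern, here is a minimal sketch of what such a harness might look like. It is not the commenter's code. The `<CALL>` tag name, the three-room map, the tool names (`look`, `clean`, `move`) and the `scripted_model` stand-in are all assumptions made for illustration; a real setup would swap `scripted_model` for a call into a local LLM.

```python
import re

# Hypothetical room graph: each room maps a direction to a neighbouring room.
ROOMS = {
    "A": {"right": "B"},
    "B": {"left": "A", "down": "C"},
    "C": {"up": "B"},
}

# Assumed call format: <CALL>name(arg1, arg2)</CALL>
CALL_RE = re.compile(r"<CALL>\s*(\w+)\((.*?)\)\s*</CALL>", re.S)


class VacuumWorld:
    """Toy environment: rooms are dirty or clean, the agent stands in one room."""

    def __init__(self, rooms, dirty, start):
        self.rooms = rooms
        self.dirty = set(dirty)
        self.position = start

    def look(self):
        state = "dirty" if self.position in self.dirty else "clean"
        exits = ",".join(sorted(self.rooms[self.position]))
        return (f"room={self.position} state={state} "
                f"exits={exits} remaining={len(self.dirty)}")

    def clean(self):
        self.dirty.discard(self.position)
        return self.look()

    def move(self, direction):
        target = self.rooms[self.position].get(direction)
        if target is None:
            return f"error: no exit {direction}"
        self.position = target
        return self.look()


def dispatch(world, text):
    """Run the first <CALL> found in the model output; None if there is none."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2)
    args = [a.strip().strip("'\"") for a in raw_args.split(",") if a.strip()]
    tools = {"look": world.look, "clean": world.clean, "move": world.move}
    if name not in tools:
        return f"<RESULT>error: unknown tool {name}</RESULT>"
    try:
        return f"<RESULT>{tools[name](*args)}</RESULT>"
    except TypeError as exc:  # wrong number of arguments
        return f"<RESULT>error: {exc}</RESULT>"


def scripted_model(transcript):
    """Stand-in for the LLM: a fixed policy that reads only the last message."""
    last = transcript[-1]
    if not last.startswith("<RESULT>"):
        return "<CALL>look()</CALL>"
    if "remaining=0" in last:
        return "All rooms are clean."
    if "state=dirty" in last:
        return "<CALL>clean()</CALL>"
    found = re.search(r"exits=([\w,]*)", last)
    exits = found.group(1).split(",") if found else []
    # Naive fixed preference order; enough for this map, not a general explorer.
    for direction in ("right", "down", "left", "up"):
        if direction in exits:
            return f"<CALL>move({direction})</CALL>"
    return "I am stuck."


def run_agent(world, model, max_steps=20):
    """Loop: ask the model, run any tool call, feed the RESULT back in."""
    transcript = ["SYSTEM: call tools as <CALL>name(args)</CALL>; "
                  "results come back as <RESULT>...</RESULT>"]
    for _ in range(max_steps):
        reply = model(transcript)
        transcript.append(reply)
        result = dispatch(world, reply)
        if result is None:  # no tool call means the model gave its final answer
            break
        transcript.append(result)
    return transcript


if __name__ == "__main__":
    world = VacuumWorld(ROOMS, dirty={"A", "C"}, start="A")
    for line in run_agent(world, scripted_model):
        print(line)
```

The `max_steps` cap is there because a model, like this naive stand-in policy, can wander back and forth between rooms indefinitely on a less friendly map.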
Been using it to manage an estate. Just being able to shove all the documents right into an LLM and have it spit back out perfectly worded emails, keep track of checklists of things I need to do, and automatically create a ledger for me in Sheets has been a huge mental load off. I've been able to focus better at work, and the labor costs saved have been immense. Just on this one little thing. I'm one of those people who overthinks correspondence and letters and ends up stuck on something, so being able to ask for just the right wording has been super helpful to me.
It was the release of Stable Diffusion and its source code.
I spent the next few days tinkering with my own Stable Diffusion implementation. I never got it past outputting total nightmare fuel, but it was fun!
To this day I think of the process as being like baking pizzas in a sequence of pizza ovens.
Some business users spent ~30 minutes on an internal process, and we prototyped an "Agent" in Slack to take over. At first it didn't work, then it didn't work some more, eventually it ALMOST worked. Then one day, it worked, and the old business process died never to be revived.
Now it sits in a Slack channel, and I watch it doing work, responding to ambiguity, and taking feedback/edits all day. It's unreal. It's literal magic. It saves a HUGE amount of time and gave us a pattern to do more.
This is the real deal. It's not easy to find problems with the right shape, and it's not easy to build agents that fit even when you do... but once it clicks, it clicks.
AlphaGo. Reinforcement learning on math with proof assistants was clearly going to be workable after that, even if not right away.
For me it was Suno, not any of the coding tools. I prompted it to write a song about my family's little dog, told it a few things about the dog, and it came back with a K-pop-style anthem that had a super catchy melody and lyrics that made my wife and me laugh out loud.
Writing code to spec is one thing, but creating art was always supposed to be what separated us from machines. (I suppose I need to preemptively acknowledge the "it was machine-generated so by definition cannot be art" point of view.)
My most recent one: taking a bricked iPad, plugging it into my Linux laptop, then telling DeepSeek to fix it. A couple of hours and twenty sudo passwords later it was working again.
To share something different, it is less about what I have built, and more about what I have seen my friends (non-technical and technical) build. In a one-month span I have seen a lawyer make a personal redline tool, a sales guy make a custom website for a golf trip, another friend make a 3D-printed Gridfinity project, a friend make an STL file to print a jig for his table saw, and another friend make a full mobile game. It is just really cool to see these micro-projects be created and shared, not only for the utility, but just to see my friends' childlike excitement showing off their projects.
We had a monthlong sprint adding robot motion planning features to our codebase years ago, and I was never satisfied with the result. As a small team wanting to leverage OSS, we vendored in OMPL and did the usual thing around caching and roadmap management. I knew there was a way to parallelize some of the algorithms we were using with SIMD or a GPU kernel, plenty of that in the literature, but it was never worth fighting CUDA or Metal/Accelerate or whatever for uncertain gains.
So when cooking dinner one night, I set Opus 4.6 on a from-scratch native and accelerated roadmap planner implementation (after previously porting IK, FK, and collision checking with some success). I had primed it by having a research agent drop a literature review in its docs folder covering the type of planner we needed. By the time the pasta water was boiling it was done, getting plans in a few hundred ms compared to several seconds on our good old-fashioned OMPL code.
For me it was the revelation that the economic value of cooking dinner could be compared to tackling an honest two weeks of coding work. The calculus has shifted - work that was once a risky or extravagant use of time is now worth considering.
For a small team that wants to focus on substance rather than implementation, knows what it wants, and knows how to set up the agent for success, it’s a complete game changer in terms of what we can take on. Incumbents beware.