Ask HN: What was your "oh shit" moment with GenAI?

252 points • by andrehacker • last Thursday at 11:42 PM • 503 comments

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.

Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

Using LLMs for coding initially was only a small step up from basic code completion, and a welcome farewell to Stack Overflow.

I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

Comments

hparadiz • today at 12:30 AM

I've been using it to manage an estate. Just being able to shove all the documents right into an LLM and have it spit back out perfectly worded emails, keep track of checklists of things I need to do, and automatically create a ledger for me in Sheets has been a huge mental load off. I've been able to focus better at work, and the labor costs saved have been immense. Just on this one little thing. I'm one of those people who overthinks correspondence and letters and ends up stuck on something, so being able to ask for just the right wording has been super helpful to me.

irthomasthomas • yesterday at 8:30 PM

My most recent one: Taking a bricked ipad and plugging it into my linux laptop, then telling deepseek to fix it. A couple of hours and twenty sudo passwords later it was working again.


segmondy • today at 1:23 AM

I was running a local LLM in 2023 and heard folks talking about interfacing LLMs to tools. I wrote a system prompt and told the LLM it could call some tools: if it wanted to call a function, it should output func(params...) inside an XML tag. I provided a few examples, none of this JSON soup we get today. Then I told it I'd provide the result in a RESULT XML tag and it should use that to answer. I wrote up a harness around that and had a local model interacting with the outside world. Oh wow! Everything else today about MCP and agents is an extension of that thought. Using function calling, I built an agent. I defined a data structure that represented rooms and how they were connected. Each room was marked as dirty or clean. Then I would place the agent in a room and it would decide whether to go left, right, down or up into another room. Once it got into a room, it would decide whether to clean it or go to the next room. Repeat until all rooms are clean. The basic CS101 toy AI vacuum agent. It worked!

So: being able to get real-world input/output to the model, have the model make decisions in a loop, and do it all locally. I have been screaming like a madman ever since.
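A minimal sketch of the kind of harness described above, not the commenter's actual code. The `<call>` tag name, the `World` class, the three tool names, and the `scripted_model` function (a fixed policy standing in for a local LLM) are all assumptions made for illustration. It shows the loop: the model emits a call in an XML tag, the harness runs it, and the result goes back in a RESULT tag.

```python
import re


class World:
    """Hypothetical toy world: a row of rooms, each dirty (True) or clean."""

    def __init__(self, dirty):
        self.dirty = list(dirty)
        self.pos = 0

    def look(self):
        state = "dirty" if self.dirty[self.pos] else "clean"
        return f"room={self.pos} state={state}"

    def clean(self):
        self.dirty[self.pos] = False
        return f"room={self.pos} cleaned"

    def move(self, direction):
        step = {"left": -1, "right": 1}.get(direction, 0)
        self.pos = max(0, min(len(self.dirty) - 1, self.pos + step))
        return f"room={self.pos}"


# Matches one call of the form <call>name(arg)</call> in the model's reply.
CALL_RE = re.compile(r"<call>\s*(\w+)\((.*?)\)\s*</call>", re.S)


def dispatch(world, reply):
    """Run the tool call found in reply; return a RESULT tag, or None."""
    match = CALL_RE.search(reply)
    if match is None:
        return None  # no tool call: treat the reply as a final answer
    name, raw_arg = match.group(1), match.group(2).strip()
    tools = {"look": world.look, "clean": world.clean, "move": world.move}
    if name not in tools:
        return f"<RESULT>error: unknown tool {name}</RESULT>"
    args = [raw_arg] if raw_arg else []
    try:
        return f"<RESULT>{tools[name](*args)}</RESULT>"
    except TypeError:
        return f"<RESULT>error: bad arguments for {name}</RESULT>"


def scripted_model(transcript):
    """Stand-in for the LLM: a fixed policy that reads the last RESULT."""
    last = transcript[-1] if transcript else ""
    if "state=dirty" in last:
        return "<call>clean()</call>"
    if "cleaned" in last or "state=clean" in last:
        return "<call>move(right)</call>"
    return "<call>look()</call>"


def run(world, model, max_steps=50):
    """Alternate model replies and tool results until every room is clean."""
    transcript = []
    for _ in range(max_steps):
        if not any(world.dirty):
            break
        reply = model(transcript)
        result = dispatch(world, reply)
        if result is None:
            break
        transcript.extend([reply, result])
    return transcript


world = World([True, False, True])
transcript = run(world, scripted_model)
```

Swapping `scripted_model` for a function that sends the transcript to a real model is the only change the loop would need; the `max_steps` cap is there because a real model may never decide it is done.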

acrinimiril • yesterday at 10:04 PM

Two things:

1) I wanted a harness for running BPC.EXE (the old Borland Pascal 7.0 Compiler) and I asked Gemini 3.5 to build it for me using the unicorn engine. It whipped out a working .py file easily under ten minutes. Most likely five.

2) I handed a random assembly function from the OS/2 1.x kernel to Gemini 3.5, and it proceeded to tell me that it was related to disk I/O and partitioning, without a single associated string, and it annotated it all, including the relevant structures it was addressing.

eqmvii • yesterday at 11:16 PM

Some business users spent ~30 minutes on an internal process, and we prototyped an "Agent" in Slack to take over. At first it didn't work, then it didn't work some more, eventually it ALMOST worked. Then one day, it worked, and the old business process died never to be revived.

Now it sits in a slack channel, and I watch it doing work, responding to ambiguity, and taking feedback/edits all day. It's unreal. It's literal magic. It saves a HUGE amount of time and gave us a pattern to do more.

This is the real deal. It's not easy to find problems with the right shape, and it's not easy to build agents that fit even when you do... but once it clicks, it clicks.

hypendev • yesterday at 8:06 PM

Back in the times of GPT3 text completion, right before the API came out, a contemporary art museum asked me to collaborate on a project. The project was supposed to include a chatbot, and I was like okay I can probably hook something up.

Then I remembered the "text completion LLM thingy" I saw on HN, and tried it out in the playground. Once I gave it an IRC style example of a conversation to complete, I was like hm, this could work. Then I figured out I could "sort" people into different groups based on personality using the same text completion engine and some answers they provided. Then I noticed I could have it provide me with JSON directly.

That's when I realized how big this could be for code and data analysis - even tried to convince an at the time cofounder to pivot into AI coding, but to no avail.

Once the API was released and the art project chatbot got launched (and the theater show associated with it, which even won some awards), people who used it loved the chatbot, got into heated arguments with it, tried to teach it things, talked about their lives and were sad when it didn't remember something.

That was when I understood the social impact this could have on people - they really behave like it's a person on the other side. They show interest, think it displays emotion, try to entertain it, be polite, ask about its thoughts and hopes and dreams. And even when they knew they were talking to a machine, they were still trying to be friends and make it happy, which was quite beautiful to see.

Later on, I had a third oh shit moment - once the 3.5 API was out and about, I prototyped a Rust code generation harness for a client, akin to a primitive claude code. That was the "I'm getting a bit worried" oh shit moment, and it caused a lot of reflection and thinking about the future. And I happily welcome it.


abecedarius • yesterday at 11:57 PM

AlphaGo. Reinforcement learning on math with proof assistants was clearly going to be workable after that, even if not right away.

jerome-jh • yesterday at 8:47 PM

Recently, Claude (through Copilot) found a hardware issue on our product. I was asking it to find an issue in a specific feature of a device driver, that could cause what we observed. It determined the feature was correctly implemented.

Then it hinted that depending how the hardware is implemented, it could cause the observation. It turned out the hardware was implemented as suspected by Claude.

I was already convinced it knew the codebase, somehow, better than I do. Now it is just as if it knows the product and its use as well.

orzig • yesterday at 7:57 PM

"Write a bible verse ... explaining how to remove a sandwich from a VCR" https://x.com/tqbf/status/1598513757805858820

paulbjensen • yesterday at 9:06 PM

I would say the first time I did “vibe coding”, when I tried Claude Code with Zed’s agent integration in January this year.

I wanted to see if I could build an image editor for isometric graphics using HTML5 canvas, Svelte, Vite, and the. Rather than do all of the skeleton code setup, I figured “why not try and see if Claude can build the app scaffolding?”.

I gave it a prompt and watched it produce the scaffold, along with a few features I outlined in the prompt.

When I booted the app and saw that the features worked and that there had been an element of design to the layout, that was my mind-blown moment. In a period of about 45 minutes, I added some features and had a basic MVP at the end. I walked back home stunned.

That app is available for free at https://babspixel.com

thenoblesunfish • today at 4:29 AM

When a junior engineer first sent me something that looked good until I realized it had been vibed, and thus their understanding of what they were doing was too shallow to answer questions and improve on it. That was a doc, but it happens with everything. "Oh shit", I say, as everyone is aggressively encouraged to work this way.

xeckr • today at 4:36 AM

Literally the first time I used ChatGPT, within days of release. It wasn't so much panic as amazement.

It took HN a surprisingly long time to come to terms with the fact that professional SWE as we knew it was coming to an end.

In 2023/2024 we saw a demo of "denial" being a stage of grief live on this site.

tracerbulletx • today at 4:42 AM

A lot of things, going back to just Whisper and solving translation, but watching frontier models use the browser with Playwright to iterate on a complex application with basically no guidance, and talk to themselves about it, feels pretty surreal even still.

mschaef • yesterday at 8:34 PM

This is a small one, but significant to me.

I asked Claude to add support for multiple lights to my toy ray-tracer. It correctly added the support and then suggested adding colored lights to make it easier to diagnose. It felt more like a colleague making a useful suggestion than any sort of pure engineering tool.

hansvm • yesterday at 7:41 PM

A coworker had me work through a particular problem (some no-importance web demo) with Cursor and Sonnet 4.6. It still sucked, but there was a qualitative shift in suckiness, one that I realized could finally be used to solve some real problems I had if I wrote an appropriate harness and used good enough models.

I still find it mandatory to write a lot of kinds of code by hand, but I write a lot of code with agents too now, and I previously literally didn't think that'd happen in <5yrs.

dnnddidiej • yesterday at 11:50 PM

1. ChatGPT first public release (I am not one who saw early GPT models) I think late 2023 iirc?

Why? Turing test bye bye.

2. Opus 4.6 w. Claude Code - not the model in particular, but it happened to be when I started seriously trying to vibe code at home, as I saw all the hype on LinkedIn. Yes, LinkedIn sucks, but it is somewhat of a barometer. Around early this year.

Why? Knocking up decent enough web apps so quickly.

Fomite • yesterday at 7:42 PM

When we had to have a frank discussion about whether to fail someone who obviously used an LLM for parts of their dissertation.


matheusmoreira • yesterday at 9:11 PM

Pretty much immediately after I asked the LLM to perform a complete code review of my projects. I've been programming alone for years, that alone was life changing for me. It only got more impressive from there.


lordnacho • yesterday at 10:56 PM

For me it was gradual, then sudden.

I liked using the early models to do autocompletion. It could do a leetcode style thing, pretty nice, but only useful for small things.

Then I sought out Cursor because that seemed to be able to do multi-document edits. Not bad, but models at the time (2024) still got stuck pretty often. So, cross-document autocomplete. Useful, but definitely within the realm of "nice shortcuts to have".

Then a friend (who works in AI) told me to try Claude last year. I was on holiday at the time, but I spun up my work repo and looked at the backlog.

It chewed through the entire 6-9 months of estimated work in a two-week period while I was watching that Lord of the Rings series with a friend (we watched an episode or two in the evenings). I just chatted with him about the series while checking the progress every few minutes. It was a huge amount of refactoring, and it didn't get everything right the first time, but it made enough progress that it could be directed the right way.

Since then I have hardly coded any manual lines. I just tell Claude what to do, with very little harness (skills, MCPs, instruction files), and I get what I want.

solomonb • yesterday at 8:39 PM

I gave chatgpt 3.5 the type signature for a co-algebraic encoding of a mealy machine:

    newtype Mealy s i o = Mealy { runMealy :: (s, i) -> (s, o) }

And it gave a really impressive analysis.

Then I scrambled all the names and asked with a fresh context like:

    newtype Foo z e g = Bar { blob :: (z, e) -> (z, g) }

It got completely confused and generated a bunch of nonsense. It was at that moment I realized that LLMs don't really understand anything.

And yes I understand that a newer model would not get confused by this.
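For readers who don't read Haskell, here is a rough Python rendering of the same shape, offered as a sketch and not as the commenter's code. The `run` helper and the `running_sum` example machine are assumptions added for illustration. The point it makes is that the renamed `Foo`/`Bar`/`blob` version is the same structure: a step function from (state, input) to (state, output).

```python
from typing import Callable, Generic, Iterable, List, Tuple, TypeVar

S = TypeVar("S")
I = TypeVar("I")
O = TypeVar("O")


class Mealy(Generic[S, I, O]):
    """Wraps a step function (state, input) -> (next_state, output)."""

    def __init__(self, step: Callable[[S, I], Tuple[S, O]]) -> None:
        self.step = step

    def run(self, state: S, inputs: Iterable[I]) -> Tuple[S, List[O]]:
        """Feed inputs through the machine, collecting one output each."""
        outputs: List[O] = []
        for item in inputs:
            state, out = self.step(state, item)
            outputs.append(out)
        return state, outputs


# Hypothetical example machine: the state is a running total and each
# output is the total after consuming that input.
running_sum = Mealy(lambda total, x: (total + x, total + x))
final_state, outputs = running_sum.run(0, [1, 2, 3])

# The same machine with scrambled names, mirroring Foo/Bar/blob: only the
# identifiers change, the shape (z, e) -> (z, g) does not.
Foo = Mealy
blob = Foo(lambda z, e: (z + e, z + e))
renamed_result = blob.run(0, [1, 2, 3])
```

Both machines produce identical results, which is why a reader who understands the type and not just the names should give the same analysis for either.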


virtualram • today at 4:27 AM

I have used AI to crank out new features. Pretty impressive in itself but what recently blew my mind is we have a legacy application where the code is spaghetti and it's difficult to fully understand it. We had a production defect which was hard to triage. I pointed copilot to the legacy source code which was in C++ and also gave it all the log files that were generated. It was able to identify the issue and propose a solution without me even walking through what the legacy app does.

Initially I was trying to do it piece by piece but it was not going anywhere and then when I just gave it the entire source code with the log files it was able to find the issue.

virtualbluesky • today at 4:44 AM

Why is it that nobody discusses uploading all the company's IP to service providers that built their service by 'creatively interpreting' IP ownership?

jmpman • today at 3:34 AM

Had an AI plot Rotten Tomatoes review scores for movies against the cost of 2 adult tickets plus candy and a large popcorn at the specific theater's prices, plus round-trip gas from my cross street, including only movies that would get out in time for me to be home by 10pm, accounting for preview times.

None of that is mind blowing, but that Google or some other site has never offered me this type of analytics, is where I'm floored. It's a trivial query, but perfectly useful for planning a night out with my wife.

smallstepforman • today at 4:10 AM

I had a C++ actor model which required an Api like the following (std::function):

    child->Async(&ChildActor::Method, child, args);

Refactored it to use small buffer optimisation and std::move_only_function:

    child<&ChildActor::Method>(args);

And saw a performance jump since no more malloc in std::function.

It also helped me decipher an animation bug in a glTF importer.

Productivity is x4 or higher.

dachris • today at 5:24 AM

For me that was already with the original DALL-E. It was utterly mindblowing, I was like "oh shit, AI is here".

"Draw a picture of a unicorn on the moon". And it did that. The model really "understood" what you told it.

After that, it was "oh, AI improved, again".

The farewell to Stack Overflow is not welcome. So many kind people shared their knowledge there. I answered a few questions as well, so not just a lurker.

It's a prelude to what has already begun - the collapse of human-to-human communication.

hatthew • yesterday at 11:50 PM

I'm kind of surprised that so many here on HN were dismissive/unaware of the capabilities and potential in the DALL-E days and earlier. I feel like this is the sort of forum where most people would be both aware of advancements and aware of their potential.

My moment was GANs and GPT-2 back in 2019. I feel like that's where computer-generated media went from "obviously fake" to "sometimes can be mistaken as real." RLHF for LLMs and diffusion for image generation are both important improvements, but I feel like they aren't fundamental prerequisites for the type of stuff we have today. I think the main advancements since then are just marginal improvements, larger models/datasets, and better surrounding tooling.

jimmaswell • today at 4:17 AM

Working on Unity games with Codex 5.5, it has no problem rummaging through and hand-editing any kind of game asset file. So many things that would be so tedious to fix by hand are so easy now. It's really made programming and game dev fun again.

gagabity • yesterday at 10:59 PM

Fixed a nasty bug in one of my tests where a mock in a completely different test I had never worked on was incorrectly set up and intercepting my mocks. I don't think I would ever have found it, because the amount of effort it would have taken means I would have needed to move on to some other way to test.

Reverse engineered an old audio recorder USB driver which only works in Windows 7, and also reverse engineered the custom audio encoding the device uses and the software to convert it to a standard wav file. This took recording the USB traffic with Wireshark for each function in the original software in a VM, then disassembling the various dlls, exes and driver files and feeding them into Claude step by step.

That AI button in DataDog not only diagnosed the problem across micro services but also created a fix PR. I think we might be unemployed soon.

sothatsit • yesterday at 11:59 PM

I gave GPT-4 some source code and my existing tests, and asked it to write a new test, and it did it! It didn’t even run straight away, I had to fix it, but it still blew my mind.

Later, I wrote a ~5k line proxy for work in C, and gave the whole thing to ChatGPT o1 and asked it to review it. It found several real memory bugs, and now that service has been running since with no problems.

Just this week, I was trying to write a greedy solver to pick the best subset of block sizes to keep from a larger sweep for shorter testing. Opus 4.8 suggested that this could actually be solved as a MILP problem, and found the perfect solution in 5 mins. I’d never even heard of MILP before.

cdavid • today at 1:03 AM

I wanted to understand the implementation of some numerical algorithms, and the tech reports were not enough.

I cloned the repo of said library, gave it to Claude, and asked it to write a new technical report in math notation, annotated with links to the code so that I could pick up the details. It basically one-shotted the full report, and that helped me re-implement it in "pure python + numpy", "manually".

threwrfaway • today at 2:27 AM

When I used Google to get the IEEE-488 commands of an arbitrary wave generator from the 80s whose manual doesn't exist on the internet.

This is a very long tail search, but by the end of the day I had enough to fully utilize a very sophisticated piece of equipment.

vesche • today at 5:00 AM

Three moments stick out to me.

1) When I used ChatGPT for the very first time. I still remember, I asked it: “Write an advertisement to convince people to visit the North Pole.” It rapidly returned a witty, accurate, multi-paragraph text that was exactly what I wanted and exceeded my expectations. ChatGPT was the beginning of the modern AI boom and I remember being immediately impressed.

2) When I was working at GitHub, the copilot team gave the engineering team early access to copilot in VS Code. I can distinctly remember seeing the chat window in the code editor for the first time. I was probably one of the first people ever to see it. I remember playing with it a bit and asking simple Python questions. I knew that day that StackOverflow was dead and my mind was blown.

3) Big oh shit moment earlier this year that I believe for me started with the Opus 4.6 model + Cursor. The results were noticeably better, hallucinated much less, and could solve complex problems with much less intervention. Early 2026 was a turning point for me as an engineer with AI. Throughout 2025, I was still writing the vast majority of my code by hand like I’ve always done - that is not the case in 2026.

csr86 • yesterday at 9:29 PM

I was working on a project for 2 years with about 5 engineers. It was many years before AI. It was a new subject for our team, and we were pretty sure it was possible. Turned out it was not.

Much later I asked AI if that kind of project is possible, and it immediately explained why it is not. Would have saved 2 years of our time...

balls187 • today at 4:37 AM

Early on with ChatGPT I had it write a script for an Avengers movie, but all the Avengers have below average intelligence.

dtgriscom • yesterday at 9:12 PM

A friend had the power supply die on his high-end turntable. He took a picture of each side of the supply's PCB, handed it to Claude, and it gave him back a schematic.


dirkc • yesterday at 9:35 PM

I started to look at LLMs not as writing code, but rather as predicting what code it would expect someone to write given the context.

For some people that matches their expectation or they don't really have an expectation. While for other people it doesn't match their expectation.

synthc • yesterday at 10:17 PM

I gave it a weird and convoluted code snippet, and asked an LLM to step through the execution and trace the value of the variables at each step.

It was completely correct and I realized LLMs are capable of generalizing beyond their training sets.

block_dagger • yesterday at 8:38 PM

I wanted to add gapless playback to an audio archive website I maintain. I tried myself before any of the popular LLMs were available. I failed. I then tried with the first LLMs that came out. They failed. Then, when the first Claude Opus was released, it succeeded. I now have gapless playback.

tverbeure • today at 12:20 AM

The fact that it completely autonomously read in a 5 MB firmware image of an old piece of test equipment and generated a Python script to generate license keys:

https://tomverbeure.github.io/2026/04/12/AMIQ-License-Key-Ge...


thallavajhula • today at 12:15 AM

I wasn't impressed by the LLMs up until January or so when Claude Code swooped in. Until then, I felt like the LLMs were slowing me down. I have been using them for a couple of years now for coding at work, but I never really thought they brought in real value. Then in February I worked on a 1-month-ish project timeline and shrunk it to 3 days and that was it. I didn't write a single line of code in that project and I went all in with Claude Code. That was it, _the moment_ of realization. I was thoroughly impressed. I went from nothing to a tool that served several teams. Now I'm starting to see the cracks in LLMs and I'm slowly getting back to picking which task to offload to AI and which ones to do by myself.

Claude is great at coding. That's it. Outside of it, it's just god awful at pretty much everything else. ChatGPT OTOH, is good at coding, but at everything else, I find it brilliant. Gemini never made me want to stick with it. It's good, but never great for my use cases.

0xbadcafebee • today at 1:58 AM

When ChatGPT allowed me to calculate stress and load bearing tolerances for a camper based on different materials, suggesting better designs, with the math and sources to back it all up. Then it helped plan and fill out paperwork for a residential solar project, including full code-compliant electrical work, again with sources to verify. Then there was an open source app that wouldn't run on an old version of macOS due to them not supporting older OSes, and a coding agent backported support for the old OS and got it up and running.

bag_boy • yesterday at 7:43 PM

I had ChatGPT write up a Zillow description for my house in the style of Carrie Bradshaw from “Sex and the City” to impress my wife.

It was unlike anything I had ever experienced.

My wife was unimpressed lol.

This was 2022.

bluejay2387 • yesterday at 7:53 PM

I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files and then write a fully functioning mod for one of the games I play based on that documentation that I couldn't get to work after 2 weeks of my own effort, all in under 4 hours (and that included a 25 minute long indexing process). This freaked me out enough that I then had it write a CLI based activity and TODO tracker and then integrate that tool into its coding process to track all of its activities in about another 2 hours. I am still emotionally recovering from this day. I have since replaced the semantic search system with an open source option (though I used it for a few months) but I still use the activity tracker for both coding projects and myself.


annoyingcyclist • today at 5:17 AM

If you're senior or have opinions about things, you know the feeling of falling into a rabbit hole of stuff you want to fix when you look at certain parts of your system. "I was going to rewrite this 3 months ago", "oh wait this part sucks too", "wtf is this class even for", etc.

Before coding agents, I'd have to weigh fixing these against my official work commitments, often getting shot down when I tried to get it prioritized or tsk tsked for delaying official projects to make code nicer. Now, to a much greater extent, I can just fix the things. The agents aren't perfect and the process isn't anything like hands off, but it's enough of a speedup that I can fit it in alongside my other work without having to get approval for it or try (and fail) to get it formally prioritized.

Not quite an oh shit moment, but having the end result of those rabbit holes be that the problems are fixed is pretty cool, and far preferable to what was often the case before ("we'll put in a ticket and prioritize it during the quality sprint!").

edit to add another:

I've personally never been a big fan of preplanning architecture at a code level. It makes a lot of sense at the system and data modeling levels, but code is both easy to get wrong if you're whiteboarding it before you write it and relatively easy (compared to system design and data modeling) to fix when that happens. If it's just me on a project, I'll happily start bashing it out with a vague idea in mind and evolve the design as I go, knowing that I'll probably throw away a bunch of what I write at first. I know I do good work that way, and I'm not wasting a bunch of up front time on a design I'm likely to throw out later. It's hard to work that way on a team, especially as a lead, for obvious reasons. Coding agents fit really well for that work style. They'll cheerfully write dueling prototypes of my code architecture ideas so I can see which one I hate and which one I like without talking about hypotheticals and abstractions on a whiteboard. They never get mad at me for changing my mind, wasting their time, or throwing away their work. That's pretty cool. I can have a quick, cheap answer to "what would this look like if I got rid of class X and split its responsibilities between Y and Z?", and I don't have to feel guilty for wasting my time or my teammates' time if the answer is "oh man that sucks, what a terrible idea."

KaiserPro • yesterday at 8:24 PM

I've had a few.

The biggest technical one was when we were making an all-day wearable AI assistant thing. It basically had really precise office location (think cm-level accurate), a shitty VLM to describe what the wide angle lens was looking at, speech to text, OCR, and a gaze recorder that described what you were looking at.

This was all streamed to sqlite. The thing that was really "oh shit" was the thing that made the whole system usable: a 4 paragraph prompt that turned natural language into SQL and reported back to the (non-technical) user what they wanted to know.
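A rough sketch of that pattern, assuming a made-up `events` table and a canned `stub_model_to_sql` function in place of the real prompt and model, neither of which appears in the comment. It shows the flow: question in, SQL out, query run against SQLite, rows back, with a crude guard so only a single SELECT is ever executed.

```python
import sqlite3


def stub_model_to_sql(question: str) -> str:
    """Stand-in for the prompt + model step; returns canned SQL."""
    return (
        "SELECT text FROM events WHERE kind = 'ocr' "
        "ORDER BY ts DESC LIMIT 1;"
    )


def answer(conn, question, to_sql=stub_model_to_sql):
    """Translate a question to SQL, check it, run it, return the rows."""
    sql = to_sql(question).strip().rstrip(";")
    # Crude guard: model-written SQL should never modify the log.
    if not sql.lower().startswith("select") or ";" in sql:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(sql).fetchall()


# Hypothetical schema: one row per observation from the wearable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, kind TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "speech", "let's meet at three"),
        (2, "ocr", "Room 4B"),
        (3, "ocr", "Exit"),
    ],
)

rows = answer(conn, "What was the last sign I read?")
```

In a real system the rows would go back to the model to be phrased as a plain-language answer; opening the database read-only would be a sturdier guard than the string check used here.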

The most recent one is being caught out by Genai video of a gymnast. I worked in VFX so I am normally able to spot dodgy shit, but this one was close to being real, scarily real.

jasondigitized • yesterday at 9:20 PM

First time using Claude Code I was rather impressed by how quickly I was able to build out a website with Vue and Supabase. Cool. So.......I always wanted to create an iOS app but knew nothing about Objective C or Swift or Xcode. "I wonder if Claude Code can build an iOS app for me?".

I went from 0-to-1 and shipped a podcast player into the App Store in 2 weeks. Not a simulated app in Xcode.....literally a fully approved app on the App Store. Claude Code walked me through installing Xcode all the way through to running a final audit on the app so I wouldn't get flagged during review. Mind blown.

richardfey • today at 4:01 AM

I could spot numerous bugs in code written recently and less recently, by me or colleagues. I was not angry but grateful and I knew there was no way back!

steren • yesterday at 7:47 PM

The moment when I ran llama on my old gaming PC (using something called ChatGPT4All) was my "oh shit" moment: I was now talking... to my PC.

brailsafe • yesterday at 9:58 PM

Not sure that I've had it yet, although hypothetically I'm sure it would probably be something similar to the examples of writing new software for old hardware mentioned ITT. The idea of resurrecting useful but unsupported gadgets that would otherwise become e-waste is something I've always found compelling.

Problem is, I just don't have enough old crap, and if I did, I would have a hard time justifying the expense, because that money could maybe just go toward a more intimate tinkering process.

For everything else, I either haven't had any sufficiently interesting ideas, or they ended up not being worth pursuing with those tools or at all.

When I do have success that I'm happy with and care about, it's a slow process that I ultimately need to know the details of anyway, but otherwise it's a bunch of luckily narrow work-related scenarios with well-documented constraints. Nothing's really been that shocking though.

The shocking thing to me is how unrewarding most of the successful tasks have been, partly because they often create unnecessary work and partly because the type of thinking required to massage or evaluate the result is much less stimulating, and there's much more of it in aggregate. It's fine if it's something like generating a UI from scratch because that hasn't produced dopamine in a long long time anyway.
