Hacker News

Ask HN: What was your "oh shit" moment with GenAI?

366 points by andrehacker last Thursday at 11:42 PM | 681 comments

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.

Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.

I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?


Comments

edfletcher_t137 today at 3:33 AM

Agentic development. From "chat bot" to bona fide, capable developer. "Oh, shit!"

flyinglizard today at 12:11 AM

When the very first ChatGPT transformed a simple C "hello world" into Python, I knew it was special. I've been a very big supporter ever since, including some worried moments of pondering what our future would look like and what the meaning of having a profession - especially software, which has defined my life since childhood - would be for my kids.

I'm now very good with LLMs as a user and at the system/product level but I understand it's not a simple story of replacing people. They're exponentially better than us at some things, and allow me to create things professionally which I couldn't do with an entire team of experts, but the bullshit compounds fast.

TuxPowered yesterday at 9:27 PM

While I was debugging some issues in a system, Claude refused to write a test case because it broke the terms of use.

Oh shit, all this fantastic technology is in the hands of corporations and they get to decide what we’re allowed to use it for.

slopinthebag yesterday at 9:20 PM

Probably the one day I logged onto HN only to see 90% of the articles on the front page were AI slop. If I could press a button and make genai disappear I would...

bigyabai last Thursday at 11:56 PM

BERT, then GPT-J/GPT-Neo and FLAN-T5

bjourne yesterday at 10:08 PM

I told the bot I liked Steely Dan, Eagles, Bob Seger, and Roxette and asked it for music recommendations. It replied with Toto. Exasperated, I wrote "Oh, shit, you stupid bot, you don't know ANYTHING about music!"

forgetfreeman today at 12:01 AM

For me the "oh shit" moment is when I realized that otherwise sane professionals, frequently in positions of authority, insist on taking these tools seriously. Zero thought put into any of the implications around unchecked anthropomorphism, security issues, employee knowledge retention, liability and other legal concerns, etc.

moralestapia yesterday at 9:07 PM

>Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

No, ChatGPT was the "oh shit" moment for me.

Anyone who had touched a computer before that knows how big of a leap that was.

deadbabe yesterday at 9:04 PM

I gave it an image of a complex maze and asked it to solve the maze. It returned the image with the shortest path drawn, one that not even I had found.

typerandom yesterday at 9:04 PM

-

bigstrat2003 yesterday at 9:01 PM

I haven't had one. It still sucks and doesn't provide value, due to the inherent inaccuracy that requires me to carefully check every little thing it does.

geuis yesterday at 8:51 PM

For me it wasn't "oh shit" per se, but "oh wow".

Some time in 2024 at a company get-together, we had an afternoon hackathon. There was a feature missing from our iOS app (the ability to mute autoplaying game trailers). This annoyed me a lot, because I frequently have music on when working and anytime I needed to open a test build it would kill my music. It had been an open ticket for a while but had low priority for the iOS team.

I had probably written a hundred lines of Swift in my career up to that point. Not expecting anything to come from it, I had Cursor examine the iOS codebase and told it I wanted to add a mute button under a certain area of the app settings.

Blew my mind when after only 10 minutes or so, the model had quickly found where to add the feature. Took a little back and forth, but then it added a fully functioning mute option in settings that mostly worked across the app. A little more back and forth, and those issues were settled. Maybe an hour overall of time spent that afternoon.

I pinged one of the iOS engineers about it later and he said to push it up for review. There were a few things that needed to be updated to get it in line with the rest of the codebase, but nothing substantial. Feature got merged a week or two later.

Now I'm way more productive than I have been in years. I've been getting a lot of enjoyment out of being able to prototype rapidly and experiment on features rather than getting bogged down in the process of scaffold work. Able to knock out issues much quicker.

That's all been positive, but it hasn't taken away my actual core responsibility. The LLMs can give you great advice and write code quickly. But they still don't always do well at broad thinking.

Current case in point: I've been working on an iOS app that uses vision models to do work on photos and videos that the user has taken. I've built text-based semantic search systems before, and there's a lot of crossover with vision models, but it's been an interesting journey so far learning about the different types of vision models and what they're good at. Lots of testing so far and educating myself on the topic to get the user-level features I want. Claude Code has been invaluable in this, as it's great at writing the Swift code while I'm able to focus on the results of what is being done.

Where Claude is still not good is being able to reason at a higher level about different strategies on using vision model outputs to achieve the stated goals. It's not an issue of me not clearly defining the specifics of a feature and then letting Claude run off burning tokens to figure it out. For example, just late last night I was deep diving into some core segmentation code and having Claude explain what everything was doing line by line so that I could get a better understanding of the mechanics of the vision model.

A side effect was that I realized the vision model was outputting tons of nearly identical segments that were overlapping. This was something Claude had completely missed, and because I didn't know that's something this particular vision model did, I had no way to know to catch it beforehand.

Bottom line is that understanding the mechanics of your application is still very much a requirement for the engineer. In this case, once I learned what was happening it completely changed my approach on how to achieve my feature goal. The code runs hundreds of times faster now and the segmentation is much, much better.

The new wave of coding models is disruptive, but it's letting me be a much better engineer and get things done faster and with more assurance that the code being written is solid. I still have to spend the same amount of time thinking and learning about a problem, and probably more time verifying what's being output, but a lot of the drudgery is also being taken away.
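
For readers wondering what collapsing "nearly identical segments that were overlapping" can look like in practice, here is a minimal sketch of one common approach: greedy suppression by intersection-over-union (IoU). This is not the commenter's actual fix, which the comment does not describe, and it is written in Python rather than Swift for brevity. The segment representation (a confidence score plus a set of pixel coordinates), the names `iou`, `dedupe_segments`, and `rect`, and the 0.8 threshold are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]                    # (row, col)
Segment = Tuple[float, FrozenSet[Pixel]]   # (confidence score, pixels covered); hypothetical layout


def iou(a: FrozenSet[Pixel], b: FrozenSet[Pixel]) -> float:
    """Intersection over union of two pixel sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dedupe_segments(segments: List[Segment], threshold: float = 0.8) -> List[Segment]:
    """Keep the highest-scoring segment out of each group of near-duplicates.

    Segments are visited from highest to lowest score. A segment is kept only
    if its IoU with every already-kept segment is below `threshold`.
    """
    kept: List[Segment] = []
    for score, pixels in sorted(segments, key=lambda s: s[0], reverse=True):
        if all(iou(pixels, other) < threshold for _, other in kept):
            kept.append((score, pixels))
    return kept


def rect(top: int, left: int, height: int, width: int) -> FrozenSet[Pixel]:
    """Helper for the demo: a filled rectangle as a pixel set."""
    return frozenset(
        (r, c)
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


segments = [
    (0.95, rect(0, 0, 10, 10)),   # 100 pixels
    (0.90, rect(0, 0, 10, 9)),    # near-duplicate of the first (IoU 0.9)
    (0.70, rect(20, 20, 5, 5)),   # disjoint from everything else
    (0.60, rect(0, 0, 5, 5)),     # inside the first, but much smaller (IoU 0.25)
]

kept = dedupe_segments(segments)
print([score for score, _ in kept])
```

With these inputs the near-duplicate is dropped while the small nested segment survives, because it overlaps the large one by only a quarter of their union. Whether a nested segment should count as a duplicate depends on the feature being built, which is the kind of judgment the comment argues still sits with the engineer.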

kgwxd yesterday at 7:54 PM

When it started being forced on me in tools I was already using begrudgingly.

badgersnake today at 9:33 AM

I don’t know about “Oh shit”. I’ve had many “It’s shit” moments.

DavidSJ yesterday at 9:45 PM

My oh shit moment was probably deep Q learning in 2013 (I guess that's not gen AI), but GPT-3 was pretty remarkable too.

CTDOCodebases today at 12:37 AM

When it translated a paragraph of one language into another flawlessly.

ulfw today at 8:58 AM

My moment was when absolutely everything I put into Gemini, ChatGPT et al. came back with a super convincing-sounding lie followed by 'Oh you are absolutely right for calling me out on this'.

It's a fucking joke and most people are blinded by it sounding very sophisticated and convincing

devmor today at 12:59 AM

I still haven’t had it.

I’ve been working with ML for most of my career, and “gen ai” since the days of matrix crunching for NLP to a 10-element response array on my 1080Ti.

The current generation of AI is, frankly, only marginally more impressive to me than that era. The only thing I’m saying “oh shit” to is the deranged amount of capital debt being leveraged to make it usable.

Watching companies spend billions of tokens per minute on dev teams that barely know how to write a prompt beyond some tips and tricks, all to gain a fluctuating, slightly negative to slightly positive productivity change that no one can quantify, is making me feel like one of the only sane people left in the world.

Quantization is the only interesting change I’ve seen in years.

nickhodge yesterday at 11:00 PM

Asked AI to generate some code.

It looked absolutely unmaintainable and horrible.

"oh shit" there are serious developers using this crap? As an industry, we are so fsck'd

PunchyHamster yesterday at 10:42 PM

The biggest "oh shit" one was that people are willing to believe an LLM over humans, even humans who work in the domain being asked about.

The gullibility is terrifying

kingkawn today at 8:35 AM

AI Dungeon, a GPT-2 product on iOS. It had almost no context and no memory, but it could generate an endless slop story. It was the first time I’d seen something like that, and the wild implications felt clear. I wasn’t aware at the time of how immense the computational needs of the tech would become as it grew, or of the social implications, but I just couldn’t believe that something like the MUDs I’d played in the late 80s and early 90s could now be autogenerated. It had no guardrails like today's to prevent it from adopting a personality and so on, so it was in some ways more interesting than what the general public has now.

damnitbuilds yesterday at 12:07 AM

My "Oh shit" moment was when my boss got the bill for me trying to vibe code a bugfix.

fsniper today at 7:25 AM

There are lots of small "oh shit" moments for me. My first interaction with an LLM was already magical.

"This shit can emulate understanding language, find a solution, and put the answer into words."

Then came the realisation that it's not limited to a single human language: you can ask in one language and it can answer in another. It's also capable of understanding and generating code. Not only that, it's better than most humans at that. It can hear, it can see, it can paint, it can do music, it can sing. It can combine: give it a picture, ask for music from that picture. Give it a video, get software. It can mix and match.

After that came improvements. No, revolutions. It started as a 4-year-old with encyclopedic knowledge. It knew but could not convey, could not make sense sometimes. Was incorrect most of the time. Blubber. In a few years it matured to impeccable levels. It can now relate information with a lot of clarity, and it's less and less wrong. Nearly no hallucinations. It can do maths! Correct maths! Maths that I could not do even if my life depended on it. It's getting to a stage where it can prove things where humans failed.

I am getting "oh shit moments" day by day.

boredhedgehog yesterday at 7:47 PM

"Translate this poem. Maintain meter and rhyme."

cdelsolar today at 2:00 AM

I thought coding agents were probably BS and then I asked Cline to build me a test app to do something (I forgot what, something not that simple) and it built an entire working app. This was before Claude Code which was another step function improvement.

overgard yesterday at 7:47 PM

I feel like with the hype cycle and constant publishing of sketchy claims that I pretty much daily have an "oh shit" moment followed by a "nope, everything is about the same" moment. It's frankly exhausting. It's hard for me to recall a subject that has irritated me as much over a period of years, and it's barely even about AI itself but instead just feeling harassed with the constant anxiety and rage baiting.

fragmede yesterday at 11:17 PM

My original "oh shit" moment is lost, but recently I was looking to support some hardware on Mac that originally only had Linux support. So codex-5.5 downloaded the Linux OS firmware that supported the device (it's a fixed-feature device that runs a full Linux OS, which also includes drivers for said device), which was buried inside that firmware. Codex then ran binwalk to extract the OS from the firmware, found the shell scripts that actuated the device, used those to "reason" about how the device worked, and used that to start writing a Mac driver for it. It needed very few prompts to get that far. I did still have to guide it with advanced directives after that in order to get to a working Mac driver, so I'm not totally replaceable just yet, but to go from the product name to finding the Linux OS firmware, to finding the actual firmware inside that OS download via binwalk, to then getting to a place where the Mac driver started to take shape, took very little advanced knowledge of how computers work.

yieldcrv yesterday at 11:02 PM

My oh shit moment lately has been realizing Gen AI is a distraction. language models are manipulating non-Gen AI media, agentically

moving images around layers in photoshop, changing languages, exporting 1000s of variations for teams. Same with video compositing and editing

the human work that creatives thought they were insulated from as long as there was some backlash towards generative AI, and yet

Gen AI 2022 - 2025

jachee yesterday at 9:29 PM

I haven’t had that yet.

I tried again this week, and CoPilot Plan Mode read the same 5-line markdown file 18 times over the course of 5 minutes of churning on a simple request, then provided zero value over what I posed in the request itself, and hallucinated things about my terraform repo that were just flat-out wrong.

As an Infrastructure/Cloud engineer, I’m far from worried about AI coming for my job.

bluefirebrand yesterday at 9:24 PM

My "oh shit" moments come every time I see people glazing AI

"Oh shit. My skills I spent my life building are going to go to zero value. I'm going to have to dramatically change careers in my forties or I'm just going to wind up being a schmuck prompting these stupid fucking machines for the rest of my life"

Oh shit indeed

noncoml today at 4:30 AM

I am using codex and claude on a Linux host, connecting from a Windows machine using ssh.

No matter what I tried I couldn't get "Shift+Enter" to work. I said fuck it, cloned kitty and alacritty and asked Claude to implement a terminal emulator for Windows that would render everything using DX12 and support modifyOtherKeys plus DA responses, and within a few days it was ready!

al_borland yesterday at 10:54 PM

I won’t deny they are useful tools, but the hyperbole from the tech CEOs about them replacing all white collar workers in 12-18 months set the expectation so high that I’m still in the “fancy auto-complete” camp. It still feels nowhere close to replacing anyone, at least where I work. While useful, they haven’t been anywhere close to as useful as promised. Hallucinations and poor guidance are still a regular day-to-day issue that makes it impossible for me to trust agents with anything.

Had they been more realistic with the promises and not framed it as replacing all of us within 2 years, I would have been more excited about the tech. Now that their claims are proving to be false and they’re trying to walk it back, it’s too late. The time for excitement has passed and it’s just something that exists.

The data center battles have also thrown a wet blanket on the tech, as they file lawsuits against towns near me to force construction to begin, despite the towns voting against it. The town can’t afford the fight, so the will of the people and the town gets bulldozed. It’s pretty gross to watch.

rcpt yesterday at 9:50 PM

"We're traveling to Tokyo on our way home from China. We'd like to plan a trip accessible by train that hits some beaches, some hot springs, and allows me to get the 4th dose of a rabies vaccine sequence (the first three shots were rabvac)"

varispeed yesterday at 9:12 PM

My oh shit moment was Opus 4.6 before it got nerfed.

It helped me refactor my old app. Something I always wanted to do, but didn't have time/mental capacity to do in a short space of time.

I wrote a short prompt explaining how I wanted it to look and which files it should go through. It asked me for a few clarifications and then basically one-shotted it.

Everything compiled and worked. Now my internal app is much much easier to extend and test.

I tried a few more things like that and spent like £5k on tokens in those two weeks.

Then it got nerfed and never worked like that again.

Now I don't use AI, because it is shite again. Even Opus 4.8.

saadn92 yesterday at 7:41 PM

I use claude code on a daily basis, but honestly it becomes more annoying the more I use it. Why? I think because I ask it to do something and unless I'm extremely specific, either the code is verbose or the feature I'm designing is done in a poor way. For me, the productivity gains aren't that great and I'm even considering whether to go back to doing things by hand to save myself the frustration. Sure, if you don't care about code quality or scalability, it's a great thing to generate code. And yes, there are times when I don't, but for real projects, I actually do because I know as an engineer those things do matter in the long run. So, to be honest, I still haven't had that moment.
