Hacker News

Let's be honest, Generative AI isn't going all that well

105 points by 7777777phil | yesterday at 6:37 PM | 118 comments

Comments

gejose yesterday at 11:23 PM

I believe Gary Marcus is quite well known for terrible AI predictions. He's not in any way an expert in the field. Some of his predictions from 2022 [1]:

> In 2029, AI will not be able to watch a movie and tell you accurately what is going on (what I called the comprehension challenge in The New Yorker, in 2014). Who are the characters? What are their conflicts and motivations? etc.

> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.

> In 2029, AI will not be able to work as a competent cook in an arbitrary kitchen (extending Steve Wozniak’s cup of coffee benchmark).

> In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]

> In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.

Many of these have already been achieved, and it's only early 2026.

[1] https://garymarcus.substack.com/p/dear-elon-musk-here-are-fi...

mattmaroon yesterday at 10:54 PM

Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.

I myself am saving a small fortune on design and photography and getting better results while doing it.

If this is "not all that well," I can’t wait until we get to mediocre!

tombert yesterday at 11:02 PM

I find it a bit odd that people are acting like this stuff is an abject failure because it's not perfect yet.

Generative AI, as we know it, has only existed for about 5-6 years; it has improved substantially and is likely to keep improving.

Yes, people have probably been deploying it in spots where it's not quite ready, but it's myopic to act like it's "not going all that well" when it's pretty clear that it actually is going pretty well; we just need to work out the kinks. New technology is always buggy for a while, and eventually it becomes boring.

dreadsword yesterday at 11:10 PM

This feels like a pretty low-effort post that plays heavily to superficial readers' cognitive biases.

I work commercializing AI in some very specific use cases where it is extremely valuable. Where people are being led astray is in layering generalizations: general use cases (copilots) deployed across general populations are generally not doing very well. But that's PMF stuff, not a failure of the underlying tech.

didibus today at 1:11 AM

Ignoring the actual poor quality of this write-up, I think, to be honest, we don't know how well GenAI is going. I feel we've not been able to properly measure or assess its actual impact yet.

Even as I use it, and I use it every day, I can't really assess its true impact. Am I more productive or less overall? I'm not too sure. Do I do higher-quality work or lower-quality work overall? I'm not too sure.

All I know is that it's pretty cool, and using it is super easy. I probably use it too much, to the point that it actually slows things down sometimes, for example when I use it for trivial things.

At least when it comes to productivity/quality I feel we don't really know yet.

But there are definitely cool use cases for it. I mean, I can edit photos/videos in ways I simply could not before, or generate a logo for a birthday party; I couldn't do that before. I can make a tune that I like, even if it's not the best song in the world, with the lyrics I want. I can have it extract whatever from a PDF. I can have it tell me what to watch out for in a gigantic lease agreement I would not have bothered reading otherwise.

I can have it fix my tests, or write my tests; not sure if it saves me time, but I hate doing that, so it definitely makes it more fun, and I can kind of just watch videos at the same time, which I couldn't before. Coding quality-of-life improvements are there too: I want to generate a sample JSON out of a JSONSchema, and so on. If I want, I can write a method using English prompts instead of the code itself. It might or might not truly be faster, not sure, but sometimes it's less mentally taxing; depending on my mood, it can be more fun or less fun, etc.
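The JSONSchema thing is exactly the kind of glue code I mean; a hand-rolled toy sketch of it (my own made-up `sample_from_schema`, not any real library) would look something like:

```python
# Toy sketch: walk a JSON Schema and emit a placeholder value per type.
def sample_from_schema(schema):
    t = schema.get("type")
    if t == "object":
        return {key: sample_from_schema(sub)
                for key, sub in schema.get("properties", {}).items()}
    if t == "array":
        return [sample_from_schema(schema.get("items", {}))]
    if t == "string":
        return schema.get("default", "example")
    if t in ("integer", "number"):
        return schema.get("default", 0)
    if t == "boolean":
        return False
    return None  # unhandled keywords (enum, $ref, ...) left out

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "count": {"type": "integer"},
    },
}
print(sample_from_schema(schema))
# {'name': 'example', 'tags': ['example'], 'count': 0}
```

Trivial, but exactly the sort of thing I'd rather describe in English than type out myself.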

All those are pretty awesome wins and a sign that for sure those things will remain and I will happily pay for them. So maybe it depends on what you expected.

1a527dd5 yesterday at 11:13 PM

A year ago I would have agreed wholeheartedly, and I was a self-confessed skeptic.

Then Gemini got good (around 2.5?), like I-turned-my-head good. I started to use it every week-ish, not to write code, but more like a tool (as you would a calculator).

More recently Opus 4.5 was released and now I'm using it every day to assist in code. It is regularly helping me take tasks that would have taken 6-12 hours down to 15-30 minutes with some minor prompting and hand holding.

I've not yet reached the point where I feel comfortable letting it loose to do the entire PR for me. But it's getting there.

daedrdev yesterday at 6:44 PM

This post is literally just 4 screenshots of articles, not even its own commentary or discussion.

saberience yesterday at 11:41 PM

Gary Marcus (probably): "Hey this LLM isn't smarter than Einstein yet, it's not going all that well"

The goalposts keep getting pushed further and further every month. How many math and coding Olympiads and other benchmarks will LLMs need to dominate before people actually admit that in some domains they're really quite good?

Sure, if you're a Nobel prize winner or a PhD then LLMs aren't as good as you yet, but for 99% of the people in the world, LLMs are better than you at math, science, coding, and every language except probably your native one, and they're probably better than you at that too...

m463 today at 12:22 AM

I see stuff like this and think of these two things:

1) https://en.wikipedia.org/wiki/Gartner_hype_cycle

or

2) "First they ignore you, then they laugh at you, then they fight you, then you win."

or maybe originally:

"First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you"

emp17344 yesterday at 6:48 PM

Guessing this isn’t going to be popular here, but he’s right. AI has some use cases, but isn’t the world-changing paradigm shift it’s marketed as. It’s becoming clear the tech is ultimately just a tool, not a precursor to AGI.

smashed yesterday at 11:01 PM

Should have used an LLM to proofread.. LLMs can still cannot be trusted?

billsunshine yesterday at 11:14 PM

A historic moron. Marcus will make Krugman's internet == fax machine call look like a good prediction.

thechao yesterday at 6:45 PM

You're absolutely right!

The irony of a five-sentence article making giant claims isn't lost on me. Don't get me wrong: I'm amenable to the idea; but, y'know, my kids wrote longer essays in 4th grade.

herunan yesterday at 11:24 PM

First of all, popping in a few screenshots of articles and papers is not proper analysis.

Second of all, GenAI is going well or not depending on how we frame it.

In terms of saving time, money, and effort when coding, writing, analysing, researching, etc., it’s extremely successful.

In terms of leading us to AGI… GenAI alone won’t reach that. Current ROI is plateauing, and we need to start investing more somewhere else.

sghiassy yesterday at 6:44 PM

LLMs help me read code 10x faster - I’ll take the win and say thanks

rpowers yesterday at 11:24 PM

I keep reading comments claiming GenAI's positive traits, but this usually amounts to some toy PoC that eerily mirrors work found in code bootcamps. You want an app that has logins and comments and upvotes? GenAI is going to look amazing wiring a non-relational DB to your Node backend.

mrbluecoat today at 12:01 AM

> LLMs can still cannot be trusted

But can they write grammatically correct statements?

mythrwy yesterday at 11:07 PM

It's going well for coding. I just knocked out, in a few hours, a mapping project that would otherwise have been a week+ of work (with docs and Stack Overflow open in the background).

And yes, I do understand the code and what is happening and did have to make a couple of adjustments manually.

I don't know that reducing coding work justifies the current valuations, but I wouldn't say it's "not going all that well".

afspear yesterday at 11:25 PM

Meanwhile I'm over here reducing my ADO ticket time estimates by 75%.

amw-zero yesterday at 11:22 PM

I’m starting to think this take is legitimately insane.

As said in the article, a conservative estimate is that Gen AI can currently do 2.5% of all jobs in the entire economy. A technology that is really only a couple of years old. This is supposed to be _disappointing_? That’s millions of jobs _today_, in a totally nascent form.

I mean I understand skepticism, I’m not exactly in love with AI myself, but the world has literally been transformed.

robertclaus yesterday at 11:02 PM

Odds this was AI generated?

anarticle today at 12:51 AM

Download models you can find now and forever. The guardrails will only get worse, or models banned entirely. Whether it's because of "hurts people's health" or some other moral panic, it will kill this tech off.

gpt-oss isn't bad, but even models you cannot run are worth getting since you may be able to run them in the future.

I'm hedging against models being so nerfed they are useless. (This is unlikely, but drives are cheap and data is expensive.)

Jadiiee yesterday at 11:23 PM

It's more about how you use it. It should be a source of inspo, not the end-all-be-all.

bawolff yesterday at 11:17 PM

Holy moving goalposts, Batman!

I hate generative AI, but it's inarguable that what we have now would have been considered pure magic 5 years ago.

meowface yesterday at 11:20 PM

How on Earth do people keep taking Gary Marcus seriously?

w4yai yesterday at 11:10 PM

[flagged]

segfaultex yesterday at 11:16 PM

I wholeheartedly agree. Shitty companies steal art and then put out shitty products that shitty people use to spam us with slop.

The same goes for code as well.

I’ve explored Claude Code, Antigravity, etc., and found them mostly useless; tried a more interactive approach with Copilot and local models, and less interactive “agents,” etc. It’s largely all slop.

My coworkers who claim they’re shipping at warp speed using generative AI are almost categorically our worst developers by a mile.