Hacker News

petetnt · last Saturday at 12:36 AM

It’s impressive how every iteration moves further from pretending actual AGI is anywhere close, when what we’re basically doing is writing library functions in the worst DSL known to man: markdown-with-English.
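For context, a hypothetical sketch of the kind of markdown-with-English "library function" being described; the name, frontmatter fields, and steps below are illustrative, not taken from any specific product:

```markdown
---
name: changelog-writer
description: Draft a CHANGELOG entry from a list of merged PRs
---

# Changelog Writer

When asked for a changelog entry:

1. Group the provided PRs by type (feature, fix, chore).
2. Summarize each PR title as one past-tense sentence.
3. Output a markdown section headed by the version and date.

Never invent PRs that were not provided.
```

English sentences stand in for control flow, and the model acts as the interpreter.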


Replies

derac · last Saturday at 1:48 AM

Call me naive, but my read is the opposite. It's impressive to me that we have systems which can interpret plain English instructions with a progressively higher degree of reliability. Also, that such a simple mechanism for extending memory (if you believe it's an apt analogy) is possible. That seems closer to AGI to me, though maybe it is a stopgap on the way to better generality/"intelligence" in the model.

I'm not sure English is a bad way to outline what the system should do. It has tradeoffs. I'm not sure library functions are a 1:1 analogy either; or if they are, you might grant me that it's possible to write a few English sentences that expand into a massive amount of code.

It's very difficult to measure progress on these models in a way that anyone can trust, more so when you involve "agent" code around the model.

johnfn · last Saturday at 1:26 AM

Literally yesterday we had a post about GPT-5.2, which jumped 30% on ARC-AGI 2, 100% on AIME without tools, and a bunch of other impressive stats. A layman's reading (mine) of those numbers suggests the models continue to improve as fast as they always have. Then today we have people saying every iteration is further from AGI. What really perplexes me is how split-brain HN is on this topic.

kenjackson · last Saturday at 1:24 AM

I think, more than anything, it’s become clear that AGI is an illusion. There’s nothing there. It’s the mirage in the desert: you keep walking towards it, but it’s always out of reach and unclear whether it even exists.

So companies are really trying to deliver value. This is the right pivot. If you gave me an AGI with a 100 IQ, that seems pretty much worthless in today’s world. But domain expertise - that I’ll take.

j45 · last Saturday at 1:55 AM

Whether AGI exists, as a binary 0 or 1, isn't the thing that primarily interests me.

Is the technology continuing to become more applicable?

Is the way it's becoming more applicable leading to frameworks of usage that could produce the next leap? :)

ETH_start · last Saturday at 3:13 AM

It's clear from the development trajectory that AGI is not what current AI development is leading to, and I think that is a natural consequence of AGI not fitting the constraints imposed by business necessity. AGI would need levels of agency and self-motivation that are inconsistent with basic AI safety principles.

Instead, we're getting a clear division of labor where the most sensitive agentic behavior is reserved for humans and the AIs become a form of cognitive augmentation of human agency. This was always the most likely outcome, and the best we can hope for, as it precludes dangerous types of AI from emerging.

ogogmad · last Saturday at 1:05 AM

Gemini seems to be firmly in the lead now, and OpenAI doesn't seem to have the SoTA. This should have some bearing on whether or not LLMs have peaked yet.

pavelstoev · last Saturday at 1:42 AM

Not wrong, but markdown-with-English may be the most used DSL, second only to natural language itself. Volume over quality.

sc077y · last Saturday at 10:58 AM

Who knew that English would be the most popular programming language of 2025?

DonHopkins · last Saturday at 5:00 AM

Markdown-with-English sounds like the ultimate domain-nonspecific language to me.

skybrian · last Saturday at 1:21 AM

This might actually be better in a certain way: if you change a real customer-facing API, customers will complain when you break their code, whereas an LLM will likely adapt. So the interface is more flexible.

But perhaps an LLM could write an adapter that gets cached until something changes?
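One way to read that suggestion, as a minimal sketch: cache the LLM-generated adapter keyed on the response's shape, and regenerate only when the shape changes. Here `llm_generate_adapter` is a hypothetical stand-in for any code-generating model call, not a real library function:

```python
import hashlib
import json

# Sketch of the cached-adapter idea: only pay the (slow, nondeterministic)
# LLM call to regenerate the adapter when the upstream API response
# actually changes shape.

_adapter_cache: dict[str, str] = {}

def schema_fingerprint(sample: dict) -> str:
    """Fingerprint a response by its key structure, not its values."""
    return hashlib.sha256(json.dumps(sorted(sample)).encode()).hexdigest()

def get_adapter(sample: dict, llm_generate_adapter) -> str:
    """Return cached adapter source; regenerate only on schema change."""
    fp = schema_fingerprint(sample)
    if fp not in _adapter_cache:
        # Hypothetical model call that returns adapter source code.
        _adapter_cache[fp] = llm_generate_adapter(sample)
    return _adapter_cache[fp]
```

The cache key deliberately ignores values, so ordinary data changes don't trigger regeneration; only a structural change to the API does.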

baq · last Saturday at 10:06 AM

And yet the tools wielding these are quite adept at writing and modifying them themselves. It’s LLMs building skills for LLMs. The public ones will naturally be vacuumed up by scrapers and put into training sets, making all future LLMs know more.

Takeoff is here, human-in-the-loop assisted for now… hopefully for much longer.

mrcwinn · last Saturday at 2:44 AM

I think you're missing the point.

cyanydeez · last Saturday at 1:04 AM

Yes. Prompt engineering is like a shittier version of writing a VBA app inside Excel or Access.

Bloat has a new name, and it's AI integration. You thought Chrome using a GB per tab was bad; wait until you need a whole datacenter to run your coding environment.
