Hacker News

Previewing GPT‑5.6 Sol: a next-generation model

648 points by minimaxir today at 5:06 PM | 392 comments

System card: https://deploymentsafety.openai.com/gpt-5-6-preview


Comments

dang today at 6:55 PM

All: for comments on the policy side please go to this related thread:

U.S. government will decide who gets to use GPT-5.6 - https://news.ycombinator.com/item?id=48690101

gandreani today at 6:10 PM

Easily the most interesting part of this announcement is buried in the second to last paragraph:

"We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed. Access will initially be limited to select customers as we expand capacity."

750 tokens/s on a frontier model is going to be extremely interesting. I doubt this new version is anything but a version bump in terms of capabilities but if we can start getting these answers back faster, they end up being more useful.

Just off the top of my head, I can think of the tedious task of finding certain functionality within a codebase. I usually can't beat an AI agent harness at this task today. If the AI model is 3x faster I have less of a chance.
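To put rough numbers on the speed argument above, here is a minimal sketch of how decode throughput maps to wall-clock time. Only the 750 tokens/s figure comes from the quoted announcement. The baseline rate (derived from the "3x faster" remark) and the token count for a code-search run are hypothetical, and the calculation ignores time-to-first-token and tool-call latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady decode rate.

    Ignores time-to-first-token, network overhead, and tool-call latency.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


FAST_TPS = 750.0             # "up to 750 tokens per second" from the quoted announcement
BASELINE_TPS = FAST_TPS / 3  # hypothetical baseline implied by the "3x faster" remark
SEARCH_TOKENS = 6_000        # hypothetical: tokens an agent emits during one code search

fast = generation_seconds(SEARCH_TOKENS, FAST_TPS)
slow = generation_seconds(SEARCH_TOKENS, BASELINE_TPS)
print(f"{SEARCH_TOKENS} tokens: {fast:.1f}s at {FAST_TPS:.0f} tok/s, "
      f"{slow:.1f}s at {BASELINE_TPS:.0f} tok/s")
```

Under these assumptions the same search finishes in a third of the time, which is the whole point being made: the answer is no better, it just arrives sooner.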

HyperL0gi today at 5:19 PM

Here is a trend I'm noticing:

- GPT-5 mini costs $0.25/$2 and will be discontinued in December.

- GPT-5.4 mini costs $0.75/$4.5 and is supposed to be the replacement.

- GPT-5.4 nano costs $0.2/$1.25 and, while it ranks better in benchmarks than GPT-5 mini, it's not even close when you test it in real scenarios.

So you're left being forced to go to GPT 5.4 mini if you use 5 mini today.

The same thing is happening here as their “Luna” model will cost $1/$6.

Can't we just stay with the models we actually want? I don't need GPT 5.4 mini. GPT-5 mini does the job.

Maybe it’s the realization that it was never that cheap in the first place and they're forcing us to upgrade in a slow and painful way.
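The price creep listed above can be made concrete with a small cost comparison. This is only a sketch: the prices are the ones quoted in this comment, assumed here to be USD per 1M input / 1M output tokens, and the monthly workload (100M input, 20M output tokens) is a hypothetical figure chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_usd: float   # assumed: USD per 1M input tokens
    output_usd: float  # assumed: USD per 1M output tokens


# Prices as quoted in the comment above.
PRICES = {
    "GPT-5 mini": Price(0.25, 2.00),
    "GPT-5.4 mini": Price(0.75, 4.50),
    "GPT-5.4 nano": Price(0.20, 1.25),
    "Luna": Price(1.00, 6.00),
}


def monthly_cost(price: Price, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    return price.input_usd * input_mtok + price.output_usd * output_mtok


# Hypothetical workload: 100M input tokens and 20M output tokens per month.
INPUT_MTOK, OUTPUT_MTOK = 100.0, 20.0

baseline = monthly_cost(PRICES["GPT-5 mini"], INPUT_MTOK, OUTPUT_MTOK)
for name, price in PRICES.items():
    cost = monthly_cost(price, INPUT_MTOK, OUTPUT_MTOK)
    print(f"{name:<13} ${cost:7.2f}  ({cost / baseline:.2f}x GPT-5 mini)")
```

The multiplier depends on the input/output mix: per token, GPT-5.4 mini is 3x on input but 2.25x on output, so input-heavy workloads see the larger increase.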

jdw64 today at 5:23 PM

I think GPT writes code the best. How well will it write in version 5.6? It gives me chills.

Recently, I went head-to-head with GPT on nearly 2,000 lines of code, and GPT's solution was superior and faster. I even referenced multiple codebases on GitHub while trying, but they were incomparable to GPT.

So using GPT brings both fear and excitement.

The fear comes from realizing that this level of code is now the average for most people. The excitement comes from knowing that I can now study and learn at this level too.

I'm really looking forward to seeing how much more advanced the code will be with the upgrade to 5.6.

jumploops today at 8:08 PM

If you used GPT-5.5 over the last 24 hours or so, you may have already had access to 5.6.

I've been running some tests on a harness we're building, and suddenly saw a jump of a few points yesterday. I reran the vanilla Codex benchmark and saw an ~88% score on Terminal Bench 2.1 from GPT-5.5 on vanilla Codex.

The biggest indicator, beyond the score, was that 3 tests which frequently hit "safety" blockers with 5.5 started succeeding last night without warning.

ComputerGuru today at 7:58 PM

“Terra has competitive performance to GPT‑5.5 [while being 2x cheaper]…”

To me that means “it’s an inferior product but marketing dictates we try and hide that.”

And “our most robust safety stack to date. We strengthened protections for higher-risk activity, sensitive cyber requests, and repeated misuse, and spent multiple weeks finding weaknesses, pressure-testing our system, and hardening it against real-world attacks” is of zero value to me at best, and most likely to my detriment (increasing refusals or nerfing utility). Why do providers keep leading with that? Are there customers (besides support ChatGPT chatbot users, maybe??) that ask for this?

mohsen1 today at 5:27 PM

> Additionally, we’re introducing a new `ultra` mode that goes beyond the capabilities of a single agent by leveraging subagents to accelerate complex work.

I'm curious how this works. Do the subagents also get to use the same tools? Will the client be flooded with tool calls? Why extra pricing for a new "model" when the same thing can happen in the client with more controls?

And if it's an army of subagents, why do they compare it to Fable and Mythos? Those models with a similar harness would probably bench better, I'm guessing.

sim04ful today at 6:45 PM

"We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed. Access will initially be limited to select customers as we expand capacity."

This seems like it would be the largest and first closed-source model Cerebras has offered to date.

OsrsNeedsf2P today at 6:00 PM

Like Mythos before it, I'm simply not excited about a model I can't use

anentropic today at 6:55 PM

Previewing <minor version bump>: a next-generation model

supermdguy today at 6:05 PM

> We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed.

This is really exciting. I work on voice AI, and we're still using 4.1/4.1 mini since none of the frontier models come close on latency. I'm excited to be able to have more interactive experiences; I think it'll unlock new ways of working with these models.

casey2 today at 9:40 PM

Sol, Terra, Luna? They are trolling (ragebaiting) with their naming now

scrlk today at 6:20 PM

> Sol, Terra and Luna

So the next naming scheme might be FTX, Madoff and Enron? :^)

josefrichter today at 9:21 PM

Sol, Terra, Luna – crypto disaster vibes

seaal today at 7:12 PM

Did GPT-5.6 Sol Ultra decide the terrible colors for the benchmark graphs?

m3h today at 6:04 PM

If the GPT-5.6 preview is not available outside US government approved "trusted partners", I don't see how the General Availability release can be trusted later.

Who knows what they will fix, block or change in the model between the preview and GA time. Open models can't arrive soon enough.

Topfi today at 7:40 PM

Is this a new pre-training run independent of 5.5's, or is it post-trained on 5.5, with Cerebras support and a rebrand of Pro mode at more usable speeds as Sol? The latter seems more likely to me, especially as 5.5 scales very well across its modes so separate branding could make sense, but I don’t see any clear information either way.

firasd today at 5:39 PM

Some interesting stats here about the current landscape https://arena.ai/leaderboard/agent

Agent Arena (Dynamic ranking of models on how well they orchestrate tools for real-world agentic tasks, based on signals like tool reliability, task completion, and steerability.)

Top 10, Highest rank to lowest

Claude Fable 5 (High), Claude Opus 4.8 (Thinking), GPT 5.5 (xHigh), Claude Opus 4.7 (Thinking), GPT 5.5 (High), Claude Opus 4.7, Claude Opus 4.6, GPT 5.5, GPT 5.4 (High), GLM 5.2 (Max)

Text Arena (Overall rankings of AI models on text-to-text tasks across math, coding, creative writing, and other open-ended domains.)

Top 10, Highest rank to lowest

claude-fable-5, claude-opus-4-6-thinking, claude-opus-4-7-thinking, claude-opus-4-6, claude-opus-4-7, muse-spark, gemini-3.1-pro-preview, gemini-3-pro, claude-opus-4-8-thinking, gpt-5.5-high

5555watch today at 9:12 PM

Will it also have hardcoded self-lobotomy if asked about cutting edge ML or LLM solutions? (Looking at Fable here)

ChrisLTD today at 5:13 PM

If it's a new generation why isn't it GPT-6?

mekpro today at 5:21 PM

We need more coding benchmark scores. Not sure that winning Terminal Bench 2.1 alone is a clear win over Fable/Mythos yet.

NetOpWibby today at 7:24 PM

How are they able to compare with Fable when Fable was only available for three days?

woeirua today at 5:37 PM

The choice of the name Sol is interesting for those Raised By Wolves fans out there… “Praise Sol!”

jimmydoe today at 5:49 PM

Is there a list of Gov-approved companies?

If this is the new norm, we as workers should all start looking for jobs in those companies.

ant-kinesthetic today at 7:21 PM

How much dynamic routing do we think is being done here, especially in light of the cheaper options being 2x cheaper than 5.5? I think learned routing is interesting because it could be the case that it only delivers token and cost efficiency on in-distribution tasks (like these benchmarks), yet in real-world scenarios the cost could trend towards the same cost as Sol.
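The concern can be sketched as a simple blended-cost calculation. Everything here is hypothetical: nothing in the announcement confirms that a router exists, the costs are in relative units (the full model at 1.0, the cheaper tier at 0.5 per the "2x" figure), and the escalation shares are made-up values for illustration.

```python
def blended_cost(share_to_full: float,
                 full_cost: float = 1.0,
                 cheap_cost: float = 0.5) -> float:
    """Expected relative cost per request under a hypothetical two-tier router.

    share_to_full: fraction of requests the router escalates to the full model.
    full_cost / cheap_cost: relative per-request costs (assumed, not published).
    """
    if not 0.0 <= share_to_full <= 1.0:
        raise ValueError("share_to_full must be between 0 and 1")
    return share_to_full * full_cost + (1.0 - share_to_full) * cheap_cost


# Hypothetical escalation rates: benchmark-like tasks rarely escalate,
# unfamiliar real-world tasks escalate most of the time.
in_distribution = blended_cost(share_to_full=0.1)
real_world = blended_cost(share_to_full=0.9)
print(f"in-distribution: {in_distribution:.2f}, real-world: {real_world:.2f}")
```

As the escalation share approaches 1, the blended cost approaches the full model's cost, which is the scenario the comment worries about.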

vatsachak today at 5:32 PM

All of these LLMs are getting better at being an LLM

But GPT-5.5 is as useful as an LLM can be; it has solved lemmas I've thought about for a year, it can implement typed STLCs in Rust when I give it a formal grammar, it can help me analyze Postgres planner dumps.

It's great at tasks that have short solutions but

- they cannot learn based on a project

- their long term planning capabilities are worse than worms

- they are unconfident in decision making

- their internal representations are disgusting compared to JEPA

- they don't have any "system clock" like humans and computers do

- LLM architecture is not modular like computer architecture or human brain architecture

There are so many issues with LLMs. I wish companies would start working on the next generation of architectures before the bubble pops.

sim04ful today at 6:52 PM

Sol and 5.5 pro are in parity at $5 input / $30 output. What I'm inferring from this is that:

- model weight size didn't change, and this is mostly a result of better model architecture and scaled up RL

- better hardware utilization and they're making better margins OR

- worse hardware utilization and they're okay with digging into their margins.

solfox today at 9:27 PM

Love the name!

loufe today at 5:13 PM

"Next generation model"

If it was the next generation, why isn't it a major version change..?

corygarms today at 5:18 PM

I'll buy that it's next generation if the svg bicycle pelican is carrying a baby

dainiusse today at 7:20 PM

I looked at the charts and it is clear that 88% from OpenAI is more than 88% from Anthropic.

bijowo1676 today at 5:15 PM

Waiting for @simonw to report on this, before I read and try it

leumon today at 5:13 PM

> We plan to make them more broadly available to people using ChatGPT, Codex, and the API soon.

I hope this means Fable will then also get released again.

mccoyb today at 5:15 PM

When will GPT-5.6 Protomolecule drop? Me and the boys on Eros can't wait to get our hands on it!

bluepeter today at 5:59 PM

I feel a bit like a Soviet hearing about Levi’s or the latest Springsteen release. C'mon!

rappatic today at 5:42 PM

Seems like OpenAI has succumbed to the urge to give their models catchy names like Anthropic does

swe_dima today at 6:11 PM

Pleasantly surprised that it costs the same as GPT 5.5, thank god for the competition.

smeeth today at 5:43 PM

The sooner the USG figures out a standard process for approving releases the better. There are many differing opinions on how much to regulate AI, but I think we can all agree ad-hoc policy sucks.

duggan today at 5:30 PM

> As part of our ongoing engagement with the U.S. government, we previewed our plans and the models’ capabilities ahead of today’s launch. At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly.

The clowns in the US administration can barely remain coherent from one sentence to the next.

Having them be the gatekeepers of technological progress in 2026 is fucking lame.

zkmon today at 6:34 PM

It appears that between GLM-5.2 and GPT-5.6, Anthropic is feeling the heat, at least in the bang-for-the-buck heuristic?

ponyous today at 6:50 PM

How can I become a trusted organization/partner? For my SaaS[0] where we generate 3D models using code it would be an absolute game changer to have such speedy generations. This would mean AI could do 10 iterations in the time it takes to do 1 now.

[0]: GrandpaCAD.com

low_tech_punk today at 5:16 PM

all the emphasis on cyber security. feels like a reaction to anthropic, not a real next generation.

nsingh2 today at 5:15 PM

I'm really getting sick of reading about safeguards and what I'm not allowed to do on every model release.

GodelNumbering today at 6:31 PM

I do not like the fact that this forces people to remember one more hierarchy of "Sol vs Terra vs Luna". OpenAI was supposed to simplify their naming since at least 2025.

mikkelam today at 5:50 PM

Would love to see benchmarks on Cognition's FrontierCode
