GPT-5.6 Sol Is Here. Does the Scaling Law Story Still Hold Up?


OpenAI has officially dropped GPT-5.6.

This time, the lineup is split into three tiers: Sol (the flagship), Terra (a cost-efficient model that still delivers strong capability), and Luna (close to GPT-5.5 in performance but significantly cheaper to run). It is a clean segmentation, and a sign that OpenAI is thinking harder about margins and market positioning, not just raw performance.

As yesterday’s leaks suggested, GPT-5.6 is launching as a limited preview for a small group of pre-approved partners. Under the Trump administration’s directive, paying for access is no longer enough: you have to be on the approved list, and the government is reviewing customers one by one.

The benchmarks

On TerminalBench 2.1, GPT-5.6 Sol Ultra scores 91.9%, GPT-5.6 Sol hits 88.8%, and Claude Mythos 5 lands at 88.0%. The standard Sol edges out Mythos by only 0.8 percentage points; the Ultra variant, which uses deeper reasoning, widens the gap to 3.9 points.

GPT-5.6 Sol Ultra leads TerminalBench 2.1 at 91.9%, followed by GPT-5.6 Sol at 88.8% and Claude Mythos 5 at 88.0%.

But here is the thing: most people will not be able to use these models. Benchmark scores matter less when the user base is artificially restricted. OpenAI seems to know this, too: roughly 90% of its official release announcement is about safety and responsible deployment, not performance.

Altman is playing a smarter game than Dario

The safety messaging in this release is worth noting for what it is not.

When Anthropic launched Project Glasswing and restricted Mythos to selected partners, the entire narrative was built around danger. Mythos could find thousands of high-severity vulnerabilities across major operating systems and browsers. Anthropic committed a $100M usage credit pool and $4M in open-source security donations, and lined up partnerships with 40+ organizations, all to hammer home one message: this thing is scary.

OpenAI took a different approach. Instead of leading with the threat, it emphasized the guardrails: Sol and Terra ship with activation classifiers for real-time interception, and OpenAI spent 700,000 A100-equivalent GPU hours searching for universal jailbreaks. The message is not “this model is dangerous” but “we have done the work to make it safe.”

This is why Altman is a better operator than Dario Amodei, at least on the commercial side.

Dario’s problem is that he keeps pushing the danger narrative to the front. In the short term it works: brand differentiation, credibility with safety researchers, trust signals for large enterprise deals. But the Fable 5 incident has already shown the downside. If you spend years telling the world your models are weapon-grade, you cannot be surprised when regulators start treating them like weapons, and once that framing takes hold, you cannot walk it back.

Altman does the safety dance too. But he knows which lines, if crossed, close doors instead of opening them.

The enterprise reality check

This matters because the current market valuations of frontier AI companies cannot be supported by enterprise customers alone.

Yes, enterprise clients have money. But enterprise AI adoption is not like a consumer subscription, where you flip a switch and millions of users are running Mythos and Fable on self-serve.

The companies that will actually deploy these models at scale are the same handful of names: Nvidia, Amazon, Microsoft, Google. Even within these companies, access is not universal. Not every team at Google gets to use Claude. Not every division at Amazon can spin up Fable or GPT-5.6 on demand.

The Goldman Sachs case is the most telling. Anthropic spent six months embedding engineers on-site to integrate Claude agents into Goldman’s accounting, compliance, and operational finance workflows. That go-to-market effort was enormously expensive. And even after all that, usage is not open-ended: Goldman’s Hong Kong operations still cannot access Claude because of compliance restrictions.

For established workflows, the integration can justify itself. But when a department wants to experiment, say to run an ad-hoc research query or prototype a quick internal tool, it is a different story: budget requests, procurement approvals, sign-offs in the internal office-automation (OA) system, layer after layer. Teams need a clearly defined deliverable just to justify the API call. Even at a tech company, if you are on Amazon’s customer support, sales, or market analysis team, being an Amazon employee does not automatically mean you get access.

Large enterprises do not operate on impulse. You cannot just decide one morning to build an app or run a quick analysis and expect to have GPT-5.6 at your fingertips.

Yet under this policy framework, only large enterprises can get government clearance in the first place.

The Scaling Law story is fraying

Take Goldman Sachs again. Goldman is an American company, but its business is global. If only its U.S.-based operations can use the most advanced models, does that actually improve Goldman’s global productivity? That is an open question, and not an easy one to answer optimistically.

When internal access requires navigating complex regulatory requirements and layers of approval, the practical scope of model usage and the size of API quotas will inevitably shrink. And when usage shrinks, the entire Scaling Law narrative starts to lose coherence: the idea that performance keeps improving predictably with more compute and data, and that this improvement justifies ever-larger investments.

The Scaling Law story assumes a flywheel: better models attract more users, more users generate more revenue, more revenue funds more compute, more compute builds better models. Cut the user base, and the flywheel stalls.

This policy cannot last — but if it does, open-source wins

I do not think government-gated model access is sustainable. The commercial incentives against it are too strong, and the practical enforcement challenges are enormous.

But if it does persist, the biggest beneficiaries will, paradoxically, be Chinese open-source models such as DeepSeek and GLM. Investment appetite for AI is massive and not going away. If the most capable American models are locked behind approval gates, capital and developer attention will naturally flow toward whichever models are more open and accessible.

Yes, there are compute and technology gaps. But with enough capital and engineering talent flowing in that direction, those gaps can close faster than people expect.

What to watch in July

The key question is whether GPT-5.6 gets a broad public release in July. Altman has signaled that he hopes to open access more widely “in a few weeks.” Whether that actually happens will affect not just individual developers but the direction of AI investment as a whole.

If the approval regime holds, the Scaling Law story needs a serious rewrite. If access opens up, we are off to the races again. Either way, the signal from this release is unmistakable: frontier AI is no longer just a product category. It is becoming a regulated good, and that changes everything about how the market works.