Hacker News

How does misalignment scale with model intelligence and task complexity?

107 points by salkahfi | today at 12:28 AM | 31 comments | view on HN

Comments

jmtulloss today at 1:26 AM

The comments so far seem focused on taking a cheap shot, but as somebody working on using AI to help people with hard, long-term tasks, I find it a valuable piece of writing.

- It's short and to the point

- It's actionable in the short term (make sure the tasks per session aren't too difficult) and useful for researchers in the long term

- It's informative on how these models work, informed by some of the best in the business

- It gives us a specific vector to look at, clearly defined ("coherence", or, more fun, "hot mess")

show 1 reply
gopalv today at 12:57 AM

> Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.

Coherence requires two opposing forces to hold it in one dimension, and at least three in higher dimensions of quality.

My team wrote up a paper titled "If You Want Coherence, Orchestrate a Team of Rivals"[1] because we kept finding that upping the reasoning threshold resulted in less coherence - more experimentation before we hit a dead end and turned around.

So we got better results from using Haiku (with failover to Sonnet) than from Opus, and from using a higher-reasoning model to decompose tasks rather than perform each of them.

Once a plan is made, the cheaper models do better as they do not double-think their approaches - they fail or they succeed, and they are not as tenacious as the higher-cost models.

We can escalate to a higher authority and get out of that mess faster if we fail hard and early.

Knowledge of exactly how a failure happened seems to be less useful to the higher-reasoning models than to the action-biased ones.

Splitting up the tactical and strategic sides of the problem seems to work, similarly to how generals don't hold guns in a war.
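
A minimal sketch of that split (the call_model wrapper, the prompts, and the model ordering are assumed placeholders for illustration, not our actual stack):

    # Sketch of the planner/executor/escalation split described above.
    # call_model is a hypothetical wrapper around an LLM API; the model
    # names and escalation order are illustrative only.

    EXECUTORS = ["haiku", "sonnet", "opus"]  # cheap, action-biased models first

    def call_model(model: str, prompt: str) -> str:
        """Placeholder for a real LLM call."""
        raise NotImplementedError

    def plan(goal: str) -> list[str]:
        # The higher-reasoning model only decomposes; it never executes steps.
        steps = call_model("opus", "Decompose into independent steps:\n" + goal)
        return [s for s in steps.splitlines() if s.strip()]

    def execute(step: str) -> str:
        # Fail hard and early: each failure escalates to the next model.
        for model in EXECUTORS:
            try:
                return call_model(model, "Do exactly this step:\n" + step)
            except Exception:
                continue  # escalate to a higher authority
        raise RuntimeError("all executors failed on: " + step)

    def run(goal: str) -> list[str]:
        return [execute(step) for step in plan(goal)]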

[1] - https://arxiv.org/abs/2601.14351

show 1 reply
BenoitEssiambre today at 3:36 AM

This matches my intuition. Systematic misalignment seems like it could be prevented by somewhat simple rules like the Hippocratic oath or Asimov's Laws of Robotics, or rather by probabilistic Bayesian versions of these rules that take error bounds and risk into account.

The probabilistic version of "Do No Harm" is "Do not take excessive risk of harm".

This should work as AIs become smarter, because intelligence implies becoming a better Bayesian, which implies being good at calibrating confidence intervals for their interpretations and their reasoning, and essentially gaining a superhuman ability to evaluate the bounds of ambiguity and risk.
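
In code, the probabilistic rule could be as simple as gating actions on an upper confidence bound for harm rather than a point estimate (a toy sketch; the harm estimates, tolerance, and z are assumed inputs, not anything from the paper):

    # Sketch of "do not take excessive risk of harm": act only if the
    # upper confidence bound on estimated harm is below a tolerance.
    # harm_mean / harm_std are assumed to come from the model's own
    # calibrated uncertainty; the tolerance and z are policy choices.

    def permitted(harm_mean: float, harm_std: float,
                  tolerance: float = 0.01, z: float = 2.0) -> bool:
        upper_bound = harm_mean + z * harm_std  # ~97.7% one-sided bound if Gaussian
        return upper_bound < tolerance

    # A well-calibrated agent with high uncertainty must abstain:
    print(permitted(harm_mean=0.001, harm_std=0.02))   # False: too uncertain
    print(permitted(harm_mean=0.001, harm_std=0.002))  # True: low and confident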

Now this doesn't mean that AIs won't be misaligned, only that it should be possible to align them. Not every AI maker will necessarily bother to align them properly, especially in adversarial, military applications.

CuriouslyC today at 12:56 AM

This is a good line: "It found that smarter entities are subjectively judged to behave less coherently"

I think this is twofold:

1. Advanced intelligence requires the ability to traverse between domain valleys in the cognitive manifold. Be it via temperature or some fancy tunneling technique, it's going to be higher error (less coherent) in the valleys of the manifold than naive gradient following to the local minima.

2. It's hard to "punch up" when evaluating intelligence. When someone is a certain amount smarter than you, distinguishing their plausible bullshit from their deep insights is really, really hard.

show 3 replies
cadamsdotcom today at 3:29 AM

When humans dream, we are disconnected from the world around us. Without the grounding that comes from being connected to our bodies, anything can happen in a dream.

It is no surprise that models need grounding too, lest their outputs be no more useful than dreams.

It’s us engineers who give arms and legs to models, so they can navigate the world and succeed at their tasks.

smy20011 today at 1:16 AM

I think it's not because the AI is working on "misaligned" goals. The user never specifies the goal clearly enough for the AI system to work.

However, I think producing a detailed enough specification requires the same or even a larger amount of work than writing the code. We write rough specifications and clarify them during the process of coding. I think there is a minimum effort required to produce these specifications, and AI will not help you speed that up.

show 3 replies
leahtheelectron today at 3:18 AM

It's nice seeing this with Sohl-Dickstein as the last author after reading this blog post from him some time ago: https://sohl-dickstein.github.io/2023/03/09/coherence.html

tbrownaw today at 3:09 AM

Longer thinking sections have more space for noise to accumulate?
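
A toy way to see it: if each reasoning step adds a small independent error, the expected drift grows with chain length (an illustrative simulation, not anything from the paper):

    # Toy model: each thinking step adds small zero-mean noise; longer
    # chains drift further from the intended state on average (~sqrt(n)).
    import random, statistics

    def mean_drift(steps: int, step_noise: float = 0.01, trials: int = 2000) -> float:
        finals = []
        for _ in range(trials):
            state = 0.0
            for _ in range(steps):
                state += random.gauss(0.0, step_noise)
            finals.append(abs(state))
        return statistics.mean(finals)

    for n in (10, 100, 1000):
        print(n, round(mean_drift(n), 4))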

hogehoge51 today at 3:34 AM

My ignorant question: they did bias and variance noise; how about quantisation noise? I feel like agents are sometimes "flip-flopping" between metastable, divergent interpretations of the problem or solution.
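
For instance, a toy picture of that flip-flopping: when two interpretations score almost identically, quantised noisy scores can make the argmax alternate between them from step to step (illustrative only, not the paper's setup):

    # Toy illustration: two near-tied interpretations; quantised, noisy
    # scores make the chosen one alternate across repeated picks.
    import random

    scores = {"interpretation_A": 0.501, "interpretation_B": 0.499}

    def pick(quantum: float = 0.01, noise: float = 0.005) -> str:
        quantised = {k: round((v + random.gauss(0.0, noise)) / quantum) * quantum
                     for k, v in scores.items()}
        return max(quantised, key=quantised.get)

    print([pick() for _ in range(10)])  # flip-flops between A and B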

nayroclade today at 1:28 AM

The models they tested are already way behind the current state-of-the-art. Would be interesting to see if their results hold up when repeated with the latest frontier models.

IgorPartola today at 1:05 AM

For some reason the article reads to me like “AI is not evil, it just has accidents when it loses coherence.” Sounds a lot like liability shifting.

show 1 reply
cyanydeez today at 12:57 AM

Oh, the irony of thinking this refers to the investors and shell companies.

tsunamifury today at 1:02 AM

I don’t know why it seems so hard for these guys to understand: you score every step of a new strategy by how much it closes the distance to the goal, and if you have multiple generated forward options with no good weight, you spawn a new agent and explore multiple paths. Then you score all the terminal branches and prune.

LLMs aren’t constrained to linear logic like your average human.
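
A minimal sketch of that score-branch-prune loop (essentially beam search; expand and score are assumed placeholders, not any particular agent framework):

    # Sketch of score-every-step branching with terminal pruning.
    # expand proposes next steps from a state; score measures remaining
    # distance to the goal (lower is better). Both are placeholders.
    from typing import Callable

    def search(start: str,
               expand: Callable[[str], list[str]],
               score: Callable[[str], float],
               beam_width: int = 3,
               depth: int = 4) -> str:
        frontier = [start]
        for _ in range(depth):
            # Spawn multiple forward options from every surviving branch...
            candidates = [nxt for state in frontier for nxt in expand(state)]
            if not candidates:
                break
            # ...score all of them, then prune to the best few.
            candidates.sort(key=score)
            frontier = candidates[:beam_width]
        return min(frontier, key=score)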

throwpoaster today at 1:04 AM

Yudkowsky btfo.