> Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.
Coherence requires two opposing forces to hold it in one dimension, and at least three of them in higher dimensions of quality.
My team wrote a paper titled "If You Want Coherence, Orchestrate a Team of Rivals"[1] because we kept finding that raising the reasoning threshold produced less coherence: more experimentation before hitting a dead end and turning around.
So we got better results from using Haiku (failing over to Sonnet) rather than Opus, and from using the higher-reasoning model to decompose tasks rather than to perform each one of them.
Once a plan is made, the cheaper models do better because they don't second-guess their approach: they fail or they succeed, and they are not as tenacious as the higher-cost models.
If we fail hard and early, we can escalate to a higher authority and get out of that mess faster.
Detailed knowledge of exactly how a failure happened seems to be less useful to the higher-reasoning model than to the action-biased models.
Splitting up the tactical and strategic sides of the problem seems to work, much as generals don't carry rifles in a war.
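A minimal sketch of this planner/executor split, assuming a generic call_model() wrapper around whichever LLM API is in use; the model names, escalation order, and helpers are illustrative, not the team's actual implementation:

```python
# Strategic/tactical split: an expensive model plans, cheap models act,
# failures escalate quickly. All names below are placeholders.

PLANNER = "opus"                  # higher-reasoning model: decomposes, never executes
EXECUTORS = ["haiku", "sonnet"]   # cheap, action-biased models, in escalation order

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM call; replace with your provider's SDK."""
    raise NotImplementedError

def solve(task: str) -> list[str]:
    # Strategic side: decompose the task once with the expensive model.
    plan = call_model(PLANNER, f"Decompose into small, independent steps:\n{task}")
    steps = [s for s in plan.splitlines() if s.strip()]

    results = []
    for step in steps:
        # Tactical side: cheap models attempt the step, failing hard and early.
        for model in EXECUTORS:
            try:
                results.append(call_model(model, f"Do exactly this step:\n{step}"))
                break
            except Exception:
                continue  # escalate to the next (more capable) executor
        else:
            # Every executor failed: hand the step back to the planner to re-plan.
            results.append(call_model(PLANNER, f"Step failed, revise the plan for:\n{step}"))
    return results
```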
This matches my intuition. Systematic misalignment seems like it could be prevented by somewhat simple rules like the Hippocratic oath or Asimov's Laws of Robotics, or rather by probabilistic, Bayesian versions of these rules that take error bounds and risk into account.
The probabilistic version of "Do No Harm" is "Do not take excessive risk of harm".
This should keep working as AIs become smarter, because intelligence implies becoming a better Bayesian, which implies being good at calibrating confidence intervals over one's interpretations and reasoning, and basically gaining a superhuman ability to evaluate the bounds of ambiguity and risk.
Now this doesn't mean that AIs won't be misaligned, only that it should be possible to align them. Not every AI maker will necessarily bother to align them properly, especially in adversarial, military applications.
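A minimal sketch of such a probabilistic "do not take excessive risk of harm" gate, assuming the model can emit a calibrated harm probability with an upper confidence bound; the risk budget and the HarmEstimate structure are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class HarmEstimate:
    p_harm: float        # calibrated point estimate of P(harm)
    p_harm_upper: float  # upper bound of the confidence interval on P(harm)
    severity: float      # expected severity if harm occurs, in [0, 1]

RISK_BUDGET = 0.01  # maximum acceptable upper-bound expected harm (illustrative)

def permitted(estimate: HarmEstimate) -> bool:
    # Gate on the pessimistic (upper-bound) expected harm, not the point
    # estimate, so ambiguous or poorly calibrated cases default to refusal.
    expected_harm_upper = estimate.p_harm_upper * estimate.severity
    return expected_harm_upper <= RISK_BUDGET
```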
This is a good line: "It found that smarter entities are subjectively judged to behave less coherently"
I think this is twofold:
1. Advanced intelligence requires the ability to traverse between domain valleys in the cognitive manifold. Whether via temperature or some fancy tunneling technique, traversal through the valleys of the manifold is going to be higher-error (less coherent) than naive gradient-following to the local minima.
2. It's hard to "punch up" when evaluating intelligence. When someone is a certain amount smarter than you, distinguishing their plausible bullshit from their deep insights is really, really hard.
When humans dream, we are disconnected from the world around us. Without the grounding that comes from being connected to our bodies, anything can happen in a dream.
It is no surprise that models need grounding too, lest their outputs be no more useful than dreams.
It’s us engineers who give arms and legs to models, so they can navigate the world and succeed at their tasks.
I don't think it's the AI working on "misaligned" goals. The user never specifies the goal clearly enough for the AI system to work.
However, I think producing a detailed enough specification takes the same amount of work as writing the code, or even more. We write a rough specification and clarify it during the process of coding. There is a minimum amount of effort required to produce these specifications, and AI will not help you speed that up.
It's nice seeing this with Sohl-Dickstein as the last author after reading this blog post from him some time ago: https://sohl-dickstein.github.io/2023/03/09/coherence.html
Longer thinking sections have more space for noise to accumulate?
My ignorant question: they covered bias and variance noise, but how about quantisation noise? I feel like agents are sometimes "flip-flopping" between metastable, divergent interpretations of the problem or solution.
The models they tested are already way behind the current state-of-the-art. Would be interesting to see if their results hold up when repeated with the latest frontier models.
For some reason the article reads to me like “AI is not evil, it just has accidents when it loses coherence.” Sounds a lot like liability shifting.
Oh, the irony of thinking this refers to the investors and shell companies.
I don't know why it seems so hard for these guys to understand: you score every step of a new strategy by how much it closes the distance to the goal, and if you have multiple generated forward options with no clearly better weight, you spawn a new agent and explore multiple paths. Then you score all the terminal branches and prune.
LLMs aren’t constrained to linear logic like your average human.
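A rough sketch of that branch-and-prune loop, which amounts to a beam search over candidate next steps; propose() and score() are hypothetical stand-ins for an LLM-backed agent and a distance-to-goal scorer:

```python
import heapq

def propose(state: str, k: int = 3) -> list[str]:
    """Placeholder: ask an agent for k candidate next states."""
    raise NotImplementedError

def score(state: str, goal: str) -> float:
    """Placeholder: higher means closer to the goal."""
    raise NotImplementedError

def search(start: str, goal: str, depth: int = 4, beam: int = 3) -> str:
    frontier = [start]
    for _ in range(depth):
        # Spawn multiple forward options from every surviving branch...
        candidates = [nxt for state in frontier for nxt in propose(state)]
        # ...score them all against the goal, and prune to the best few.
        frontier = heapq.nlargest(beam, candidates, key=lambda s: score(s, goal))
    # Return the best terminal branch.
    return max(frontier, key=lambda s: score(s, goal))
```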
Yudkowsky btfo.
The comments so far seem focused on taking a cheap shot, but as somebody working on using AI to help people with hard, long-term tasks, I find it a valuable piece of writing.
- It's short and to the point
- It's actionable in the short term (make sure the tasks per session aren't too difficult) and useful for researchers in the long term
- It's informative on how these models work, informed by some of the best in the business
- It gives us a specific vector to look at, clearly defined ("coherence", or, more fun, "hot mess")