Hacker News

orbital-decay yesterday at 6:58 PM

The baked-in assumptions observation is basically the opposite of the impression I get after watching Gemini 3's CoT. With maximum reasoning effort, it's able to break out of a wrong route by rethinking its strategy. For example, I gave it an onion address without the .onion part and told it to figure out what this string means. All reasoning models, including Gemini 2.5 and 3, assume it's a puzzle or a cipher (because they're trained on those) and start endlessly applying different algorithms to no avail. Gemini 3 Pro is the only model that can break the initial assumption after running out of ideas ("Wait, the user said it's just a string, what if it's NOT obfuscated") and correctly identify the string as an onion address. My guess is they trained it on simulations to enforce the anti-jailbreaking commands injected by Model Armor, as its CoT is incredibly paranoid at times. I could be wrong, of course.


Replies

jug yesterday at 10:05 PM

I've had some weird "thinking outside the box" behavior like this. I once asked 3 Pro what Ozzy Osbourne is up to. The CoT was a journey, I can tell you! It's not in its training data that he has actually passed away. It did know he was planning a tour, though. It had a real struggle trying to reconcile the "suspicious search results" and even questioned whether it was fake news, or whether it was running against a simulation, before determining it wasn't going to fall for my "test".

It did ultimately decide Ozzy was alive. I pushed back on that, and it instantly corrected itself and partially blamed my query, "what is he up to", for being phrased as if he were alive.
