
tacitusarc · last Thursday at 12:36 AM

It’s actually more complicated than that now. You don’t get that kind of refusal purely from MoE. OpenAI models use a fine-tuned model on a token-based system, where every interaction is wrapped as a “tool call” with a source attached and a veracity level associated with that source. OpenAI tools have high veracity; users have low veracity. To mitigate prompt injection, the model expects a token early in the flow, and then throughout the prompt it expects that token to be associated with the tool calls.
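
As a rough toy sketch of what that wrapping might look like (all names and fields here are invented for illustration, not OpenAI’s actual implementation):

    # Speculative toy model: each message carries a source, a veracity level,
    # and the session token the model expects tool calls to echo back.
    from dataclasses import dataclass
    from enum import Enum

    class Veracity(Enum):
        LOW = 0    # e.g. raw user input
        HIGH = 1   # e.g. output from an OpenAI-controlled tool

    @dataclass
    class WrappedMessage:
        source: str          # "user", "image_tool", ...
        veracity: Veracity
        session_token: str   # token issued early in the flow
        content: str

    def accept(msg, expected_token):
        # Trusted only if it carries the expected session token AND a
        # high-veracity source; plain user text fails the second check.
        return msg.session_token == expected_token and msg.veracity is Veracity.HIGH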

In effect this means user input is easily disbelieved, and the model can accidentally talk itself into a state of uncorrectable wrongness. By invoking the image tool, you managed to get your information into the context as “high veracity”.
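
Continuing the toy sketch above, the same user-supplied text would be rejected as a plain user message but accepted once it comes back through a tool wrapper (again, purely illustrative):

    token = "sess-123"  # made-up session token

    as_user = WrappedMessage("user", Veracity.LOW, token, "the year is 2025")
    as_tool = WrappedMessage("image_tool", Veracity.HIGH, token, "the year is 2025")

    print(accept(as_user, token))  # False -> easily disbelieved
    print(accept(as_tool, token))  # True  -> lands in context as trusted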

Note: This info is the result of experimentation, not confirmed by anyone at OpenAI.


Replies

measurablefunc · last Thursday at 12:46 AM

Seems plausible, but the overall architecture is still the same: your request has to be "routed" by some NN, and if that routing gets stuck picking a node/"expert" (regardless of "tools" and "veracity" scoring) that keeps refusing the request incorrectly, then getting unstuck is highly non-trivial, because users are given no choice in what weights are assigned to the "experts". It's magic that OpenAI performs behind the scenes that no one has any visibility into.
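
For anyone unfamiliar, top-k MoE routing looks roughly like this (a generic sketch, not OpenAI's code); the gate weights are learned and completely opaque to the user:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 16, 8, 2

    gate_W = rng.normal(size=(d_model, n_experts))  # learned router weights, invisible to users
    x = rng.normal(size=d_model)                    # hidden state for one token

    logits = x @ gate_W
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                            # softmax over experts

    chosen = np.argsort(probs)[-top_k:][::-1]       # top-k experts by gate probability
    print("routed to experts:", chosen, "with weights:", probs[chosen])

If that gate keeps sending your request to an expert that refuses, there is no user-facing knob to change it.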
