Hacker News

dkdcio last Wednesday at 1:37 PM

> Have we not had numerous articles on HN about data exfiltration in recent memory?

there’s also an article on the front page of HN right now claiming LLMs are black boxes and we don’t know how they work, which is plainly false. this point is hardly evidence of anything and equivalent to “people are saying”


Replies

FeepingCreature last Wednesday at 4:52 PM

This is true though. While we know what they do on a mechanistic level, we cannot reliably analyze why the model outputs any particular answer in functional terms without a heroic effort at the "arxiv paper" level.
