Hacker News

the8472 · last Saturday at 11:39 PM · 1 reply

Does the training process ensure that all the intermediate steps remain interpretable, even on larger models? That is, that we don't end up with alien gibberish in all but the final step?


Replies

oofbey · yesterday at 3:39 AM

Training doesn’t encourage the intermediate steps to be interpretable. They are still in the same token vocabulary space, though, so you could decode them — but the decoded text will probably be wrong.
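Since the intermediate steps live in the same token vocabulary space, "decoding" them just means taking the highest-scoring vocabulary entry at each position. A minimal sketch of that idea (the function name, toy vocabulary, and logits here are all made up for illustration):

```python
import numpy as np

def decode_step(logits, vocab):
    # Greedy-decode one intermediate step: pick the highest-scoring
    # token id at each position and look it up in the vocabulary.
    ids = np.argmax(logits, axis=-1)
    return [vocab[i] for i in ids]

# Hypothetical logits for a 2-position step over a toy 4-token vocabulary.
vocab = ["<pad>", "cat", "sat", "mat"]
logits = np.array([[0.1, 2.0, 0.3, 0.2],
                   [0.0, 0.1, 0.2, 1.5]])
print(decode_step(logits, vocab))  # -> ['cat', 'mat']
```

The point of the comment stands: nothing in training pushes these intermediate distributions toward sensible text, so the decoded tokens may read as gibberish even when the final step is correct.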
