zurfer • yesterday at 8:32 PM • 1 reply

This is called mechanistic interpretability. There are lots of fascinating insights already, since you can intervene on basically anything down to the neuron or weight level and rerun the experiment thousands of times. The human brain is many orders of magnitude harder to make sense of.

Replies

sometimelurker • yesterday at 9:00 PM

Well, it's actually called ablation, and it's one way to do mech interp. Anthropic's got a bunch of work on mech interp here: https://transformer-circuits.pub/, like SAEs (sparse autoencoders) and NLAs.
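For anyone unfamiliar with the term, here is a minimal sketch of zero-ablation on a toy network: knock out one hidden neuron at a time and see how much the output moves. The weights, the network shape, and the names `forward` and `ablation_effects` are all made up for illustration; none of this is taken from the linked work, and real experiments would hook into an actual model's activations.

```python
# Hypothetical toy network: 2 inputs -> 3 hidden ReLU neurons -> 1 output.
# All weights are invented for this example.
W1 = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]  # one row of input weights per hidden neuron
W2 = [2.0, -1.0, 0.5]                        # output weight for each hidden neuron


def forward(x, ablate=None):
    """Run the toy network; if `ablate` is a hidden index, zero that neuron's activation."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if ablate is not None:
        hidden[ablate] = 0.0  # zero-ablation: pretend the neuron never fired
    return sum(w * h for w, h in zip(W2, hidden))


def ablation_effects(x):
    """Return the change in output from zero-ablating each hidden neuron in turn."""
    baseline = forward(x)
    return [forward(x, ablate=i) - baseline for i in range(len(W1))]


effects = ablation_effects([1.0, 0.5])
for i, delta in enumerate(effects):
    print(f"neuron {i}: output changes by {delta:+.2f} when ablated")
```

A large change suggests the neuron matters for this input; a change near zero suggests it does not. Zeroing is only one choice; replacing the activation with its mean over a dataset is another common variant.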
