> We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia.
Whatever they did with Llama doesn't work. Nothing makes sense in their example where they ask the model to lie about 1+1: either the model is too old or the setup is broken, but the autoencoder output looks nothing like their Claude examples. Gemma is similarly bad.
Same here. I'm trying to trigger the Russian "mom is in the next room" example, but the model thinks the sentence comes from American Reddit.
It seems the examples they showed off with Haiku do work. I'd guess Llama is just too weak a model.