Hacker News

NitpickLawyer, yesterday at 6:59 PM

> We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia.

Whatever they did on Llama didn't work: nothing makes sense in their example where they ask the model to lie about 1+1. Either the model is too old or whatever they used isn't working, but what the autoencoder outputs is nothing like their examples with Claude. Gemma is similarly bad.


Replies

fredericoluz, yesterday at 7:29 PM

It seems that the examples they showed off with Haiku work. I'd guess Llama is just too weak.

fredericoluz, yesterday at 7:25 PM

Same. I'm trying to trigger the 'mom is in the next room' Russian thing, but the model thinks the sentence is from American Reddit.
