djeastm, today at 2:45 PM

I thought reinforcement learning from human feedback (RLHF) was meant to provide that quantification of "taste".