Hacker News

ibrahimhossain, today at 5:06 PM

Alignment feels like an arms race that favors whoever spends the most on RLHF and red-teaming. If even friendly models keep leaking dangerous capabilities, the real moat might be building systems that are fundamentally limited rather than trying to patch every possible failure mode. Interesting piece.