ibrahimhossain • today at 5:06 PM

Alignment feels like an arms race that favors whoever spends the most on RLHF and red teaming. If even friendly models keep leaking dangerous capabilities, the real moat might be making systems that are fundamentally limited rather than trying to patch every possible failure mode. Interesting piece.
