Hacker News

didip, yesterday at 6:22 PM (0 replies)

I am super curious about tensor parallelism and about the mechanisms that let some models activate only some of their attention heads.