YetAnotherNick, today at 12:10 PM

It depends on whether you are using tensor parallelism or pipeline parallelism. In the second case, you don't need any sharing.
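
To make the distinction concrete, here is a rough toy sketch in plain Python, with lists standing in for tensors and two simulated "devices". The weights, the function names, and the reading of "sharing" as cross-device exchange inside a layer are all my assumptions for illustration, not anything from a real framework. Under tensor parallelism each device holds a slice of every layer, so partial results have to be combined (an all-reduce) inside the layer. Under pipeline parallelism each device holds whole layers, so the only traffic is a one-way handoff of activations at the stage boundary.

```python
# Toy two-layer linear model split across two simulated devices.
# All names and weights here are hypothetical, chosen for illustration.

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

W1 = [[1, 2], [3, 4], [5, 6], [7, 8]]   # layer 1: 2 inputs -> 4 hidden
W2 = [[1, 0, 1, 0], [0, 1, 0, 1]]       # layer 2: 4 hidden -> 2 outputs
x = [1, 1]

# Single-device result, used as the reference.
reference = matvec(W2, matvec(W1, x))

def pipeline_parallel(x):
    # Device 0 owns all of layer 1; device 1 owns all of layer 2.
    h = matvec(W1, x)          # runs on device 0
    # One point-to-point handoff of activations at the stage boundary.
    return matvec(W2, h)       # runs on device 1

def tensor_parallel(x):
    # Each device owns half the rows of W1 and the matching columns of W2.
    h0 = matvec(W1[:2], x)                      # device 0
    h1 = matvec(W1[2:], x)                      # device 1
    p0 = matvec([row[:2] for row in W2], h0)    # device 0 partial output
    p1 = matvec([row[2:] for row in W2], h1)    # device 1 partial output
    # All-reduce step: partial sums must be combined across devices.
    return [a + b for a, b in zip(p0, p1)]

print(pipeline_parallel(x), tensor_parallel(x), reference)
```

Both splits give the same answer as the single-device run. The difference is where the communication happens: once per stage boundary for the pipeline split, versus a collective combine inside the layer block for the tensor split.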