Hacker News

numpad0, last Saturday at 8:40 PM

Even in tensor-parallel mode? I thought it could only work if you're fine with stalling all but n GPUs for n users at any given moment.
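The comment doesn't say what "it" is. The "all but n GPUs busy for n users" intuition matches a layer-split (pipeline-style) deployment, so the toy model below assumes that. It counts how many GPUs are busy per decode step in two cases. In the layer-split case, each request sits on exactly one GPU's layers per step. In the tensor-parallel case, every GPU holds a shard of every layer, so any active request keeps all of them busy. The function names and the wrap-around scheduling are hypothetical simplifications, and communication cost is ignored.

```python
def busy_gpus_layer_split(num_gpus: int, num_users: int, steps: int) -> list[int]:
    """Hypothetical toy model: layers are split across GPUs in order, and each
    user's request occupies exactly one stage (GPU) per step."""
    # Stagger users across stages (best case); extra users share a stage and get batched.
    stages = [u % num_gpus for u in range(num_users)]
    busy_per_step = []
    for _ in range(steps):
        # A GPU is busy if at least one request is currently on its stage.
        busy_per_step.append(len(set(stages)))
        # Each request advances one stage; after the last stage the next token restarts at 0.
        stages = [(s + 1) % num_gpus for s in stages]
    return busy_per_step


def busy_gpus_tensor_parallel(num_gpus: int, num_users: int, steps: int) -> list[int]:
    """Hypothetical toy model: every GPU holds a shard of every layer, so any
    active request keeps all GPUs busy on every step."""
    return [num_gpus if num_users > 0 else 0] * steps


print(busy_gpus_layer_split(8, 2, 3))      # 2 users on 8 GPUs: only 2 GPUs busy per step
print(busy_gpus_tensor_parallel(8, 2, 3))  # same load, all 8 GPUs busy per step
```

Under these assumptions, layer splitting keeps min(n, G) of G GPUs busy, which is where the "stalling all but n GPUs" picture comes from. Tensor parallelism avoids that idle time but pays for it with per-layer communication between GPUs, which this sketch does not model.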