MillionOClock • yesterday at 7:25 PM • 1 reply

I hope some company trains their models so that expert switches are less often necessary just for these use cases.

Replies

A model "where expert switches are less necessary" is hard to tell apart from a model that just has fewer total experts, so I'm not sure that would be a good approach. How often you actually have to switch also depends on how much spare RAM the system has to keep layers opportunistically cached from the previous token(s). There's no one-size-fits-all decision.
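
To make the RAM point concrete, here is a rough toy sketch, not a model of any real inference engine. It assumes one expert is picked per token, that cached experts are evicted LRU-style, and that routing is "sticky" with some probability; `miss_rate`, `sticky_routes`, `capacity` and `p_stay` are all made-up names for illustration.

```python
from collections import OrderedDict
import random


def miss_rate(routes, capacity):
    """Fraction of expert lookups that miss an LRU cache holding `capacity` experts."""
    cache = OrderedDict()
    misses = 0
    for expert in routes:
        if expert in cache:
            # Hit: mark this expert as most recently used.
            cache.move_to_end(expert)
        else:
            # Miss: the expert would have to be loaded from slower storage.
            misses += 1
            cache[expert] = True
            if len(cache) > capacity:
                # Evict the least recently used expert.
                cache.popitem(last=False)
    return misses / len(routes)


def sticky_routes(n_tokens, n_experts, p_stay, seed=0):
    """Hypothetical router: keep the current expert with probability p_stay,
    otherwise pick a uniformly random expert (possibly the same one)."""
    rng = random.Random(seed)
    current = rng.randrange(n_experts)
    routes = []
    for _ in range(n_tokens):
        if rng.random() >= p_stay:
            current = rng.randrange(n_experts)
        routes.append(current)
    return routes


if __name__ == "__main__":
    for p_stay in (0.0, 0.5, 0.9):
        routes = sticky_routes(n_tokens=2000, n_experts=16, p_stay=p_stay)
        for capacity in (1, 4, 16):
            rate = miss_rate(routes, capacity)
            print(f"p_stay={p_stay} capacity={capacity} miss_rate={rate:.3f}")
```

In this toy setup the same routing trace costs more or fewer loads depending only on `capacity`, which is the sense in which "how often to switch" can't be settled at training time alone.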
