Hacker News

minimaxir · yesterday at 7:29 PM

I'm not sure why being a MoE model would allow OpenAI to "turn up the good stuff". You can't just increase the number of experts without having trained the model that way.


Replies

SkyPuncher · yesterday at 8:52 PM

My opinion is they're trying to internally route requests to cheaper experts when they think they can get away with it. This felt evident from the wild inconsistencies I'd experience using it for coding, both in quality and latency.

You "turn of the good stuff" by eliminating or reducing the likelihood of the cheap experts handling the request.

yberreby · yesterday at 8:49 PM

Based on what works elsewhere in deep learning, I see no reason why you couldn't train once with a randomized number of experts, then set that number during inference based on your desired compute-accuracy tradeoff. I would expect that this has been done in the literature already.
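
As a toy sketch of the idea (assuming a standard top-k MoE layer; the class, sizes, and parameter names here are invented for illustration, not any particular production model): sample the number of active experts k during training, then fix k at inference to pick a point on the compute-accuracy curve.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VariableTopKMoE(nn.Module):
        def __init__(self, d_model=64, d_ff=256, num_experts=8):
            super().__init__()
            self.gate = nn.Linear(d_model, num_experts)
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ])

        def forward(self, x, k=None):
            # Training: sample how many experts are active. Inference: caller fixes k.
            if k is None:
                k = random.randint(1, len(self.experts))
            logits = self.gate(x)                         # (batch, num_experts)
            topk_vals, topk_idx = logits.topk(k, dim=-1)  # keep the k best-scoring experts per input
            weights = F.softmax(topk_vals, dim=-1)        # renormalize over just the active experts
            out = torch.zeros_like(x)
            for slot in range(k):
                for e, expert in enumerate(self.experts):
                    mask = topk_idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
            return out

    moe = VariableTopKMoE()
    x = torch.randn(4, 64)
    cheap = moe(x, k=1)   # one expert per input: least compute
    full = moe(x, k=8)    # all experts: most compute, presumably best quality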