This article does not explain what happens when multiple prompts need different experts. Does it try to load as many experts as will fit into memory so that it can run as many prompts as possible at once? Scheduling gets very complicated, and there are trade-offs around fairness: which prompts get processed, and when.
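To make the fairness concern concrete, here is a rough sketch of one possible policy, not what the article's system actually does (it doesn't say). It assumes each prompt needs a fixed, known set of experts, which is a simplification, since real mixture-of-experts routing is usually decided per token. The names `pick_batch` and `capacity` are made up for illustration.

```python
def pick_batch(queue, capacity):
    """Greedily pick prompts whose experts fit in memory together.

    queue:    list of (prompt_id, set_of_expert_ids) in arrival order
              (hypothetical representation, assumed for this sketch)
    capacity: maximum number of experts that fit in memory at once
    Returns (list of chosen prompt ids, set of experts to load).
    """
    loaded = set()
    batch = []
    for prompt_id, experts in queue:
        needed = loaded | experts
        # Admit the prompt only if all experts still fit in memory.
        if len(needed) <= capacity:
            loaded = needed
            batch.append(prompt_id)
    return batch, loaded


queue = [
    ("a", {0, 1}),
    ("b", {2, 3}),
    ("c", {0, 1}),
    ("d", {0, 1}),
]
batch, loaded = pick_batch(queue, capacity=2)
print(batch, loaded)
```

With room for two experts, prompts "a", "c" and "d" run together while "b" is skipped, even though it arrived second. If prompts needing experts 0 and 1 keep arriving, "b" could wait indefinitely. That is the throughput-versus-fairness trade-off in miniature, and a real scheduler would need some aging or priority rule to bound the wait.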