Well, DeepSeek batch sizes are something like 8192, so 128 isn't much.
From https://arxiv.org/html/2412.19437v1: "the batch size per expert is relatively small (usually within 256 tokens)"
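
For a rough sanity check on how those two numbers might relate, here's a back-of-the-envelope sketch. It assumes the 8192 figure counts tokens per step, that routing is perfectly balanced, and that the model uses DeepSeek-V3's configuration of 256 routed experts with 8 active per token. The function name is made up for illustration.

```python
def tokens_per_expert(batch_tokens: int, top_k: int, num_experts: int) -> float:
    """Average tokens each routed expert sees, assuming perfectly balanced routing."""
    # Each token is sent to top_k experts, so total expert slots = batch_tokens * top_k.
    return batch_tokens * top_k / num_experts


# Assumed values: 8192 tokens per step, 8 of 256 routed experts active per token.
print(tokens_per_expert(8192, 8, 256))  # 256.0
```

Under those assumptions the average comes out to 256 tokens per expert, the same order as the quoted figure. Real routing isn't perfectly balanced, so treat this as an order-of-magnitude check only.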