Hacker News

Continuous batching from first principles (2025)

16 points by jxmorris12 | yesterday at 10:47 PM | 2 comments

Comments

charcircuit | yesterday at 11:21 PM

This article does not explain what happens if multiple prompts need different experts. Does it try to schedule the maximum number of experts into memory so it can run the maximum number of prompts at once? Scheduling gets very complicated, and there are different trade-offs around fairness in deciding which prompts get processed at which times.

asteroidburger | yesterday at 11:55 PM

How long until “first principles” is a meme like “considered harmful”? Or are we there already?