This problem sounds like an excellent opportunity. We need a race to the bottom for hosting LLMs to democratize the tech and lower costs. I cheer on anyone who figures this out.
This is classic queuing theory, rate limiting, etc. I don't have an answer, but I would look there.