I am working on a system built around the OpenAI Responses API in WebSocket mode, since performance is something that interests me.
It's like a microservices architecture, with NATS JetStream coordinating the services. I want to keep the worker core as clean as possible: it just manages open sockets, threads, and continuations.
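A minimal sketch of what that worker core could look like. This is my illustration, not the actual implementation: the "sockets" are stubbed strings rather than real WebSocket handles, and the inbox is an asyncio.Queue standing in for a NATS JetStream consumer, since neither the OpenAI connection nor a NATS broker is reproducible in a snippet.

```python
import asyncio

class WorkerCore:
    """Minimal worker core: a registry of open 'sockets' and a dispatch
    loop. Real WebSocket handles would replace the stub connections, and
    the inbox would be fed by a NATS JetStream consumer, not a Queue."""

    def __init__(self):
        self.sockets = {}             # socket_id -> connection (stubbed)
        self.inbox = asyncio.Queue()  # stand-in for a JetStream subject

    def open_socket(self, socket_id):
        self.sockets[socket_id] = f"conn:{socket_id}"  # stub connection

    async def run(self):
        while True:
            msg = await self.inbox.get()
            if msg is None:           # shutdown sentinel
                return
            socket_id, prompt, reply = msg
            conn = self.sockets[socket_id]
            # Continuation: hand the result back to whoever is waiting.
            reply.set_result(f"{conn} handled: {prompt}")

async def demo():
    core = WorkerCore()
    core.open_socket("doc-1")
    runner = asyncio.create_task(core.run())
    reply = asyncio.get_running_loop().create_future()
    await core.inbox.put(("doc-1", "summarize", reply))
    result = await reply
    await core.inbox.put(None)        # shut the dispatch loop down
    await runner
    return result

print(asyncio.run(demo()))
```

The point of the shape is that the core knows nothing about documents or models; it only routes messages to open connections and completes continuations.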
Document querying is something I am interested in as well. The system lets me pin a document to a socket as a subagent, which can then be called upon.
I have hit a lot of slip-ups along the way, such as infinite loops when trying to call the OpenAI API, etc.
Example usage: pin 10 documents to warm sockets running GPT 5.4 nano. The main thread can then call out to those sockets to query the documents in parallel. This opens up a lot of possibilities: cheaper models for cheaper tasks, input caching, and lower latency.
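The fan-out step above can be sketched with asyncio. Here `query_pinned_socket` is a hypothetical placeholder for a query sent over a warm, document-pinned socket (the real OpenAI wire traffic is not reproducible here); only the parallel-gather pattern is the point.

```python
import asyncio

# Hypothetical stand-in for sending one question over a warm,
# document-pinned socket; a short sleep simulates network latency.
async def query_pinned_socket(doc_id: str, question: str) -> str:
    await asyncio.sleep(0.01)
    return f"{doc_id}: answer to {question!r}"

async def fan_out(doc_ids, question):
    # The main thread fans the same question out to every pinned
    # document socket and gathers the answers concurrently.
    tasks = [query_pinned_socket(d, question) for d in doc_ids]
    return await asyncio.gather(*tasks)

docs = [f"doc-{i}" for i in range(10)]
answers = asyncio.run(fan_out(docs, "what is the deadline?"))
print(len(answers))  # one answer per pinned document
```

Because the sockets are already warm, each query skips connection setup, which is where the lower-latency claim comes from.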
There is also a frontend.
A lot of information (thoughts, designs, etc.) lives here: https://github.com/SamSam12121212/ExplorerPRO/tree/main/docs