llm-consortium prompts multiple models in parallel and iteratively refines a response, looping until a confidence_threshold is reached.
This was inspired by a Karpathy tweet [0], and the prototype was created using another tool of mine: the LLM Plugin Generator plugin (essentially a curated collection of plugins for simonw's llm CLI, used as a few-shot prompt).
The llm-model-gateway companion plugin lets you serve models from the llm CLI as an OpenAI-compatible API. This allows you to use saved consortiums in your various clients as if they were a regular model, bringing massively parallel reasoning to any workflow.
At some point it occurred to me that a collection of parallel LLMs was not really a consortium. A consortium is a group of organizations: a group of groups. To rectify this I added support for actual consortiums, where each member of an llm-consortium can itself be a consortium of models, e.g.:
llm consortium save cns-glm-n3 -m glm-5.1 -n 3 --arbiter mercury-2
llm consortium save cns-k2-n3 -m kimi-k2.6:3 --arbiter mercury-2
llm consortium save cns-meta-glm-k2 -m cns-k2-n3 -m cns-glm-n3 --arbiter cns-k2-n3
Yes, even the arbiter/judge can be comprised of a consortium of models, bringing parallel reasoning to the task of judging parallel reasoning chains.
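To make the loop and the nesting concrete, here is a rough Python sketch. It is not the plugin's actual code: run_consortium, as_member, fixed and majority_arbiter are hypothetical names, the models are offline stubs, and the arbiter is a plain majority vote standing in for an LLM judge.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Model = Callable[[str], str]  # prompt -> response text
Arbiter = Callable[[str, List[str]], Tuple[str, float]]  # -> (synthesis, confidence)


def run_consortium(prompt: str, models: List[Model], arbiter: Arbiter,
                   confidence_threshold: float = 0.8,
                   max_iterations: int = 3) -> Tuple[str, float, int]:
    """Fan out to all members, let the arbiter synthesise, repeat until confident."""
    current_prompt = prompt
    synthesis, confidence, iteration = "", 0.0, 0
    for iteration in range(1, max_iterations + 1):
        # Query every member in parallel with the current prompt.
        with ThreadPoolExecutor(max_workers=len(models)) as pool:
            responses = list(pool.map(lambda m: m(current_prompt), models))
        synthesis, confidence = arbiter(prompt, responses)
        if confidence >= confidence_threshold:
            break
        # Below threshold: feed the synthesis back for another round.
        current_prompt = (f"{prompt}\n\nPrevious synthesis:\n{synthesis}\n\n"
                          "Refine this answer.")
    return synthesis, confidence, iteration


def as_member(models: List[Model], arbiter: Arbiter, **kwargs) -> Model:
    """Wrap a whole consortium so it can act as one member of another."""
    return lambda prompt: run_consortium(prompt, models, arbiter, **kwargs)[0]


def fixed(answer: str) -> Model:
    """Stub model (assumption): always returns the same answer."""
    return lambda prompt: answer


def majority_arbiter(prompt: str, responses: List[str]) -> Tuple[str, float]:
    """Stub arbiter (assumption): majority vote, confidence = share of votes."""
    answer, votes = Counter(responses).most_common(1)[0]
    return answer, votes / len(responses)


# Two inner consortiums become the members of an outer one.
inner_a = as_member([fixed("42"), fixed("42"), fixed("41")], majority_arbiter)
inner_b = as_member([fixed("42"), fixed("42"), fixed("42")], majority_arbiter)
result = run_consortium("What is 6 * 7?", [inner_a, inner_b], majority_arbiter)
print(result)
```

Because a wrapped consortium has the same signature as a single model, the same wrapping idea would apply to the arbiter as well.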
Consortiums can also now contain groups of specialists: custom user-defined expert characters that each address the prompt from a different perspective. A Westworld-style attribute matrix can also be randomized to inject some more entropy into the process.
[0] https://xcancel.com/karpathy/status/1870692546969735361
Some other llm plugins I vibe coded:
classifai generates labels with approximate confidence derived from logprobs
llm-alias-options saves inference parameters such as reasoning effort with a model alias (good for setting the provider in OpenRouter or creating a consortium of high-temperature models)
llm-prompt-json adds a --json flag to return the llm logs object (good for getting the conversation_id or reasoning output in scripts)
llm-jina adds support for all of Jina AI's specialised models and tools, like web fetching, embedding and reranking.
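For anyone wondering how a confidence can be derived from logprobs, as in the classifai entry above, here is one common approach as a minimal sketch. It is not necessarily what classifai does: label_confidence is a hypothetical helper and the logprob values are made up for illustration.

```python
import math
from typing import Dict, Tuple


def label_confidence(label_logprobs: Dict[str, float]) -> Tuple[str, float]:
    """Pick the most likely label and renormalise its probability over the candidates."""
    # Convert each log probability back to a probability.
    probs = {label: math.exp(lp) for label, lp in label_logprobs.items()}
    total = sum(probs.values())
    best = max(probs, key=probs.get)
    # Dividing by the total ignores probability mass on tokens that are not labels.
    return best, probs[best] / total


# Illustrative (made-up) logprobs for the first token of each candidate label.
label, confidence = label_confidence({"positive": -0.105, "negative": -2.303})
print(label, round(confidence, 3))
```

The result is only approximate, since it looks at a single token per label and depends on how the provider reports logprobs.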
Great project! I often check the opinion of one model against others when doing research and a sort of consensus process would save many a c/p
I'm quite curious about this.
I think this is similar. Unfinished. https://github.com/mattjoyce/roundtable-consensus