Are these kinds of libraries a temporary phenomenon? It strikes me as weird that providers haven't settled on a single API by now. Of course they aren't interested in making it easier for customers to switch away from them, but if a proprietary API was a critical part of your business plan, you probably weren't going to make it anyway.
(I'm asking only about the compatibility layer; the other tracking features would be useful even if there were only one cloud LLM API.)
The providers themselves can't keep this straight even within their own ecosystem. Plus everyone is moving at a million miles an hour.
For example, `Claude Code` used to set two specific beta headers, with particular version numbers, in order for its Max subscription to be usable.
OAuth tokens for the Max plan also don't look like regular API keys. They're superficially similar, but carry a distinct prefix that these tools pre-validate.
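To make that concrete, here's a rough sketch of the kind of pre-validation and header juggling these tools end up doing. The prefixes and the beta flag value below are illustrative guesses, not authoritative Anthropic values:

```python
# Sketch of client-side credential pre-validation and header selection.
# Prefixes and the beta flag value are illustrative, not authoritative.

API_KEY_PREFIX = "sk-ant-api"  # regular Anthropic API keys
OAUTH_PREFIX = "sk-ant-oat"    # Max-plan OAuth access tokens (assumed)

def build_headers(credential: str) -> dict[str, str]:
    if credential.startswith(OAUTH_PREFIX):
        # OAuth tokens go in the Authorization header and (historically)
        # needed specific beta flags for Max-subscription access.
        return {
            "authorization": f"Bearer {credential}",
            "anthropic-beta": "oauth-2025-04-20",  # illustrative value
            "anthropic-version": "2023-06-01",
        }
    if credential.startswith(API_KEY_PREFIX):
        return {
            "x-api-key": credential,
            "anthropic-version": "2023-06-01",
        }
    raise ValueError("credential matches neither known prefix")
```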
It's barely working at this point, even within a single provider.
It’s a complete mess, and the hardest part of this kind of tool is maintenance.
It’s not just about incompatible APIs, but also about how messages are structured. Even getting reliable tool calling requires a significant amount of work and testing for each individual model.
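To make the structural differences concrete, here is the same single tool call and its result in the two big message formats. Note the arguments as a JSON string vs. a JSON object, and the result arriving as a "tool" role vs. a user-role content block:

```python
# The same tool call, round-tripped in two message formats.

# OpenAI Chat Completions: arguments arrive as a JSON *string*, and the
# result is a separate message with role "tool".
openai_messages = [
    {"role": "assistant", "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": "4°C, rain"},
]

# Anthropic Messages: the call is a content block carrying a JSON *object*
# as input, and the result comes back inside a *user* message.
anthropic_messages = [
    {"role": "assistant", "content": [{
        "type": "tool_use", "id": "toolu_1",
        "name": "get_weather", "input": {"city": "Oslo"},
    }]},
    {"role": "user", "content": [{
        "type": "tool_result", "tool_use_id": "toolu_1", "content": "4°C, rain",
    }]},
]
```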
Just look at LiteLLM's commit history and open issues/PRs. They're still struggling with reliable multi-turn tool calling for Gemini; Kimi requires hardcoded rules (so K2.6 is currently unsupported because it's not on the list); and so on.
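The "hardcoded rules" failure mode looks roughly like this (a hypothetical sketch, not LiteLLM's actual code): anything missing from the table is rejected, so each new model release is unsupported until someone updates the list.

```python
# Hypothetical per-model quirk table; names and rules are illustrative only.
KIMI_TOOL_RULES = {
    "kimi-k2":   {"force_json_repair": True},
    "kimi-k2.5": {"force_json_repair": True},
    # "kimi-k2.6" isn't listed yet, so it falls through to the error below.
}

def tool_rules(model: str) -> dict:
    try:
        return KIMI_TOOL_RULES[model]
    except KeyError:
        raise ValueError(f"{model}: tool calling unsupported (no rules entry)")
```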
Implementing the basic, generic OpenAI/Anthropic protocols is trivial, and at that point it almost feels like building an AI gateway is done. But it isn’t — that’s just the beginning of a long journey of constantly dealing with bugs, changes, and the quirks of each provider and model.
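The trivial part really is trivial. Here's a sketch of the basic OpenAI-to-Anthropic request translation, ignoring everything that makes it hard in practice (the 1024-token default is an assumption for the sketch):

```python
# Minimal OpenAI-Chat-Completions -> Anthropic-Messages request translation.
# This is the easy part; streaming, tool calls, images, errors, and
# per-model quirks are where the long journey starts.
def openai_to_anthropic(req: dict) -> dict:
    msgs = req["messages"]
    out = {
        "model": req["model"],
        "messages": [m for m in msgs if m["role"] != "system"],
        # max_tokens is required by Anthropic but optional for OpenAI.
        "max_tokens": req.get("max_tokens", 1024),
    }
    # Anthropic takes the system prompt as a top-level field, not a message.
    system = "\n".join(m["content"] for m in msgs if m["role"] == "system")
    if system:
        out["system"] = system
    return out
```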
I've been maintaining an abstraction layer over multiple providers for a couple of years now - https://llm.datasette.io/
The best effort we have at defining a standard is OpenAI Harmony/Responses - https://developers.openai.com/cookbook/articles/openai-harmo... - but it hasn't seen much pickup. The older OpenAI Chat Completions API is much more of an ad-hoc standard: almost every provider ends up serving a clone of it, albeit with frustrating differences because there's no formal spec to work against.
The key problem is that providers are still inventing new stuff, so committing to a standard doesn't work for them because it may not cover the next set of features.
2025 was particularly turbulent because everyone was adding reasoning mechanisms to their APIs in subtly different shapes. Tool calls and response schemas (which are confusingly not always the same thing) have also had a lot of variance - some providers allow for multiple tool calls in the same response, for example.
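As an illustration of those subtly different shapes, here's roughly what extracting reasoning text looks like across three providers. Field names are from memory and may be approximate:

```python
# Roughly how three providers expose reasoning in responses (approximate).
def extract_reasoning(provider: str, response: dict) -> list[str]:
    if provider == "anthropic":
        # Thinking arrives as content blocks of type "thinking".
        return [b["thinking"] for b in response["content"]
                if b["type"] == "thinking"]
    if provider == "openai":
        # Responses API: output items of type "reasoning" carry summaries.
        return [s["text"] for item in response["output"]
                if item["type"] == "reasoning"
                for s in item.get("summary", [])]
    if provider == "gemini":
        # Gemini marks individual parts with a boolean "thought" flag.
        return [p["text"]
                for p in response["candidates"][0]["content"]["parts"]
                if p.get("thought")]
    raise ValueError(f"unknown provider: {provider}")
```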
My hunch is we'll need abstraction layers for quite a while longer, because the shape of these APIs is still too frothy to support a standard that everyone can get behind without restricting their options for future products too much.