I was initially enthusiastic about DS3, because of the price, but eventually I learned the following things:
- function calling is broken (it responds with an excessive number of duplicated function calls, with hallucinated names and parameters)
- response quality is poor (my use case is code generation)
- support is not responding
I will give a try to the reasoning model, but my expectations are low.
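In the meantime, a cheap client-side mitigation for duplicated and hallucinated calls is to filter the model's tool calls before executing them. A minimal sketch (the `{"name": ..., "arguments": ...}` shape is illustrative, not any specific API's schema):

```python
# Sketch: client-side guard against duplicated / hallucinated tool calls.
# Assumes each tool call is a dict like {"name": ..., "arguments": {...}};
# field names are illustrative, not tied to a particular API.
import json

MAX_CALLS_PER_TURN = 5  # hard cap against looped calls

def dedupe_tool_calls(calls, known_tools):
    seen = set()
    kept = []
    for call in calls:
        # Drop hallucinated tool names outright.
        if call.get("name") not in known_tools:
            continue
        # Key on (name, canonicalized arguments) to drop exact duplicates.
        key = (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))
        if key in seen:
            continue
        seen.add(key)
        kept.append(call)
    return kept[:MAX_CALLS_PER_TURN]

calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "get_weather", "arguments": {"city": "Paris"}},  # duplicate
    {"name": "made_up_tool", "arguments": {}},                # hallucinated
]
print(dedupe_tool_calls(calls, known_tools={"get_weather"}))
```

This doesn't fix the underlying model behavior, but it keeps a looping model from hammering your tools.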
ps. the positive side of this is that it has apparently removed some traffic from the Anthropic APIs, and latency for Sonnet/Haiku has improved significantly.
The company has just over 100 employees, built V3 with $5.5M of compute, and is quietly releasing a tangible product without any hyperbolic PR in advance.
They were fairly unknown in the West until 26th Dec.
I got some good code recommendations out of it. I usually give the same question to a few models and see what they say; they differ enough to be useful, and then I end up combining the different suggestions with my own to synthesize the best possible (by my personal metric, of course) code.
What are you using for structured output? Outlines, BAML, etc seem to vary a huge amount in quality. It was many moons ago, but outlines was unusable. BAML has been great.
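Whatever library sits in the middle, the basic fallback pattern is the same: validate the model's JSON against your expected shape and retry on failure. A rough sketch, with `call_model` stubbed out (a real version would hit your LLM API) and hypothetical required keys:

```python
# Sketch: validate-and-retry loop for structured output.
# `call_model` is a stand-in for whatever API or library you actually use.
import json

def call_model(prompt):
    # Stub: a real implementation would call an LLM here.
    return '{"language": "python", "confidence": 0.9}'

REQUIRED_KEYS = {"language", "confidence"}  # hypothetical schema

def get_structured(prompt, retries=3):
    for _ in range(retries):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: ask again
        if REQUIRED_KEYS <= data.keys():
            return data
    raise ValueError("model never produced valid structured output")

print(get_structured("Classify this snippet"))
```

Libraries like Outlines instead constrain generation so invalid JSON can't be produced in the first place, which is why their quality varies so much with the model and grammar support.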
I was looking to see how you're supposed to configure v3, then realized you're probably using the API, and came across this:
> The current version of the deepseek-chat model's Function Calling capability is unstable, which may result in looped calls or empty responses. We are actively working on a fix, and it is expected to be resolved in the next version.
https://api-docs.deepseek.com/guides/function_calling
That's disappointing.
Maybe function calling using JSON blobs isn't even the optimal approach... I saw some work recently on having LLMs write Python code to execute what they want; LLMs tend to be a lot better at Python without any additional function-calling training, and some of the functions exposed to the LLM can be calls into your own logic.
Some relevant links:
This shows how python-calling performance is supposedly better for a range of existing models than JSON-calling performance: https://huggingface.co/blog/andthattoo/dpab-a#initial-result...
A little post about the concept: https://huggingface.co/blog/andthattoo/dria-agent-a
Huggingface has their own "smolagents" library that includes "CodeAgent", which operates by the same principle of generating and executing Python code for the purposes of function calling: https://huggingface.co/docs/smolagents/en/guided_tour
smolagents can use either a local LLM or a remote LLM, and it can run the generated code locally or in a remote code execution environment, so it seems fairly flexible.
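The core idea behind python-calling can be sketched without any library: the model emits a short Python snippet, and you exec it against a namespace containing only the functions you've exposed. Everything here is illustrative (the tool, the model output string, the variable names), and a production version would need real sandboxing, which is exactly what smolagents' remote execution option is for:

```python
# Sketch of the "python-calling" idea: instead of emitting a JSON tool call,
# the model emits a Python snippet that runs against a namespace of exposed
# functions. All names here are illustrative.

def get_weather(city):
    # Stand-in for a real tool; your own logic goes here.
    return f"Sunny in {city}"

EXPOSED = {"get_weather": get_weather}

# Pretend this string came back from the model.
model_output = "result = get_weather('Paris')"

namespace = dict(EXPOSED)  # only exposed names are visible to the snippet
# Empty __builtins__ is a *minimal* restriction, not a real sandbox.
exec(model_output, {"__builtins__": {}}, namespace)
print(namespace["result"])
```

Compared with JSON tool calls, the model can also compose tools (loops, conditionals, intermediate variables) in a single snippet, which is where the benchmark gains in the DPAB-α post seem to come from.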