
fullstackwife 01/20/2025

I was initially enthusiastic about DS3, because of the price, but eventually I learned the following things:

- function calling is broken (it responds with an excessive number of duplicated function calls, with hallucinated names and parameters)

- response quality is poor (my use case is code generation)

- support is not responding

I will give the reasoning model a try, but my expectations are low.

PS: The positive side of this is that it apparently diverted some traffic from the Anthropic APIs, and latency for Sonnet/Haiku has improved significantly.


Replies

coder543 01/20/2025

Maybe function calling using JSON blobs isn't even the optimal approach... I saw some stuff recently about having LLMs write Python code to execute what they want, and LLMs tend to be a lot better at Python without any additional function-calling training. Some of the functions exposed to the LLM can be calls into your own logic.
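Roughly, the idea is something like this (a quick sketch; get_weather and the llm() call are just placeholders, and exec like this is not a real sandbox):

    # Instead of asking for a JSON tool-call blob, ask the model to write
    # Python against a small set of exposed functions, then execute that code.

    def get_weather(city: str) -> str:
        """Call into your own logic here."""
        return f"Sunny in {city}"

    EXPOSED = {"get_weather": get_weather}

    prompt = (
        "You can call these Python functions: get_weather(city: str) -> str.\n"
        "Write only Python code that answers: What's the weather in Oslo?\n"
        "Assign the final answer to a variable named result."
    )

    code = llm(prompt)  # whatever client/model you use to get the code back
    namespace = dict(EXPOSED)  # only expose what the code is allowed to touch
    exec(code, {"__builtins__": {}}, namespace)  # note: not a real sandbox
    print(namespace.get("result"))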

Some relevant links:

This shows that Python-calling performance is supposedly better than JSON-calling performance for a range of existing models: https://huggingface.co/blog/andthattoo/dpab-a#initial-result...

A little post about the concept: https://huggingface.co/blog/andthattoo/dria-agent-a

Huggingface has their own "smolagents" library that includes "CodeAgent", which operates by the same principle of generating and executing Python code for the purposes of function calling: https://huggingface.co/docs/smolagents/en/guided_tour

smolagents can use either a local or a remote LLM, and it can run the generated code either locally or in a remote code execution environment, so it seems fairly flexible.
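If I remember the smolagents API right, a minimal CodeAgent looks something like this (untested sketch):

    from smolagents import CodeAgent, HfApiModel, tool

    @tool
    def get_weather(city: str) -> str:
        """Returns the current weather for a city.

        Args:
            city: Name of the city to look up.
        """
        return f"Sunny in {city}"  # call into your own logic here

    agent = CodeAgent(tools=[get_weather], model=HfApiModel())
    print(agent.run("What's the weather in Oslo?"))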

mtkd 01/20/2025

The company has just over 100 employees, built V3 with $5.5M of compute, and is quietly releasing tangible products without any hyperbolic PR in advance.

They were fairly unknown in the West until Dec 26th.

pmarreck 01/20/2025

I got some good code recommendations out of it. I usually give the same question to a few models and see what they say; they differ enough to be useful, and I end up combining their suggestions with my own to synthesize the best possible code (by my personal metric, of course).
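The fan-out part of that workflow is trivial to script; here's a rough sketch (the model names, endpoints, and env var names are just examples of OpenAI-compatible APIs):

    import os
    from openai import OpenAI

    ENDPOINTS = {
        "deepseek-chat": ("https://api.deepseek.com", "DEEPSEEK_API_KEY"),
        "gpt-4o": ("https://api.openai.com/v1", "OPENAI_API_KEY"),
    }

    question = "Refactor this function to be tail-recursive: ..."

    # Ask every model the same question and print the answers side by side.
    for model, (base_url, key_env) in ENDPOINTS.items():
        client = OpenAI(base_url=base_url, api_key=os.environ[key_env])
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        print(f"--- {model} ---\n{resp.choices[0].message.content}\n")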

rozap 01/21/2025

What are you using for structured output? Outlines, BAML, etc. seem to vary a huge amount in quality. It was many moons ago, but Outlines was unusable for me; BAML has been great.
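For context, these libraries all do roughly the same thing: constrain generation to a schema. With Outlines it looked something like this last time I tried (sketch from memory; the API may have shifted between versions, and the model name is just an example):

    from pydantic import BaseModel
    import outlines

    class Suggestion(BaseModel):
        file: str
        patch: str

    # Constrain the model so it can only emit JSON matching the schema.
    model = outlines.models.transformers("Qwen/Qwen2.5-Coder-7B-Instruct")
    generate = outlines.generate.json(model, Suggestion)
    suggestion = generate("Return a code suggestion as JSON: ...")  # -> Suggestion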

Gracana 01/20/2025

I was looking to see how you're supposed to configure v3, then realized you're probably using the API, and came across this:

> The current version of the deepseek-chat model's Function Calling capability is unstable, which may result in looped calls or empty responses. We are actively working on a fix, and it is expected to be resolved in the next version.

https://api-docs.deepseek.com/guides/function_calling

That's disappointing.
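Until they fix it, one client-side stopgap is to dedupe the repeated tool calls before executing anything (a sketch against the OpenAI-compatible response objects the DeepSeek API returns; whether it helps depends on how badly the calls are looping):

    def dedupe_tool_calls(tool_calls):
        """Drop exact duplicates (same function name + same argument JSON)."""
        seen, unique = set(), []
        for tc in tool_calls or []:
            key = (tc.function.name, tc.function.arguments)
            if key not in seen:
                seen.add(key)
                unique.append(tc)
        return unique

    # usage: tool_calls = dedupe_tool_calls(resp.choices[0].message.tool_calls)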